SHAOJIE'S BOOK

Posted 2026-05-19 | Updated 2026-07-03 | Artificial Intelligence | 4 minutes read (About 648 words)

Introduction

This post focuses on verl's training pipeline: how RayPPOTrainer.fit() orchestrates the rollout, reward, logprob, ref, and actor update stages, and how those stages are wired together through workers and DataProto.
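To make the stage ordering concrete, here is a minimal Python sketch. It is not verl code. `ToyBatch` is a hypothetical stand-in for DataProto, and the stage functions, field names, and `fit_step` are illustrative assumptions. The sketch shows only the data-flow pattern: each stage reads fields written by earlier stages and merges its own outputs into one shared batch, so the actor update can see everything produced by rollout, reward, logprob, and ref. In verl these stages are executed by workers. Here they are plain local functions, and the order simply follows the list above. The actual order inside fit() may differ between versions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyBatch:
    """Hypothetical stand-in for DataProto: named, per-sample fields."""
    fields: Dict[str, List[float]] = field(default_factory=dict)

    def union(self, other: "ToyBatch") -> "ToyBatch":
        # Merge another stage's outputs; refuse to silently overwrite a key.
        overlap = self.fields.keys() & other.fields.keys()
        if overlap:
            raise KeyError(f"duplicate fields: {sorted(overlap)}")
        return ToyBatch({**self.fields, **other.fields})


Stage = Callable[[ToyBatch], ToyBatch]


# Hypothetical stages: each reads earlier fields and returns only new ones.
def rollout(b: ToyBatch) -> ToyBatch:
    return ToyBatch({"responses": [p + 0.5 for p in b.fields["prompts"]]})


def reward(b: ToyBatch) -> ToyBatch:
    return ToyBatch({"rewards": [1.0 if r > 2 else 0.0 for r in b.fields["responses"]]})


def old_log_prob(b: ToyBatch) -> ToyBatch:
    return ToyBatch({"old_log_probs": [-0.1 * r for r in b.fields["responses"]]})


def ref_log_prob(b: ToyBatch) -> ToyBatch:
    return ToyBatch({"ref_log_probs": [-0.2 * r for r in b.fields["responses"]]})


def actor_update(b: ToyBatch) -> ToyBatch:
    # Placeholder "loss" terms, NOT the PPO objective: this only shows that
    # the update consumes fields produced by every earlier stage.
    f = b.fields
    terms = [
        -rw * olp + (olp - rlp)
        for rw, olp, rlp in zip(f["rewards"], f["old_log_probs"], f["ref_log_probs"])
    ]
    return ToyBatch({"loss_terms": terms})


def fit_step(batch: ToyBatch, stages: List[Tuple[str, Stage]], log: List[str]) -> ToyBatch:
    # Run stages in order, merging each stage's output into the shared batch.
    for name, stage in stages:
        batch = batch.union(stage(batch))
        log.append(name)
    return batch


STAGES: List[Tuple[str, Stage]] = [
    ("rollout", rollout),
    ("reward", reward),
    ("old_log_prob", old_log_prob),
    ("ref_log_prob", ref_log_prob),
    ("actor_update", actor_update),
]

if __name__ == "__main__":
    log: List[str] = []
    out = fit_step(ToyBatch({"prompts": [1.0, 2.0]}), STAGES, log)
    print(log)
    print(sorted(out.fields))
```

The `union` check that rejects duplicate keys is a design choice of this sketch. It makes any accidental overwrite between stages fail loudly instead of silently replacing data that a later stage depends on.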