SHAOJIE'S BOOK

Posted 2026-06-30 | Updated 2026-07-03 | Artificial Intelligence | 11 minutes read (about 1706 words)

Introduction

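Speculative decoding in RL rollout is not a simple port of ordinary inference acceleration. Ordinary serving only cares about latency, throughput, and user experience. RL rollout must also guarantee that the response, old logprob, reward, advantage, and policy loss all correspond to the same verifier policy.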

In other words, the draft model can help the system produce candidate tokens faster, but the training semantics must still belong to the target / verifier policy.
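To make "the training semantics must still belong to the verifier" concrete, here is a minimal sketch of the standard speculative sampling acceptance rule: accept a draft token with probability min(1, p/q), and on rejection resample from the normalized residual max(p - q, 0). This rule makes the emitted tokens distributed exactly as the target p, and every logprob the sketch records is taken from p, never from the draft q. Assumptions: toy probability lists stand in for real model outputs, draft tokens were sampled from q (so q[tok] > 0), and the name `verify_draft` is hypothetical, not a verl or vLLM API.

```python
import math
import random


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Hypothetical verifier step for speculative sampling.

    draft_tokens: k token ids sampled from the draft model.
    q_probs: k draft distributions, one per draft position.
    p_probs: k + 1 target (verifier) distributions; the extra one is for the bonus token.
    Returns (emitted tokens, log p of each emitted token under the TARGET policy).
    """
    emitted, target_logprobs = [], []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the draft token with probability min(1, p/q).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            target_logprobs.append(math.log(p[tok]))
            continue
        # Rejected: resample from max(p - q, 0); random.choices normalizes the weights.
        residual = [max(pv - qv, 0.0) for pv, qv in zip(p, q)]
        new_tok = rng.choices(range(len(p)), weights=residual)[0]
        emitted.append(new_tok)
        target_logprobs.append(math.log(p[new_tok]))
        return emitted, target_logprobs  # stop at the first rejection
    # All drafts accepted: sample one bonus token directly from the target.
    p = p_probs[len(draft_tokens)]
    bonus = rng.choices(range(len(p)), weights=p)[0]
    emitted.append(bonus)
    target_logprobs.append(math.log(p[bonus]))
    return emitted, target_logprobs


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.2, 0.3, 0.5]] * 2
    p = [[0.7, 0.2, 0.1]] * 3
    print(verify_draft([2, 1], q, p, rng))
```

The draft model only changes how many target forward passes are needed per emitted token; the (token, logprob) pairs handed to reward, advantage, and loss computation are still those of the verifier policy.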

Posted 2026-06-30 | Updated 2026-07-03 | Artificial Intelligence | 9 minutes read (about 1298 words)

VeRL Feature Survey

Introduction

This post now serves as a feature map for verl / RL infra: it places vLLM graph mode, speculative decoding, router replay, FullAsync / AsyncFlow, and TransferQueue on a single system diagram, but no longer carries all the details.

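The core conclusion remains: these features do not sit at the same layer. Some reduce inference execution overhead, some address the sequential nature of decode, some guarantee MoE routing consistency, some overlap rollout with training, and some decouple data from the single controller. The real gains come from first locating the bottleneck and then enabling the corresponding feature.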
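The "locate the bottleneck first" rule can be written down as a one-to-one lookup, pairing each layer named above with the feature listed for it. This is only an illustrative sketch: the `Bottleneck` enum, the `FEATURE_FOR` table, and `features_to_enable` are hypothetical names, not verl configuration keys, and the strings are labels rather than flags.

```python
from enum import Enum


class Bottleneck(Enum):
    """Bottleneck layers named in the post (labels are illustrative)."""
    INFERENCE_EXEC_OVERHEAD = "inference execution overhead"
    SERIAL_DECODE = "sequential decode"
    MOE_ROUTING_CONSISTENCY = "MoE routing consistency"
    ROLLOUT_TRAIN_NO_OVERLAP = "rollout and training not overlapped"
    SINGLE_CONTROLLER_DATA = "data coupled to the single controller"


# Hypothetical lookup: each feature targets exactly one layer of the system.
FEATURE_FOR = {
    Bottleneck.INFERENCE_EXEC_OVERHEAD: "vLLM graph mode",
    Bottleneck.SERIAL_DECODE: "speculative decoding",
    Bottleneck.MOE_ROUTING_CONSISTENCY: "router replay",
    Bottleneck.ROLLOUT_TRAIN_NO_OVERLAP: "FullAsync / AsyncFlow",
    Bottleneck.SINGLE_CONTROLLER_DATA: "TransferQueue",
}


def features_to_enable(observed):
    """Return features only for measured bottlenecks, deduplicated, in observed order."""
    chosen = []
    for bottleneck in observed:
        feature = FEATURE_FOR[bottleneck]
        if feature not in chosen:
            chosen.append(feature)
    return chosen
```

For example, `features_to_enable([Bottleneck.SERIAL_DECODE])` returns `["speculative decoding"]`, and an empty profile returns an empty list: no feature is switched on without a measured bottleneck behind it.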
