SHAOJIE'S BOOK

Posted 2026-05-19 | Updated 2026-07-03 | Artificial Intelligence | 11 minutes read (About 1601 words)

Introduction

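A rollout in RL is not ordinary offline inference. Besides generating responses, it has to share the policy version with the training stage, return token-level information, and feed the subsequent logprob, reward, and advantage computations.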

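For the same reason, vLLM's graph mode cannot be reduced to "CUDA Graph on or off". In a verl rollout, enforce_eager, compilation_config.cudagraph_mode, and cudagraph_capture_sizes jointly determine performance, GPU memory usage, capture cost, and compatibility.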
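To make the interaction between the three knobs concrete, here is a minimal sketch in plain Python. The helper name `build_rollout_engine_kwargs` is hypothetical and is not part of vLLM or verl. The mode string `"PIECEWISE"` and the capture sizes are illustrative assumptions, and how verl forwards these keys to the vLLM engine depends on the installed versions, so check them against your own setup.

```python
from typing import Any, Dict, List, Optional


def build_rollout_engine_kwargs(
    enforce_eager: bool,
    cudagraph_mode: str = "PIECEWISE",  # assumed mode name; verify for your vLLM version
    capture_sizes: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble graph-mode kwargs for a rollout engine."""
    if enforce_eager:
        # Eager execution skips graph capture entirely, so the other two
        # knobs have nothing to act on and are left out.
        return {"enforce_eager": True}

    compilation_config: Dict[str, Any] = {"cudagraph_mode": cudagraph_mode}
    if capture_sizes:
        # Each captured batch size costs capture time and GPU memory,
        # so keep the list de-duplicated and sorted.
        compilation_config["cudagraph_capture_sizes"] = sorted(set(capture_sizes))

    return {"enforce_eager": False, "compilation_config": compilation_config}


eager = build_rollout_engine_kwargs(enforce_eager=True)
graph = build_rollout_engine_kwargs(False, "PIECEWISE", [8, 1, 4, 2, 8])
print(eager)
print(graph)
```

The point of the sketch is that the setting is a small configuration space rather than a single switch: enforce_eager decides whether graphs are used at all, while the mode and the capture sizes decide how much is captured and at what cost.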
