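Posted 2026-02-27 | Updated 2026-02-27 | Artificial Intelligence | 2 minutes read (About 278 words)
Business Trip: 2601-2602 verl + DanceGRPO
Introduction: On an internal ZJ business trip, built text-to-video (t2v) RL from scratch (0 to 1) with verl + MindSpeed MM + the DanceGRPO algorithm, achieving a fast and sustained rise in reward.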
2026-02-05 | The Mechanics of RL: How Inference Sampling Shapes the Probability Landscape | Artificial Intelligence