SHAOJIE'S BOOK

Posted 2026-07-01 | Updated 2026-07-03 | Artificial Intelligence | 20 minutes read (About 3031 words)

Introduction

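GMM (grouped matrix multiplication) plugs into Qwen3.5 MoE at the two matrix multiplications of the routed experts: hidden -> gate/up and intermediate -> hidden. The shared_expert remains a regular Qwen3_5MoeMLP, attention is untouched, and the ordinary MLP in the dense Qwen3.5 variant is not a replacement target either.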
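To make the two insertion points concrete, here is a minimal NumPy sketch of a routed-expert forward pass built from two grouped matmuls. It is an illustration under stated assumptions, not the MindSpeed-MM or Qwen3.5 implementation: the function names, the `[gate, up]` concatenated weight layout, the SiLU gating, and the premise that tokens arrive already sorted by expert are all assumptions made for this example.

```python
import numpy as np


def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def grouped_matmul(x, weights, group_sizes):
    """Eager reference: rows of x are sorted by expert; group i uses weights[i]."""
    out = np.empty((x.shape[0], weights.shape[-1]), dtype=x.dtype)
    start = 0
    for i, n in enumerate(group_sizes):
        out[start:start + n] = x[start:start + n] @ weights[i]
        start += n
    return out


def routed_experts(x_sorted, w_gate_up, w_down, group_sizes):
    # First grouped matmul: hidden -> gate/up (assumed layout: [gate, up] concatenated)
    gate_up = grouped_matmul(x_sorted, w_gate_up, group_sizes)
    gate, up = np.split(gate_up, 2, axis=-1)
    act = silu(gate) * up
    # Second grouped matmul: intermediate -> hidden
    return grouped_matmul(act, w_down, group_sizes)


# Hypothetical sizes: 3 experts, hidden=4, intermediate=6; expert 1 receives no tokens.
rng = np.random.default_rng(0)
group_sizes = [2, 0, 3]
hidden, inter, n_experts = 4, 6, len(group_sizes)
x_sorted = rng.standard_normal((sum(group_sizes), hidden))
w_gate_up = rng.standard_normal((n_experts, hidden, 2 * inter))
w_down = rng.standard_normal((n_experts, inter, hidden))

out = routed_experts(x_sorted, w_gate_up, w_down, group_sizes)

# Per-token reference: run each token through its own expert's MLP.
expert_ids = np.repeat(np.arange(n_experts), group_sizes)
ref = np.stack([
    (silu(x @ w_gate_up[e][:, :inter]) * (x @ w_gate_up[e][:, inter:])) @ w_down[e]
    for x, e in zip(x_sorted, expert_ids)
])
```

In this sketch only the two `grouped_matmul` calls would be swapped for a fused kernel; the activation between them, and anything resembling shared_expert or attention, stays outside the grouped path.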

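The public diff of PR #2664 mainly adds a fused/eager consistency unit test for mindspeed_mm.fsdp.ops.moe_ops.gemm.grouped_matmul and loosens the tolerance of the unpermute unit test. It can be cited as evidence that the GMM wrapper interface is covered by tests, but it should not be described as the PR that integrates the full feature.[^gmm-pr-api][^gmm-pr-files]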
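The shape of such a consistency test can be sketched as follows. This is a hedged illustration of the pattern only: it does not call the real `grouped_matmul` wrapper, whose signature is not shown here. A batched `einsum` path stands in for the fused kernel, and the function names, shapes, and tolerances are assumptions chosen for the example.

```python
import numpy as np


def eager_grouped_matmul(x, weights, group_sizes):
    # Eager path: one plain matmul per expert group, concatenated in order.
    chunks, start = [], 0
    for i, n in enumerate(group_sizes):
        chunks.append(x[start:start + n] @ weights[i])
        start += n
    return np.concatenate(chunks, axis=0)


def batched_grouped_matmul(x, weights, group_sizes):
    # Stand-in for a fused kernel: gather each row's expert weight, contract once.
    expert_ids = np.repeat(np.arange(len(group_sizes)), group_sizes)
    return np.einsum("tk,tkn->tn", x, weights[expert_ids])


def check_consistency(seed=0, rtol=1e-4, atol=1e-4):
    # Tolerances are illustrative; a real test would pick them per dtype and device.
    rng = np.random.default_rng(seed)
    group_sizes = [3, 0, 5, 1]  # includes an empty group as an edge case
    x = rng.standard_normal((sum(group_sizes), 8)).astype(np.float32)
    w = rng.standard_normal((len(group_sizes), 8, 16)).astype(np.float32)
    eager = eager_grouped_matmul(x, w, group_sizes)
    batched = batched_grouped_matmul(x, w, group_sizes)
    np.testing.assert_allclose(eager, batched, rtol=rtol, atol=atol)
    return float(np.max(np.abs(eager - batched)))


max_diff = check_consistency()
```

A test of this kind shows that two implementations agree numerically on the wrapper's interface. It says nothing about whether the model code actually routes through the fused path, which is why the PR counts as test coverage rather than feature integration.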
