$\phi$-Balancing for Mixture-of-Experts Training
A new framework directly optimizes population-level expert balance, addressing load-balancing bias in MoE training.
arXiv:2605.15403v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability…