DOT-MoE: Differentiable Optimal Transport for MoEfication
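Uses differentiable optimal transport to optimize the MoE architecture, improving large-model inference efficiency while making training more stable.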
arXiv:2606.01666v1 Announce Type: new Abstract: The scaling of Large Language Models (LLMs) has driven significant performance gains but created subst…