Mixture of Experts for Low-Resource LLMs
Unpacking expert routing in LLMs for low-resource languages: a comparison of MoE performance in Transformer and hybrid Mamba architectures.
arXiv:2605.17598v1. Abstract: Mixture-of-Experts (MoE) architectures enable efficient model scaling, yet expert routing behavior acr…