1
L2R: Low-Rank and Lipschitz-Controlled Routing for Mixture-of-Experts
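A novel MoE routing method that combines low-rank decomposition with Lipschitz control to improve expert specialization and model performance.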
arXiv:2601.21349v2 Announce Type: replace-cross Abstract: Mixture-of-Experts (MoE) models scale neural networks by conditionally activating a small su…
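The abstract's premise, conditionally activating a small subset of experts per token, is usually implemented with a learned router followed by top-k gating. The sketch below is a hypothetical illustration under stated assumptions, not the L2R method itself, whose exact formulation is not given in this excerpt. It factors the router as a rank-r product W = A B, applies one simple (assumed) form of Lipschitz control by rescaling the factors so that ||A||_F ||B||_F stays under a cap, and then applies standard softmax top-k gating. The helper names (`low_rank_logits`, `clip_lipschitz`, `top_k_gate`) and the Frobenius-norm rule are illustrative choices.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(x: List[float], M: Matrix) -> List[float]:
    """Row vector x (length d) times matrix M (d x m) gives a length-m vector."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]


def frobenius(M: Matrix) -> float:
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in M for v in row))


def clip_lipschitz(A: Matrix, B: Matrix, max_lip: float) -> Tuple[Matrix, Matrix]:
    """Hypothetical control rule: rescale A and B so ||A||_F * ||B||_F <= max_lip.

    Because ||A B||_2 <= ||A||_2 ||B||_2 <= ||A||_F ||B||_F, this caps the
    L2 Lipschitz constant of the router map x -> x A B (a loose bound).
    """
    bound = frobenius(A) * frobenius(B)
    if bound <= max_lip:
        return A, B
    s = math.sqrt(max_lip / bound)  # split the shrink evenly between both factors
    return ([[v * s for v in row] for row in A],
            [[v * s for v in row] for row in B])


def low_rank_logits(x: List[float], A: Matrix, B: Matrix) -> List[float]:
    """Router logits through the rank-r bottleneck: (x A) B, one logit per expert."""
    return matvec(matvec(x, A), B)


def top_k_gate(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Softmax top-k gating: keep the k largest logits, renormalize their weights."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)  # subtract the max for numerical stability
    exps = {e: math.exp(logits[e] - m) for e in top}
    z = sum(exps.values())
    return [(e, exps[e] / z) for e in top]


if __name__ == "__main__":
    x = [1.0, 0.5, -0.5, 2.0]                              # d = 4 token representation
    A = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0], [0.2, -0.3]]  # d x r, rank r = 2
    B = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]                # r x E, E = 3 experts
    A, B = clip_lipschitz(A, B, max_lip=1.0)
    gates = top_k_gate(low_rank_logits(x, A, B), k=2)
    print(gates)  # experts 0 and 1 are selected; their weights sum to 1
```

The rank-r factorization keeps the router's parameter count at r(d + E) instead of dE, and bounding its Lipschitz constant limits how much the routing logits can change under small input perturbations. Whether L2R uses these particular mechanisms cannot be confirmed from the truncated abstract.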