ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference
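An optimization for memory-constrained MoE LLM inference that fine-tunes the router to increase expert reuse, achieving a 1.77x to 1.99x decoding speedup on the Jetson Orin NX.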
arXiv:2605.27081v1 Announce Type: cross Abstract: Fine-grained Mixture-of-Experts (MoE) models sparsely activate only a subset of experts per token, r…