PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
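A new power-optimization approach for MoE models that turns GPU power from a static constraint into a controllable resource, improving energy efficiency.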
arXiv:2605.21427v1. Abstract: Large language model (LLM) inference has become a dominant workload in modern data centers, driving si…