Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts
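A new method uses orthogonal growth of Mixture-of-Experts (MoE) models to substantially cut LLM pre-training costs and escape the sunk-cost trap.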
arXiv:2510.08008v2 (Announce Type: replace)
Abstract: As the computational demands for pre-training Large Language Models (LLMs) continue to surge, the …