Strong Teacher Not Needed? On Distillation in LLM Pretraining
Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.
arXiv:2605.23857v1 Announce Type: new Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…
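A new paradigm for training multimodal large models: stage-aware sparsity dynamically eliminates redundancy, substantially improving efficiency while preserving performance.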
arXiv:2509.18150v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance across a variet…
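Proposes a token superposition technique that breaks through the pretraining efficiency bottleneck and substantially reduces compute requirements; essential reading for LLM training optimization.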
arXiv:2605.06546v2 Announce Type: replace Abstract: Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, r…
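Proposes a heterogeneity-aware dataset scheduling method, a new approach to improving the training efficiency and effectiveness of audio large language models.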
arXiv:2605.19101v1 Announce Type: cross Abstract: Training general-purpose Audio Large Language Models (ALLMs) across diverse datasets is essential fo…
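Breaks through the long-context inference bottleneck in large models: efficiently converts full attention to sparse attention within a hundred steps, balancing efficiency and accuracy.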
arXiv:2605.16928v1 Announce Type: new Abstract: Long-context inference in large language models is bottlenecked by the quadratic cost of full attentio…
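The paper proposes a plug-and-play oscillating data-volume scheduling method that goes beyond traditional sample selection and significantly improves model training efficiency.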
arXiv:2605.14773v1 Announce Type: cross Abstract: Data selection accelerates training by identifying representative training data while preserving mod…
OpenAI details a new method for efficiently training language models to fill in the middle (FIM), improving code completion and text generation.
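Recent research uses autoguidance and online data selection to optimize diffusion model training, significantly improving efficiency.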
arXiv:2509.15267v2 Announce Type: replace-cross Abstract: The costs of generative model compute rekindled promises and hopes for efficient data curati…