Strong Teacher Not Needed? On Distillation in LLM Pretraining
Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.
arXiv:2605.23857v1 Announce Type: new Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…
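Projection-guided cross-tokenizer knowledge distillation that needs no auxiliary components and effectively addresses vocabulary incompatibility.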
arXiv:2605.21699v1 Announce Type: cross Abstract: Cross-tokenizer knowledge distillation allows a student model to learn from teachers with incompatib…
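Must multi-turn dialogue agents be distilled with a one-size-fits-all approach? This paper offers a strategy for choosing when to distill and what to distill.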
arXiv:2605.19447v1 Announce Type: new Abstract: Reinforcement learning can train LLM agents from sparse task rewards, but long-horizon credit assignme…
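Proposes a sparse-to-dense reward principle: a four-stage post-training pipeline that uses scarce labeled data more efficiently, offering a new paradigm for LLM reasoning optimization.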
arXiv:2605.12483v2 Announce Type: replace-cross Abstract: When labeled verifiable training data is scarce, each checked example should be used where i…
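The first survey of on-policy distillation for large models, systematically covering methods, challenges, and future directions; suited to researchers who want to dig deeper.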
arXiv:2604.00626v3 Announce Type: replace Abstract: As Large Language Models (LLMs) continue to grow in both capability and cost, transferring frontie…
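Uses epistemic uncertainty to guide knowledge distillation, addressing data sparsity and fuzzy class boundaries in student misconception classification.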
arXiv:2605.14752v1 Announce Type: cross Abstract: Accurately identifying student misconceptions is crucial for personalized education but faces three …
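A bidirectional federated knowledge distillation framework that tackles the privacy and efficiency challenges of non-IID, long-tailed ECG monitoring.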
arXiv:2605.14886v1 Announce Type: new Abstract: Electrocardiogram (ECG) monitoring in Internet of Medical Things (IoMT) networks is constrained by str…
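Proposes a lossless anti-distillation sampling method, offering a new approach to protecting the intellectual property of large models.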
arXiv:2605.18829v1 Announce Type: new Abstract: Frontier commercial generative models face a growing threat from distillation, whereby a distiller har…
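How should the teacher model be chosen for fine-grained image recognition? This study offers a new approach to knowledge distillation for resource-constrained devices.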
arXiv:2605.15689v1 Announce Type: new Abstract: Fine-grained image recognition classifies subcategories such as bird species or car models. While stat…
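Proposes Flow-OPD, a new method that uses on-policy distillation to address reward sparsity and gradient interference in multi-task alignment of flow matching models.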
arXiv:2605.08063v3 Announce Type: replace-cross Abstract: Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under …
A new advance in spiking language models: softmax-free attention and alignment distillation yield ultra-low-energy large models.
arXiv:2605.13859v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language model…
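AMiD proposes a unified knowledge distillation framework that uses an α-mixture assistant distribution to systematically bridge the capacity gap between teacher and student. It resolves the training instability caused by near-zero probabilities in high-dimensional outputs, a key challenge in LLM distillation that has long been addressed only piecemeal.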
arXiv:2510.15982v3 Announce Type: replace-cross Abstract: Autoregressive large language models (LLMs) have achieved remarkable improvement across many…