牛哥精选 (Niu Ge's Picks) · This Month

🤖 AI · Large Models | arXiv AI | 2026-05-19

AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution

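AMiD proposes a unified knowledge distillation framework that uses an α-mixture assistant distribution to systematically bridge the capacity gap between teacher and student. It also addresses the training instability caused by near-zero probabilities in high-dimensional outputs, a key challenge in LLM distillation that has long been handled in a fragmented way.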

arXiv:2510.15982v3 Announce Type: replace-cross Abstract: Autoregressive large language models (LLMs) have achieved remarkable improvement across many…

knowledge distillation, large language models, α-mixture assistant distribution, AMiD, divergence

🐂 牛哥精选 (Niu Ge's Picks)

AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution

📅 Date