AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution
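AMiD proposes a unified knowledge distillation framework that uses an α-mixture assistant distribution to systematically bridge the capacity gap between teacher and student. It addresses the training instability caused by near-zero probabilities in high-dimensional outputs, a key challenge in LLM distillation that has long been handled in a fragmented way.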
arXiv:2510.15982v3. Abstract: Autoregressive large language models (LLMs) have achieved remarkable improvement across many…