Strong Teacher Not Needed? On Distillation in LLM Pretraining
Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.
arXiv:2605.23857v1. Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…