Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
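An LLM distillation paper that unifies the SFT, DAgger, offline RL, and OPD perspectives by decoupling KL from trajectories, offering a new theoretical framework for model optimization.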
arXiv:2605.16826v1 Announce Type: new Abstract: Knowledge distillation is central to LLM post-training, yet its design space remains poorly understood…