Backtracking When It Strays: Mitigating Dual Exposure Biases in LLM Reasoning Distillation
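Highlight: uses a backtracking mechanism to mitigate dual exposure biases in LLM reasoning distillation, improving the efficiency of transferring long chain-of-thought reasoning.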
arXiv:2605.19433v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable success in complex reasoning tasks via long chai…