Anti-Length Shift: Dynamic Outlier Truncation for Training Efficient Reasoning Models
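Proposes Dynamic Outlier Truncation, a method that addresses the high deployment cost and performance loss caused by long chain-of-thought reasoning in reinforcement-learning-trained reasoning models.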
arXiv:2601.03969v2 Announce Type: replace Abstract: Large reasoning models enhanced by reinforcement learning with verifiable rewards have achieved si…