You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories
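Reveals the rank-1 structure of parameter trajectories during RLVR training: only minimal training is needed to extrapolate an LLM's reasoning ability, overturning conventional wisdom.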
arXiv:2605.21468v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving rea…