SAGE: Shaping Anchors for Guided Exploration in RLVR of LLMs
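Proposes the SAGE framework, which shapes anchors to guide LLM exploration more efficiently in RLVR (reinforcement learning with verifiable rewards), improving reasoning ability and verification performance.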
arXiv:2605.18864v1 Announce Type: new Abstract: Recent studies observe that reinforcement learning with verifiable rewards (RLVR) reliably improves pa…