From Reasoning Chains to Verifiable Subproblems: Curriculum Reinforcement Learning Enables Credit Assignment for LLM Reasoning
将推理链分解为可验证子问题,用课程强化学习精准分配LLM推理中的信用,突破训练瓶颈。
arXiv:2605.22074v1 Announce Type: cross Abstract: Reinforcement learning from verifiable rewards (RLVR) has shown strong promise for LLM reasoning, bu…