Boosting LLM Reasoning via Human-Inspired Reward Shaping
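Inspired by human learning, the T2T framework splits reinforcement learning for LLM reasoning into two phases, a "thickening" exploration phase and a "thinning" consolidation phase, and significantly outperforms existing methods on mathematical reasoning.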
arXiv:2602.04265v3 (announce type: replace-cross)
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm fo…