Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
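A new reinforcement learning method that exploits error diversity in group rollouts to improve performance, supported by both theoretical analysis and experimental validation.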
arXiv:2605.17333v1 Announce Type: new Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) typically samples multiple responses per prompt …