The Unlearnability Phenomenon in RLVR for Language Models
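Reveals a counterintuitive phenomenon in RLVR training, in which LLMs fail to learn from hard samples, and challenges existing understanding.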
arXiv:2605.16787v1 (announce type: new). Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving Large Language …