D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning
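Proposes a dual difficulty-aware self-evolution method that addresses two challenges in reinforcement learning: scarce training data and dynamic difficulty shift.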
arXiv:2605.17037v1 Announce Type: new Abstract: Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language model…