CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning
Without human-annotated data, LLMs achieve a breakthrough in reinforcement learning through iterative coach-player reasoning
arXiv:2602.02979v2 Announce Type: cross Abstract: Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their prog…