Search Self-play: Pushing the Frontier of Agent Capability without Supervision
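Without human annotation, search self-play continuously pushes the limits of AI agent capability, opening a new direction for unsupervised evolution.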
arXiv:2510.18821v3 Announce Type: replace Abstract: Reinforcement learning with verifiable rewards (RLVR) has become the mainstream technique for trai…