TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
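Proposes TSR, a trajectory-search rollout method that improves the reinforcement learning performance of LLM agents in multi-turn interactions.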
arXiv:2602.11767v3 Announce Type: replace-cross Abstract: Advances in large language models (LLMs) are driving a shift toward using reinforcement lear…