3SPO: State-Score-Supervised Policy Optimization for LLM Agents
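Proposes 3SPO, a new method that uses state-score supervision to optimize LLM agent policies, improving decision-making efficiency and interpretability.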
arXiv:2606.09961v1 Announce Type: cross Abstract: Training large language models (LLMs) as autonomous agents via reinforcement learning (RL) has enabl…