HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents
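Starting from the challenges of training long-horizon agents, the paper proposes targeted hindsight self-distillation to improve performance on complex tasks.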
arXiv:2605.17873v1. Abstract: Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rew…