GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation
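Proposes a granularity-adaptive advantage reweighting method that uses self-distillation to achieve fine-grained credit assignment for LLM agents, improving the efficiency of policy learning.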
arXiv:2605.11853v2 Announce Type: replace-cross Abstract: Reinforcement learning has become a widely used post-training approach for LLM agents, where…