1
AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs
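Proposes a dataflow-oriented reinforcement learning framework that lowers the training cost of agentic LLMs and supports multi-policy collaboration, substantially improving scalability.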
arXiv:2605.15565v1 Announce Type: cross Abstract: Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capa…