Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments
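A new benchmark for LLM agent evaluation: Gaia2 tests agents' temporal awareness and adaptability in dynamic, asynchronous environments, moving beyond static evaluation.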
Article URL: https://arxiv.org/abs/2602.11964
Comments URL: https://news.ycombinator.com/item?id=48430918
Points: 2
# Comments: 0