Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies
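The first large-scale benchmark for AI agents on workspace tasks, focusing on the difficulty of evaluation under complex file dependencies.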
arXiv:2605.03596v4. Announce Type: replace. Abstract: Workspace learning requires AI agents to identify, reason over, exploit, and update explicit and i…