Design and Report Benchmarks for Knowledge Work
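Benchmarks for design and report tasks in knowledge work, adding a new dimension for quantitatively evaluating how AI systems perform in real-world office settings.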
arXiv:2605.23262v1 Announce Type: new Abstract: The development of LLM agents has led to a growing body of work on knowledge-work AI, including coding…