SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades
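A new benchmark focuses on how well coding agents handle consecutive package version upgrades, a setting closer to real-world maintenance than single, isolated tasks.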
arXiv:2605.14415v1 Announce Type: cross Abstract: Coding agents powered by large language models are increasingly expected to perform realistic softwa…