BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
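New research: can existing code agents handle complex tasks that go beyond a single repository? Experiments reveal the limits of their capabilities and the challenges they face.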
arXiv:2603.03194v2 (Announce Type: replace)
Abstract: Current code-agent benchmarks primarily evaluate localized issue resolution within a single target…