DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models
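The first code generation benchmark to incorporate real developer feedback, targeting a key weakness of existing benchmarks: their detachment from real-world coding scenarios.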
arXiv:2601.11895v3. Abstract: DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on real…