Position: AI Evaluations Should be Grounded on a Theory of Capability
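AI evaluation should not rely on benchmarks alone; it should be built on a systematic theory of capability to yield a more reliable evaluation framework.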
arXiv:2509.19590v2 (Announce Type: replace-cross)
Abstract: Evaluations of generative models are now ubiquitous, and their outcomes critically shape pub…