The Evaluation Trap: Benchmark Design as Theoretical Commitment
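AI benchmarks embed hidden theoretical assumptions that narrow the definition of progress; beware the evaluation trap, which reshapes the very concept of capability.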
arXiv:2605.14167v1 Announce Type: new Abstract: Every AI benchmark operationalizes theoretical assumptions about the capability it claims to assess. W…