Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
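Soohak, a mathematician-curated benchmark, targets the research-level mathematical reasoning capabilities of LLMs and challenges the limits of the highest-order thinking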
arXiv:2605.09063v2 Announce Type: replace Abstract: Following the recent achievement of gold-medal performance on the IMO by frontier LLMs, the commun…