GIM: Evaluating models via tasks that integrate multiple cognitive domains
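The new benchmark GIM evaluates large models by integrating multiple cognitive domains, avoiding rote memorization and reasoning detached from reality.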
arXiv:2605.18663v1 Announce Type: cross Abstract: As LLM benchmarks saturate, the evaluation community has pursued two strategies to increase difficul…