LLM Benchmark Datasets Should Be Contamination-Resistant
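Accepted at ICML 2026: LLM benchmark datasets should be contamination-resistant, so that training-data leakage does not distort evaluation results.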
arXiv:2605.19999v1. Abstract: Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. How…