Large Language Models Could Be Rote Learners
Large language models may inflate their benchmark scores through rote memorization, exposing a weakness in current evaluation.
arXiv:2504.08300v5 Announce Type: replace-cross Abstract: Benchmark-based evaluation, e.g., multiple-choice questions (MCQs) and open-ended questions …