Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation
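Uncovers the underlying causes of knowledge-driven failures when LLMs serve as judges in QA evaluation, with rigorous experiments and striking findings.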
arXiv:2601.07506v2 Announce Type: replace Abstract: While large language models (LLMs) are increasingly used as automatic judges for question answerin…