Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?
Questioning the reliability of LLMs in scientific research evaluation, this study reveals key limitations.
arXiv:2605.19196v1. Announce type: new. Abstract: Deep research agents increasingly automate complex information-seeking tasks, producing evidence-groun…