Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?
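Explores how LLM judges can augment human evaluation, quantifies how many human reviews are needed, and efficiently balances cost against quality when evaluating AI systems.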
arXiv:2605.16354v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in…