Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study
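An empirical study of the reliability of multilingual LLM judges: exploring how to ensure that AI judges remain fair and accurate across languages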
arXiv:2605.28710v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, y…