Quantifying the Impact of Translation Errors on Multilingual LLM Evaluation
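Quantifies the bias that translation errors introduce into multilingual LLM evaluation, exposing key evaluation pitfalls and improving reliability.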
arXiv:2605.24904v1 Announce Type: new Abstract: Machine-translated benchmarks are widely used to assess the multilingual capabilities of large languag…