Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks
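The first systematic evaluation of mutual influence among LLM safety benchmarks and of their code repository quality, revealing potential biases in benchmark research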
arXiv:2603.04459v3 (announce type: replace-cross)
Abstract: The rapid expansion of research in LLM safety presents challenges in tracking advancements, …