The Evaluation Game: Beyond Static LLM Benchmarking
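Moves beyond static benchmarks to explore a new paradigm for LLM evaluation, introducing game-based thinking to overcome the limits of traditional testing.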
arXiv:2605.19377v1 Announce Type: new Abstract: As jailbreaks, adversarially crafted inputs that bypass safety constraints, continue to be discovered …