1
One Token to Fool LLM-as-a-Judge
A single token is enough to fool an LLM judge, exposing a security weakness in AI evaluation pipelines.
arXiv:2507.08794v3 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly trusted as automated judges, assisting evaluat…
A systematic study of reinforcement-learning-driven jailbreak attacks on LLMs, revealing new AI safety risks; worth following.
arXiv:2605.07032v2 Announce Type: replace-cross Abstract: The evolution of generative models from next-token predictors to autonomous engines of compl…