Metis: Learning to Jailbreak LLMs via Self-Evolving Metacognitive Policy Optimization
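Proposes a self-evolving metacognitive policy optimization method that makes LLM red teaming smarter and more efficient at discovering safety vulnerabilities.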
arXiv:2605.10067v3 Announce Type: replace-cross Abstract: Red teaming is critical for uncovering vulnerabilities in Large Language Models (LLMs). Whil…