Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks
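Proposes an automated framework that systematically evaluates and hardens LLM system instructions against encoding attacks, offering a new tool for AI safety.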
arXiv:2604.01039v2 Announce Type: replace-cross Abstract: System Instructions in Large Language Models (LLMs) are commonly used to enforce safety poli…