Exploring and Developing a Pre-Model Safeguard with Draft Models
Using draft models to intercept harmful outputs before inference, offering a lightweight new approach to AI safety
arXiv:2605.19321v1 Announce Type: cross Abstract: Large Language Model (LLM) alignment remains vulnerable to jailbreak attacks that elicit unsafe resp…