ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails
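Proposes ConsisGuard, a method that aligns safety deliberation with policy enforcement to improve the reliability and consistency of LLM guardrails.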
arXiv:2605.31073v1 Announce Type: new Abstract: Reasoning-based LLM guardrails improve safety moderation by generating explicit rationales before issu…