Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation
The paper proposes an on-policy self-distillation method that reduces the "safety tax" in LLM safety alignment without sacrificing reasoning ability.
arXiv:2605.15239v1 (Announce Type: new)
Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a trad…
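The abstract is truncated here, so the paper's exact objective is not shown. The sketch below illustrates only the generic idea the title names. It is an assumption that the method follows a common formulation of on-policy distillation: the student (the current policy) samples a response, a teacher scores the same positions, and the loss is the per-token reverse KL divergence KL(student || teacher) averaged over the sampled sequence. In "self"-distillation the teacher is the same model, and here it is assumed, hypothetically, to be that model under a different conditioning. The function names are illustrative and are not taken from the paper or any library.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl_per_token(student_logits: Sequence[float],
                         teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at one position, summed over the full vocabulary."""
    s = log_softmax(student_logits)
    t = log_softmax(teacher_logits)
    # Expectation under the student distribution: sum_v p_s(v) * (log p_s(v) - log p_t(v))
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(s, t))


def on_policy_distill_loss(student_seq_logits: Sequence[Sequence[float]],
                           teacher_seq_logits: Sequence[Sequence[float]]) -> float:
    """Mean per-token reverse KL over the positions of a student-sampled response.

    Assumption: both inputs hold logits for the SAME response, which was sampled
    from the student (that is what makes the loss "on-policy"). The teacher logits
    are hypothetically the same model under a different conditioning.
    """
    if len(student_seq_logits) != len(teacher_seq_logits):
        raise ValueError("student and teacher must score the same positions")
    kls = [reverse_kl_per_token(s, t)
           for s, t in zip(student_seq_logits, teacher_seq_logits)]
    return sum(kls) / len(kls)


if __name__ == "__main__":
    # Toy 2-token vocabulary: a uniform student vs. a teacher with probs (0.75, 0.25).
    student = [[0.0, 0.0]]
    teacher = [[math.log(3.0), 0.0]]
    print(on_policy_distill_loss(student, teacher))  # 0.5 * ln(4/3), about 0.1438
```

Reverse KL is mode-seeking: it penalizes the student for placing probability where the teacher places little, evaluated only on text the student itself produces. In a real training loop the logits would come from model forward passes and the loss would be backpropagated through the student only; those parts are omitted here.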