Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control
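Selective geometry control improves the robustness of LLM safety alignment and offers a new approach to defending AI systems against attacks.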
arXiv:2602.07340v2 (announce type: replace)
Abstract: Safety alignment of large language models remains brittle under domain shift and noisy preference …