BarrierSteer: LLM Safety via Learning Barrier Steering
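BarrierSteer, a new method, improves LLM safety through a learned barrier steering mechanism. The theoretical contribution is strong, but experimental details are lacking.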
arXiv:2602.20102v2 Announce Type: replace-cross Abstract: Despite the strong performance of large language models (LLMs) across diverse tasks, their s…