Activation Steering with a Feedback Controller
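Uses a feedback controller to achieve precise activation steering, offering a new approach to improving the controllability of large models. ICLR 2026 paper.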
arXiv:2510.04309v3 Announce Type: replace Abstract: Controlling the behaviors of large language models (LLMs) is fundamental to their safety alignment …