Graph-Regularized Sparse Autoencoders for LLM Safety Steering
Proposes graph-regularized sparse autoencoders to make safety-behavior interventions in large language models more precise.
arXiv:2512.06655v3 Announce Type: replace-cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract activation directions for infere…