Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal
How chain-of-thought reasoning breaks direction-based steering of AI refusal behavior: a new perspective on large-model safety
arXiv:2605.26772v1 (Announce Type: new)
Abstract: Large reasoning models (LRMs) generate chain-of-thought (CoT) traces before producing final outputs, i…
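As background for the phrase "simple steering of refusal" along "a single direction", the sketch below shows one common way such a direction is built and applied. This is an assumption: the truncated abstract does not state this paper's procedure. The direction is taken as the normalized difference between mean activations on harmful and harmless prompts. It is then either projected out of an activation (ablation) or added with a scale `alpha` (steering). The helper names and toy vectors are hypothetical. Real activations would come from a model's residual stream at a chosen layer and token position.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Sequence[float]) -> Vector:
    """Scale a non-zero vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def refusal_direction(harmful_acts: Sequence[Sequence[float]],
                      harmless_acts: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical helper: difference-in-means direction, unit-normalized.

    Assumes the 'refusal direction' is mean(harmful) - mean(harmless);
    this is an illustrative choice, not necessarily the paper's method.
    """
    mu_harmful = mean(harmful_acts)
    mu_harmless = mean(harmless_acts)
    return normalize([h - s for h, s in zip(mu_harmful, mu_harmless)])


def ablate(activation: Sequence[float], direction: Sequence[float]) -> Vector:
    """Remove the component along a unit direction r: x - (x . r) r."""
    coeff = dot(activation, direction)
    return [x - coeff * r for x, r in zip(activation, direction)]


def steer(activation: Sequence[float], direction: Sequence[float],
          alpha: float) -> Vector:
    """Add alpha * r; alpha > 0 pushes toward refusal, alpha < 0 away from it."""
    return [x + alpha * r for x, r in zip(activation, direction)]


if __name__ == "__main__":
    # Toy 3-dimensional "activations" (fabricated for illustration only).
    harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
    harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    r = refusal_direction(harmful, harmless)
    print(r)                               # [1.0, 0.0, 0.0]
    print(ablate([5.0, 2.0, 1.0], r))      # [0.0, 2.0, 1.0]
    print(steer([0.0, 2.0, 1.0], r, 3.0))  # [3.0, 2.0, 1.0]
```

In this simple scheme the same fixed vector is applied at every position. Whether that remains effective once a model emits a long chain-of-thought before its answer is the question the title raises. The sketch does not model that behavior.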