Attention-Guided Reward for Reinforcement Learning-based Jailbreak against Large Reasoning Models
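Using an attention mechanism to shape the reinforcement learning reward, the approach breaches the safety defenses of large reasoning models, exposing new challenges for AI safety.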
arXiv:2605.19485v1 (Announce Type: new)
Abstract: Large Reasoning Models (LRMs) have demonstrated remarkable capabilities in solving complex problems by…