Adaptive Probe-based Steering for Robust LLM Jailbreaking
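A new LLM jailbreaking method uses adaptive probe-based steering to overcome the bias and manual-tuning limitations of conventional contrastive steering, improving robustness and effectiveness.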
arXiv:2605.20286v1 Announce Type: cross Abstract: Recent work has demonstrated the potential of contrastive steering for jailbreaking Large Language M…