When Behavioral Safety Evaluation Fails: A Representation-Level Perspective
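Revealing the blind spots of AI safety evaluation at the representation level, offering a new perspective and insights for model alignment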
arXiv:2606.08044v1 Announce Type: new Abstract: Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limit…