CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs
CIAware-Bench, a new benchmark, evaluates how well frontier large models perceive control interventions, revealing a new challenge for AI safety.
arXiv:2606.11063v1 Announce Type: new Abstract: AI control protocols oversee untrusted models by monitoring their actions and modifying potentially un…