牛哥精选 (Niu Ge's Picks) · All


📝 Deep Tech · arXiv AI · 2026-05-19

Do Linear Probes Generalize Better in Persona Coordinates?

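The paper examines whether linear probes built in persona coordinates generalize better when monitoring LLMs for harmful behaviors, with a focus on strategic deception and sandbagging.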

arXiv:2605.09391v2 (announce type: replace). Abstract: It is becoming increasingly necessary to have monitors check for harmful behaviors during language…

linear probes · model safety · persona coordinates · LLM · generalization
