Niu Ge's Picks · This Month

📝 In-Depth Technical arXiv AI 2026-06-04

BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

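Systematically exposes runaway-optimiser-like failure modes of LLMs on biologically and economically aligned safety benchmarks, from a novel perspective and with a simplified observation format.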

arXiv:2509.02655v3 Announce Type: replace-cross Abstract: Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utilit…

LLM safety · AI alignment · runaway optimiser · benchmarking · biological simulation

🐂 Niu Ge's Picks

BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

📅 Date