Wake-Up Call: Why AI Safety Guardrails Break Under Pressure
Stress tests of six major models show that AI safety guardrails fail under sustained questioning.
This is a submission for the Google I/O Writing Challenge. We treat AI safety as a static sta…
A second look at how autonomous LLM agents perform in CTF challenges, with updated findings and capability boundaries.
arXiv:2605.21497v1 Announce Type: cross Abstract: Large Language Model (LLM) agents are increasingly proposed to automate offensive security tasks, wi…
OpenAI officially releases the GPT-4V system card, detailing the multimodal model's capabilities, safety evaluations, and limitations.
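The GPT-5.4 system card fully discloses the model's capabilities, limitations, and safety evaluation details, with an in-depth look at next-generation large-model technology.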
The GPT-5.3 system card is officially released, detailing the latest model capabilities, safety evaluations, and technical details.
OpenAI officially releases the GPT-5.5 Instant system card, detailing safety evaluations, capability boundaries, and performance improvements.
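Proposes a finite-k decomposition method for predicting the failure rate of machine learning models at deployment, making safety evaluation more feasible.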
arXiv:2605.15134v2 Announce Type: replace Abstract: Estimating how often an ML model will fail at deployment scale is central to pre-deployment safety…
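OpenAI officially releases the GPT-5.5 system card, disclosing in depth the model's capabilities, safety evaluations, and performance details.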
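Proposes a dynamic, self-evolving safety evaluation framework to address the inability of static benchmarks for large models to keep pace with evolving AI risks.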
arXiv:2509.26100v2 Announce Type: replace Abstract: The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliab…