Niuge's Picks · This Month

📝 Deep Tech arXiv NLP 2026-06-10

HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing

Jailbreak risks hidden in human-AI collaborative writing? A new benchmark exposes a new blind spot in LLM safety.

arXiv:2604.19274v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used as co-authors in collaborative writing, where u…

AI safety · jailbreak attacks · human-AI collaboration · benchmarking · LLM safety

🤖 AI · LLMs arXiv AI 2026-06-01

EUDAIMONIA: Evaluating Undesirable Dynamics in AI

Hidden risks in AI companions? A new framework evaluates undesirable dynamics in LLMs' social conversations.

arXiv:2605.30654v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used as conversational partners for companionship, emo…

LLM safety · social AI evaluation · harmful dynamics · AI companion risk · evaluation framework

📝 Deep Tech arXiv AI 2026-05-29

How Reliable Are AI Attackers Against a Fixed Vulnerable Target? A 400-Run Empirical Study of LLM Penetration Testing Consistency

400 repeated runs reveal how "unstable" LLMs are as attackers: the first quantitative study of consistency in LLM-driven penetration testing.

arXiv:2605.30096v1 Announce Type: cross Abstract: Large language models (LLMs) can autonomously conduct multi-stage cyber attacks, but the consistency…

LLM · autonomous attacks · consistency · penetration testing · empirical study

🤖 AI · LLMs arXiv NLP 2026-05-27

On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs

New research finds significant risks in how instruction-tuned LLMs respond to harmful sentences embedded in long-context inputs.

arXiv:2510.05864v2 Announce Type: replace Abstract: Large language models (LLMs) increasingly operate on long inputs, yet their behavior when harmful …

large language models · instruction tuning · harmful content · long inputs · robustness

🤖 AI · LLMs arXiv NLP 2026-05-26

MultiHaluDet: Multilingual Hallucination Detection via LLM Hidden State Probing

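A new multilingual hallucination-detection method that probes LLM hidden states. It targets the reliability challenges of non-English settings.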

arXiv:2605.24919v1 Announce Type: new Abstract: Hallucinations in Large Language Models (LLMs) represent a critical barrier to their reliable deployme…

hallucination detection · multilingual · hidden states · LLM safety · LLM evaluation
