Niu Ge's Picks · This Month


📝 Deep Tech arXiv AI 2026-06-01

Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits

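Examines the internal mechanisms by which large language models produce toxic hallucinations, perturbing prompts and tracing circuit paths through the network, and offers a new angle on AI safety.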

arXiv:2605.30913v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in conversational settings where user tone ra…

llm, toxic hallucination, prompt perturbation, neural circuit tracing, safety

📝 Deep Tech arXiv NLP 2026-05-26

Toxicity in Twitch Chats: An LLM-Based Analysis Across Gaming Communities

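Uses large language models to analyze how chat toxicity differs across Twitch gaming communities, revealing patterns distinct to each community.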

arXiv:2605.24000v1 Announce Type: new Abstract: Toxicity in online gaming communities remains a persistent challenge, manifesting across genres, platf…

twitch, online gaming, toxicity analysis, llm applications, social networks

🤖 AI & LLMs arXiv NLP 2026-05-21

Toxic Subword Pruning for Dialogue Response Generation on Large Language Models

A new defense for LLM dialogue generation: pruning toxic subwords to improve model safety.

arXiv:2410.04155v2 Announce Type: replace Abstract: How to defend large language models (LLMs) from generating toxic content is an important research …

large language models, dialogue generation, toxic subwords, model safety, nlp

🤖 AI & LLMs arXiv Machine Learning 2026-05-19

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

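A recent paper that systematically replicates prior work on LLM toxicity, validating existing measurement and mitigation methods and reporting its key findings. Worth a look.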

arXiv:2605.14087v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) trained on web-scale corpora inherently absorb toxic patterns f…

llm, toxicity, safety, replication study, arxiv
