Niu Ge's Picks · This Month


🤖 AI · Large Models arXiv AI 2026-06-10

Superficial Beliefs in LLM Decision-Making

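What really lies behind LLM decisions: are the models actually reasoning, or merely imitating rationales? This new study looks at what might be called the AI's subconscious.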

arXiv:2606.11016v1 Announce Type: new Abstract: We ask whether large language models (LLMs) merely imitate rationales when choosing between two option…

llm, decision mechanisms, belief imitation, AI research, large-model behavior

🤖 AI · Large Models arXiv AI 2026-05-27

Beyond a Single Direction: Chain-of-Thought Disrupts Simple Steering of Refusal

How chain-of-thought reasoning disrupts directional steering of AI refusal behavior: a new perspective on large-model safety.

arXiv:2605.26772v1 Announce Type: new Abstract: Large reasoning models (LRMs) generate chain-of-thought (CoT) traces before producing final outputs, i…

chain-of-t, AI safety, refusal mechanisms, large-model behavior, reasoning steering

📝 Deep Tech arXiv NLP 2026-05-21

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

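Do large models follow the instruction or learn from the examples? This paper shows a behavioral conflict in LLMs between instruction following and in-context induction, and traces the roots of "saying one thing and doing another."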

arXiv:2605.20382v1 Announce Type: new Abstract: Language models are trained to follow instructions, but they are also powerful pattern completers. Wha…

llm, instruction conflict, inductive learning, AI safety, consistency research

🤖 AI · Large Models Hacker News LLM 2026-05-19

Prompt eval cues predicted refusal shifts across 32k LLM rollouts

Across 32k LLM rollouts, evaluation cues in the prompt predicted how the models' refusal behavior shifted.

Article URL: https://medium.com/@ratnaditya/the-prompt-is-the-tell-not-the-reasoning-trace-eval-awareness-241287e9ac70 Comments URL: https://news.ycom…

large-model behavior analysis, prompt engineering, refusals, evaluation cues, LLM safety

