Niu Ge's Picks · This Month

1

🔓 Open Source · Hacker News LLM · 2026-06-13

Show HN: Rubric – test what your LLM agent did, not just what it said

A new tool for behavioral testing of LLM agents: it verifies what the agent did rather than what it said. Open source and practical.

Article URL: https://github.com/Kareem-Rashed/rubric-eval Comments URL: https://news.ycombinator.com/item?id=48509073 Points: 1 # Comments: 0

llm agent, testing, behavior evaluation, rubric, open source

2

📝 Deep Tech · arXiv Machine Learning · 2026-06-11

When is Your LLM Steerable?

When does activation steering work? This paper maps the limits and conditions of LLM behavior control, sparing you blind grid searches.

arXiv:2606.11599v1 Announce Type: cross Abstract: Activation steering offers a lightweight approach to control language models' behavior at inference …

llm, activation steering, model control, inference-time control, behavior steering

3

🤖 AI & Large Models · arXiv AI · 2026-06-11

Are LLMs Bad at Moral Reasoning?

A study examines the shortcomings of large language models in moral reasoning, sounding an alarm for safe AI development.

arXiv:2606.11635v1 Announce Type: cross Abstract: For highly capable AI systems to operate safely in dynamic, open-ended environments, they must be ab…

llms, moral reasoning, AI safety, behavioral constraints, evaluation methods

4

🤖 AI & Large Models · arXiv AI · 2026-06-10

Superficial Beliefs in LLM Decision-Making

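What really drives LLM decisions: are the models reasoning, or merely imitating rationales? This new study probes the beliefs behind their choices.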

arXiv:2606.11016v1 Announce Type: new Abstract: We ask whether large language models (LLMs) merely imitate rationales when choosing between two option…

llm, decision mechanisms, belief imitation, AI research, large model behavior

5

🚀 Product Watch · arXiv AI · 2026-06-10

From Perception to Action: Can UI Interventions Foster Sustainable LLM Chatbot

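Starting from interface design, this paper explores how UI interventions can nudge users toward more sustainable use of LLM chatbots, stepping outside the usual model-optimization approach.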

arXiv:2606.10861v1 Announce Type: cross Abstract: LLM-powered chatbots are increasingly embedded in everyday workflows, raising sustainability concern…

sustainable ai, ui interventions, user behavior, energy efficiency, llm chatbots

6

🤖 AI & Large Models · arXiv AI · 2026-06-10

Mobility Anomaly Generation using LLM-Driven Behavior with Kinematic Constraints

Uses LLM-driven behavior with kinematic constraints to generate realistic mobility anomaly scenarios.

arXiv:2606.10314v1 Announce Type: new Abstract: Although the study of human trajectory anomalies is critical for advancing spatial data mining, empiri…

llm, kinematic constraints, anomaly generation, mobility behavior, paper

7

📝 Deep Tech · arXiv Machine Learning · 2026-06-09

Payoff scaling shapes cooperation in LLM agents across languages

How payoff scaling shapes cooperation among LLM agents across languages, offering a new game-theoretic perspective.

arXiv:2601.19082v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents that negotiate, …

llm agents, cooperative behavior, payoff scaling, cross-lingual, game theory

8

📝 Deep Tech · arXiv Machine Learning · 2026-06-09

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Exposes blind spots in AI safety evaluation at the representation level, offering a new perspective on model alignment.

arXiv:2606.08044v1 Announce Type: new Abstract: Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limit…

ai safety, behavioral evaluation, representation level, model alignment, safety evaluation failure

9

🔓 Open Source · Hacker News LLM · 2026-06-08

Show HN: Tinytasktree – Behavior-tree-style task orchestration for LLM agents

A behavior-tree-style task orchestration tool that makes task invocation for LLM agents more modular and composable.

Article URL: https://github.com/orion-arm-ai/tinytasktree Comments URL: https://news.ycombinator.com/item?id=48443382 Points: 1 # Comments: 0

llm agents, task orchestration, behavior trees, python library, async

10

📝 Deep Tech · arXiv NLP · 2026-06-08

PromptPrint: Behavioral Biometrics Through Natural Language Prompting in LLMs

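Traditional authorship attribution relies on long texts, but do the short prompts typed into LLMs also carry a distinctive "handwriting"? This study proposes PromptPrint, which identifies users through behavioral biometrics.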

arXiv:2606.06755v1 Announce Type: new Abstract: Authorship attribution research has traditionally focused on long-form, expressive texts; however, int…

promptprint, behavioral biometrics, authorship attribution, prompt engineering, llm security

11

💳 Payments · ITHome · 2026-06-08

NetEase Pay fined 2.2 million yuan over 4 violations

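ITHome, June 8: According to administrative penalty information disclosed on June 8 by the Zhejiang Provincial Branch of the People's Bank of China, NetEase Pay (Hangzhou) Co., Ltd. received a warning and a fine of 2.204 million yuan for violating account management rules, clearing management rules, and data security management rules, and for failing to conduct customer due diligence as required. Yu, of the company's technology center, was held responsible for the violation of data security management rules…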


12

🤖 AI & Large Models · arXiv NLP · 2026-06-05

Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding

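The first benchmark measuring how well multimodal LLMs understand the way interfaces guide user behavior. From ACL 2026, it reveals the capabilities and limits of MLLMs in understanding UI/UX design.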

arXiv:2505.05026v5 Announce Type: replace Abstract: User interface (UI) design goes beyond visuals to shape user experience (UX), underscoring the shi…

mllms, ui/ux design, user behavior, multimodal, benchmark

13

🤖 AI Tools · MIT Technology Review · 2026-06-05

Are AI chatbots making us lose control of our brains?

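Automatically tracks your activity on your computer and phone and uses the data to help you win back lost attention, putting an end to out-of-control, fragmented browsing.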

This week I’ve been at SXSW London. There’s been music, film, and a lot—and I mean a lot—of talk about AI. I also had the opportunity to sit down with…

time tracking, attention management, productivity, focus, behavior analysis

14

💰 Business & Tech · Hacker News AI · 2026-06-05

Companies Are Using Reddit to Manipulate ChatGPT and Google AI Search

Companies are using Reddit to manipulate ChatGPT and Google AI search results, causing a serious decline in content quality.

Article URL: https://www.404media.co/companies-are-using-reddit-to-manipulate-chatgpt-and-google-ai-search/ Comments URL: https://news.ycombinator.com…

reddit, ai search engines, data manipulation, chatgpt, google ai

15

📝 Deep Tech · arXiv AI · 2026-06-05

The Self-Correction Illusion: LLMs Correct Others but Not Themselves

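The study finds that LLMs readily correct others' errors but "turn a blind eye" to errors in their own reasoning, revealing an illusion of self-correction.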

arXiv:2606.05976v1 Announce Type: new Abstract: Recent work shows that LLM agents struggle to correct errors in their own reasoning traces yet show ma…

llm, self-correction, reasoning bias, cognitive illusion, ai research

16

📝 Deep Tech · arXiv Machine Learning · 2026-06-05

Reasoning Shift: How Context Silently Shortens LLM Reasoning

A newly identified mechanism: context silently shortens a large model's reasoning chain, revealing a regularity in LLM behavior.

arXiv:2604.01161v2 Announce Type: replace Abstract: Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning tra…

llm reasoning, context effects, reasoning, reasoning shortening, model behavior

17

🌱 Growth & Productivity · Dev.to · 2026-06-05

Your Screen Is a Stage

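Procrastination is not laziness; it is your nervous system acting up. Understanding the science behind it works better than tidying your desk.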

What Stage Fright Can Teach Developers About Procrastination You've built the project before. You know the tools. The readme is written. Everything is…

procrastination, nervous system, productivity, work efficiency, behavior patterns

18

💰 Business & Tech · 36Kr · 2026-06-05

Shenzhen Stock Exchange: self-regulatory measures taken against 199 cases of abnormal securities trading this week

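36Kr has learned that, according to a Shenzhen Stock Exchange announcement, from June 1 to June 5, 2026 the exchange took self-regulatory measures against 199 cases of abnormal securities trading, involving intraday price ramping and suppression, false orders, and other abnormal trading behavior. It also reviewed 3 major matters at listed companies and reported 4 leads on suspected violations of laws and regulations to the China Securities Regulatory Commission (CSRC).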


19

🤖 AI & Large Models · arXiv AI · 2026-06-04

Unpredictable Safety: Domain-Dependent Compliance and the Transparency Gap in Open-Weight LLMs

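A systematic study of how open-weight large models differ in safety behavior across ethical domains, pointing to a transparency gap and unpredictable compliance.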

arXiv:2606.04035v1 Announce Type: cross Abstract: We present a systematic study of domain-dependent safety behavior in open-weight LLMs: 7 standardize…

large model safety, open weights, ethical domains, behavioral consistency, transparency

20

🚀 Product Watch · ITHome · 2026-06-04

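He Xiaopeng denies rumor that paying an extra 20,000 yuan lets buyers jump the delivery queue, says the company strictly prohibits the practice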

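ITHome, June 4: On June 3, He Xiaopeng, chairman and CEO of XPeng, said in a livestream that, with strong sales of the XPeng GX lengthening delivery times, the online claim that "paying an extra 20,000 yuan lets you jump the delivery queue" is a rumor and that company policy strictly prohibits it. He stressed that XPeng will not adopt paid queue-jumping and that XPeng GX deliveries strictly follow the order in which customers placed their orders. ITHome note…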

