Niu Ge's Picks · All

1

🤖 AI · Large Models arXiv AI 2026-07-14

AgentAbstain: Do LLM Agents Know When Not to Act?

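Can LLM agents recognize when not to act? This new study explores the boundaries of safe agent decision-making, offering a useful perspective on AI reliability.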

arXiv:2607.10059v1 Announce Type: new Abstract: Agent systems based on large language models (LLMs) are increasingly deployed for autonomous tasks, ye…

llm, AI agents, abstention decisions, safety

2

🤖 AI · Large Models arXiv NLP 2026-07-07

LLM-based Human Simulations Have Not Yet Been Reliable

A paper by leading AI researchers argues that LLM-based human simulations are not yet reliable; the conclusions are worth noting.

arXiv:2501.08579v3 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly employed for simulating human behaviors across diver…

llm, human simulation, reliability, research papers, AI safety

3

📝 Deep Tech arXiv AI 2026-07-07

Teaming Up with AI: Coordination and Cooperation

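For AI to integrate successfully into the workforce, it must deliver economic value; this paper examines the underlying logic of human-AI coordination and cooperation.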

arXiv:2607.03181v1 Announce Type: cross Abstract: Successful diffusion of AI in the workforce hinges on the economic value that AI brings to human end…

artificial intelligence, human-AI collaboration, economic value, workforce, research papers

4

🤖 AI · Large Models arXiv AI 2026-07-01

Wisdom Of The (AI) Crowd: Investigating Artificial Swarm Intelligence In Large Language Models

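Explores how LLMs can emulate human swarm intelligence to overcome limits of scale and coordination and reach better collective decisions.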

arXiv:2606.31404v1 Announce Type: new Abstract: Human swarm intelligence demonstrates remarkable collective accuracy but faces scalability constraints…

swarm intelligence, large language models, artificial swarm intelligence, collaboration, research papers

5

🤖 AI · Large Models arXiv Computer Vision 2026-06-30

The Human Creativity Benchmark

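A new paper proposes the Human Creativity Benchmark, a quantitative yardstick for evaluating AI and human creativity, intended to advance research on creative AI.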

arXiv:2606.30561v1 Announce Type: cross Abstract: Modern AI evaluation frameworks treat evaluator disagreement as noise to be resolved. In creative do…

creativity benchmark, human creativity, AI evaluation, research papers

6

🤖 AI · Large Models Hacker News AI 2026-06-26

The Shift to Agentic AI: Evidence from Codex [pdf]

New OpenAI research, drawing on evidence from Codex, documents AI's shift from tool to autonomous agent.

Article URL: https://cdn.openai.com/pdf/5d1e1489-21c0-43e4-9d42-f87efdbf0082/the-shift-to-agentic-ai-evidence-from-codex.pdf Comments URL: https://new…

agentic ai, codex, openai, artificial intelligence, autonomous agents

7

🤖 AI · Large Models OpenAI Official Blog 2026-06-25

How agents are transforming work

New OpenAI research shows that AI agents are reshaping how work gets done and expanding productivity by handling longer, more complex tasks.

A new OpenAI research paper shows how AI agents are transforming work, enabling longer, more complex tasks and expanding productivity across roles.

AI agents, openai, work automation, research papers, productivity gains

8

🤖 AI · Large Models arXiv AI 2026-06-23

Hypothesis-Driven Skill Optimization for LLM Agents

A new method: hypothesis-driven optimization of LLM agent skills, aimed at improving adaptation to complex tasks.

arXiv:2606.22330v1 Announce Type: new Abstract: External skills can improve action-oriented LLM agents without changing model weights, but persistent …

llm agents, hypothesis-driven, skill optimization, large language models, reinforcement learning

9

⚡ Productivity Tools Hacker News Ask 2026-06-17

Ask HN: What are your best Claude hacks?

Turn Claude into a temporary domain expert by feeding it research papers: a clever workflow.

One workflow I like: collecting high-quality research papers on a topic, uploading them to Claude, and turning it into a temporary expert on that doma…

claude, research papers, AI workflows, prompting tips, knowledge management

10

🤖 AI · Large Models arXiv AI 2026-06-10

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

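Uses LLMs to generate code documentation automatically and evaluates its quality with multiple judges, with the goal of improving developer productivity.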

arXiv:2606.09852v1 Announce Type: cross Abstract: High-quality source code documentation is vital yet often neglected, especially in critical domains …

llm, code documentation generation, multi-judge evaluation, code quality, automated documentation

11

🏷 Resource Collections Hacker News LLM 2026-06-09

LLM Research Papers: The 2026 List (January to May)

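A must-read list of LLM research papers from January to May 2026, curated by a machine learning expert to help you keep up with the latest progress efficiently.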

Article URL: https://magazine.sebastianraschka.com/p/llm-research-papers-2026-part1 Comments URL: https://news.ycombinator.com/item?id=48446264 Points…

llm, paper list, 2026, research papers, frontier technology

12

📝 Deep Tech arXiv AI 2026-06-02

Failure of contextual invariance in large language models

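Large models can behave inconsistently as context changes; this paper systematically examines the failure of contextual invariance in LLMs and its causes.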

arXiv:2603.23485v2 Announce Type: replace-cross Abstract: Standard evaluation practices assume that large language model (LLM) outputs are stable when…

large language models, contextual invariance, language understanding, model weaknesses, research papers

13

📝 Deep Tech arXiv AI 2026-06-02

Bridging the Last Mile of Time Series Forecasting with LLM Agents

Uses LLM agents to bridge the "last mile" of time series forecasting; an exploration of a frontier method.

arXiv:2606.02497v1 Announce Type: new Abstract: Time series forecasting has advanced rapidly, especially with the emergence of foundation models that …

time series forecasting, LLM agents, artificial intelligence, research papers, deep learning

14

📝 Deep Tech arXiv AI 2026-05-28

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

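AgensFlow, a new framework for multi-agent systems, proposes a coordination-policy substrate, providing theoretical support for complex AI collaboration.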

arXiv:2605.27466v1 Announce Type: cross Abstract: Multi-agent systems built on large language models (LLMs) require many coordination choices that are…

agensflow, multi-agent systems, coordination policy, research papers

15

🤖 AI · Large Models arXiv AI 2026-05-27

Can LLMs Introspect? A Reality Check

Questions whether LLMs are capable of genuine introspection: a rigorous, empirically grounded check.

arXiv:2605.26242v1 Announce Type: new Abstract: Can large language models detect and report their own internal states? A number of studies have argued…

llm, introspection, self-knowledge, reality check, large model evaluation

16

📝 Deep Tech arXiv NLP 2026-05-26

SEAL: Synergistic Co-Evolution of Agents and Learning Environments

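A new approach to self-evolving LLM agents: co-evolving the agent's policy and its learning environment to address agent-environment mismatch.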

arXiv:2605.24426v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly improved through interaction, yet most self-evoluti…

LLM agents, co-evolution, learning environments, policy-environment mismatch, self-evolution

17

⚡ Productivity Tools Dev.to 2026-05-26

Solving Complex Logic with Claude and Research Papers

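Uses Claude and research papers to solve a gender-recognition problem in real-time speech translation; a hands-on case study in AI-assisted coding.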

Introduction When building apps with AI-assisted coding, you get to decide "what to build." You can set the design and the policy yourself. However, y…

claude, speech recognition, gender recognition, research papers, AI-assisted coding

18

🤖 AI · Large Models Hacker News AI 2026-05-26

An AI Interface for Research Papers

Rethinks how research papers are read and searched with AI; an essay on interface innovation.

Article URL: https://justinross.substack.com/p/an-ai-interface-for-research-papers Comments URL: https://news.ycombinator.com/item?id=48269299 Points:…

AI interaction, research papers, academic tools, interface innovation, research productivity

19

📝 Deep Tech Dev.to 2026-05-24

What Developers Don’t Say in Interviews—but Show on GitHub

Looks at developer behavior on GitHub through the lens of a research paper, revealing what developers leave unsaid in interviews.

When I started working on my usability study project with KServe, I interacted with KServe users to understand the challenges they were experiencing w…

github, developer behavior, interviews, open source, research papers

20

📝 Deep Tech arXiv Machine Learning 2026-05-20

Step-wise Rubric Rewards for LLM Reasoning

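Proposes step-wise rubric rewards to supervise the intermediate steps of LLM reasoning, going beyond the conventional practice of rewarding only the final answer.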

arXiv:2605.17291v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in large lan…

LLM reasoning, reinforcement learning, step-wise rewards, rlvr, research papers
