Niu Ge's Picks · Three Months

1

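📝 Deep Tech arXiv NLP 2026-07-15 NEW

Tracing Agentic Failure from the Flow of Success

Traces the root causes of agent failures back through the flow of successful runs, exposing how fragile autonomous AI decision-making can be.

arXiv:2607.12747v1 Announce Type: cross Abstract: Failure attribution for LLM-based agentic systems, i.e., identifying which steps in a failure trajec…

agent failure, success flow, AI autonomous decision-making, fragility, academic paper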

2

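🤖 AI · Large Models arXiv NLP 2026-07-15 NEW

Can Induced Emotion Bias LLM Behaviors in Sequential Decision Making?

New research examines how induced emotion can bias the behavior of large language models in sequential decision making, from an unusual angle.

arXiv:2607.12631v1 Announce Type: new Abstract: As Large Language Models (LLMs) are increasingly deployed as autonomous agents in high-stakes domains,…

emotion induction, large language models, sequential decision making, behavioral bias, arXiv paper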

3

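🤖 AI · Large Models arXiv NLP 2026-07-15 NEW

Agentic systems for breast cancer treatment recommendations

An agentic system that provides personalized breast cancer treatment recommendations; frontier AI-for-medicine research.

arXiv:2607.12051v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly being explored for clinical decision support, but their …

breast cancer treatment recommendation, agentic systems, AI in medicine, arXiv paper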

4

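🤖 AI · Large Models arXiv Machine Learning 2026-07-15 NEW

RAFP: Identifying LLM Lineages via Rare-Region Fingerprints

Uses "rare-region fingerprints" to trace the lineage of large models; a new academic method worth reading directly.

arXiv:2505.12682v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly released under restricted licenses, creating a growi…

LLM lineage identification, rare-region fingerprints, model provenance, large-model security, arXiv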

5

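🤖 AI · Large Models arXiv AI 2026-07-14

AgentAbstain: Do LLM Agents Know When Not to Act?

Can LLM agents sensibly choose not to act? New research probes the boundaries of safe agent decision-making, offering a key perspective on AI reliability.

arXiv:2607.10059v1 Announce Type: new Abstract: Agent systems based on large language models (LLMs) are increasingly deployed for autonomous tasks, ye…

LLM, AI agents, abstention decisions, safety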

6

🤖 AI · Large Models arXiv AI 2026-07-14

Tool-MCoT: Tool Augmented Multimodal Chain-of-Thought for Content Safety Moderation

Multimodal chain-of-thought reasoning plus tool calling, to improve the accuracy and interpretability of content safety moderation.

arXiv:2604.06205v2 Announce Type: replace-cross Abstract: The growth of online platforms and user content requires strong content moderation systems t…

content safety, multimodal, chain-of-t, tool augmentation, arXiv

7

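🤖 AI · Large Models arXiv AI 2026-07-14

Agentic Context Learning with Self-Discovered Specification

Lets the AI discover task specifications on its own and learn dynamically in context, for stronger autonomous reasoning.

arXiv:2607.09794v1 Announce Type: new Abstract: Context learning is an emerging inference-time task where LLMs must learn and apply novel, task-specif…

agentic AI, context learning, self-discovered specification, large-model reasoning, autonomous intelligence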

8

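📝 Deep Tech arXiv Computer Vision 2026-07-14

EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection

An extensible framework based on agentic reinforcement learning that lets AI-generated image detection keep evolving as new attack methods appear.

arXiv:2603.17343v2 Announce Type: replace Abstract: The rapid proliferation of AI-Generated Images (AIGIs) poses severe misinformation risks, making A…

AI image detection, reinforcement learning, agentic framework, extensibility, arXiv paper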

9

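🤖 AI · Large Models arXiv AI 2026-07-14

Depth-Entropy Guided Sampling for Training-Free LLM Reasoning

A training-free approach to LLM reasoning: depth-entropy guided sampling, aimed at improving reasoning efficiency and quality.

arXiv:2607.09693v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become the dominant paradigm for improving the reasoning capabilitie…

depth-entropy guided sampling, training-free, LLM reasoning, sampling strategy, arXiv paper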

10

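📝 Deep Tech arXiv Machine Learning 2026-07-14

Forecasting Generative Amplification

A paper on forecasting the amplification effect of generative networks, covering both a theoretical framework and experimental validation.

arXiv:2509.08048v4 Announce Type: replace-cross Abstract: Generative networks are perfect tools to enhance the speed and precision of LHC simulations.…

generative amplification, forecasting framework, AI model scaling, performance scaling, arXiv paper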

11

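🤖 AI · Large Models arXiv AI 2026-07-14

RVN-Bench: A Benchmark for Reactive Visual Navigation

RVN-Bench, a new benchmark designed for reactive visual navigation, pushing toward standardized evaluation of autonomous robot navigation.

arXiv:2603.03953v2 Announce Type: replace-cross Abstract: Safe visual navigation is critical for indoor mobile robots operating in cluttered environme…

reactive visual navigation, benchmark, robot visual navigation, navigation evaluation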

12

🤖 AI · Large Models arXiv AI 2026-07-14

SETA: Scaling Environments for Terminal Agents

A step forward for terminal agents: SETA proposes scaled-up environments, giving AI agent research a strong benchmark.

arXiv:2607.10891v1 Announce Type: new Abstract: Large language models (LLMs) are rapidly shifting toward agents that solve tasks through diverse inter…

SETA, terminal a, scaling environments, AI agents, arXiv paper

13

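📝 Deep Tech arXiv NLP 2026-07-13

Hierarchical Chain-of-Thought: Enhancing LLM Reasoning Performance and Efficiency

A new hierarchical chain-of-thought method that significantly improves LLM reasoning performance and efficiency; frontier research worth watching.

arXiv:2604.00130v2 Announce Type: replace Abstract: Chain-of-Thought (CoT) prompting has significantly improved the reasoning capabilities of large la…

hierarchical chain-of-thought, LLM reasoning, reasoning efficiency, chain-of-thought, arXiv paper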

14

📝 Deep Tech arXiv AI 2026-07-13

ProofCouncil: An LLM Agent for Solving Open Mathematical Problems

An LLM agent enters mathematical proof: ProofCouncil takes on open mathematical problems and shows striking reasoning ability on the FirstProof benchmark.

arXiv:2607.09474v1 Announce Type: new Abstract: Large language models (LLMs) have shown increasing promise in solving open problems in mathematics. Ho…

proofcounc, LLM agent, open mathematical problems, mathematical proof, large-model reasoning

15

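🤖 AI · Large Models arXiv AI 2026-07-13

Contrastive Weak-to-strong Generalization

A new paper on weak-to-strong generalization in a contrastive learning framework, combining theoretical analysis with experimental validation and offering a new perspective on generalization in large AI models.

arXiv:2510.07884v2 Announce Type: replace-cross Abstract: Weak-to-strong generalization provides a promising paradigm for scaling large language model…

contrastive learning, weak-to-strong generalization, generalization ability, large models, theoretical analysis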

16

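📝 Deep Tech arXiv AI 2026-07-09

Effective Strategies for Asynchronous Software Engineering Agents

A recent arXiv paper lays out core strategies for asynchronous software engineering agents, offering a systematic methodology for building efficient autonomous coding agents.

arXiv:2603.21489v2 Announce Type: replace-cross Abstract: AI agents have become increasingly capable at isolated software engineering (SWE) tasks such…

asynchronous software engineering, AI agents, strategy research, arXiv paper, 2026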

17

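📝 Deep Tech arXiv AI 2026-07-09

Measuring the metacognition of AI

How can the metacognitive ability of AI be quantified? This paper explores the internal mechanisms by which AI systems make decisions under uncertainty.

arXiv:2603.29693v3 Announce Type: replace Abstract: A robust decision-making process must take into account uncertainty, especially when the choice in…

metacognition, AI decision-making, uncertainty management, arXiv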

18

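📝 Deep Tech arXiv AI 2026-07-09

NonTextual Target Attack

New research presents a non-textual target attack, exploring adversarial example generation without textual cues and offering a new perspective for AI safety defenses.

arXiv:2510.02999v5 Announce Type: replace-cross Abstract: Existing gradient-based jailbreak attacks on Large Language Models (LLMs) typically optimize…

adversarial attacks, AI safety, non-textual target attack, arXiv paper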

19

📝 Deep Tech arXiv AI 2026-07-09

Digital Fragmentation and Generative AI Use Across 103 Million Application Events

A large-scale study of 103 million application events that examines the links between digital fragmentation and generative AI usage behavior.

arXiv:2607.06681v1 Announce Type: cross Abstract: Knowledge workers switch between applications thousands of times per day, spending nearly a tenth of…

digital fragmentation, generative AI, user behavior, large-scale data analysis, arXiv paper

20

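📝 Deep Tech arXiv NLP 2026-07-08

BabyVision: Visual Reasoning Beyond Language

Breaking free of language, BabyVision makes visual reasoning independent of text, aiming at purely visual cognition closer to that of a human infant.

arXiv:2601.06521v2 Announce Type: replace-cross Abstract: While humans develop core visual skills long before acquiring language, contemporary Multimo…

BabyVision, visual reasoning, language-free reasoning, multimodal, AI models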
