Niu Ge's Picks (牛哥精选) · Past Six Months


📝 Deep Tech arXiv AI 2026-07-07

Measuring Harness-Induced Belief Divergence in Multi-Step LLM Agents

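The paper measures how the agent harness induces belief divergence in multi-step LLM agents, and questions the hidden biases this introduces into benchmarks.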

arXiv:2607.04528v1 Announce Type: new Abstract: Software-agent benchmarks usually report whether an agent solves a task, but the agent reaches that ou…

LLM agents, belief divergence, benchmarks, multi-step reasoning, evaluation methods

📝 Deep Tech arXiv Machine Learning 2026-06-18

REVES: REvision and VErification--Augmented Training for Test-Time Scaling

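REVES is a new method that uses revision- and verification-augmented training to address the mismatch between test-time scaling and multi-step reasoning in LLMs.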

arXiv:2606.18910v1 Announce Type: new Abstract: Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Langu…

LLM reasoning, test-time scaling, revision and verification, post-training, multi-step reasoning

🤖 AI·LLMs arXiv NLP 2026-06-16

MAWARITH: A Dataset and Benchmark for Legal Inheritance Reasoning with LLMs

Uses 12,500 data items to test how well LLMs handle complex multi-step reasoning in Islamic inheritance law.

arXiv:2603.07539v3 Announce Type: replace Abstract: Islamic inheritance law is challenging for large language models because solving inheritance cases…

dataset, LLM, legal reasoning, Islamic inheritance law, benchmark

🤖 AI·LLMs arXiv AI 2026-06-10

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

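A training-free approach to multi-step LLM inference that improves efficiency through reasoning-aware KV cache sharing and confident early exit. Accepted at an ICML 2026 Workshop.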

arXiv:2606.09937v1 Announce Type: cross Abstract: We introduce RKSC (Reasoning-Aware KV Cache Sharing), a training-free inference framework that elimi…

RKSC, KV cache sharing, confident early exit, multi-step reasoning, LLM inference optimization

📝 Deep Tech arXiv AI 2026-06-09

Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps

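A new class of attack exposes multi-step jailbreak vulnerabilities in tool-using LLM agents that stem from artifact provenance gaps; defenses need to move beyond single-step, text-level isolation.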

arXiv:2606.09084v1 Announce Type: cross Abstract: Tool-using LLM agents interact with the world through actions that persist state in artifacts (e.g.,…

LLM agent, tool use, security attacks, artifact provenance, jailbreak vulnerabilities

🤖 AI·LLMs arXiv AI 2026-05-29

Graph-Enhanced Policy Optimization in LLM Agent Training

Integrates graph structure into LLM agent policy optimization to improve multi-step reasoning and task completion.

arXiv:2510.26270v2 Announce Type: replace Abstract: Multi-step LLM agents in interactive environments represent a crucial step toward long-horizon dec…

LLM agent, graph-enhanced policy optimization, reinforcement learning, graph neural networks, LLM training

📝 Deep Tech arXiv Machine Learning 2026-05-23

VRPRM: Process Reward Modeling via Visual Reasoning

Improves the accuracy of process reward modeling through visual reasoning, offering a new approach to training on complex tasks.

arXiv:2508.03556v3 Announce Type: replace Abstract: Process Reward Model (PRM) is widely used in the post-training of Large Language Model (LLM) becau…

process reward model, visual reasoning, multi-step reasoning, reward signal, model training

📝 Deep Tech arXiv Machine Learning 2026-05-20

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

Accepted at ICML 2026. Proposes stepwise confidence attribution to diagnose why black-box LLMs fail at multi-step reasoning.

arXiv:2605.19228v1 Announce Type: cross Abstract: Large Language Models have achieved strong performance on reasoning tasks with objective answers by …

LLM reasoning, black-box models, confidence attribution, multi-step reasoning, ICML 2026

📝 Deep Tech arXiv AI 2026-05-20

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

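Applies closed-loop verified reasoning to complex visual generation, using verifiable multi-step reasoning to address planning hallucinations, reportedly with strong results.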

arXiv:2605.14876v1 Announce Type: cross Abstract: Despite rapid advancements, current text-to-image (T2I) models predominantly rely on a single-step g…

closed-loop verified reasoning, complex visual generation, text-to-image, multi-step reasoning, planning hallucination

🚀 Product Watch OpenAI Official Blog 2026-05-19

Netomi’s lessons for scaling agentic systems into the enterprise

Scaling enterprise AI agents in practice: how Netomi uses GPT-4.1 and GPT-5.2 to handle concurrency, governance, and multi-step reasoning.

How Netomi scales enterprise AI agents using GPT-4.1 and GPT-5.2—combining concurrency, governance, and multi-step reasoning for reliable production w…

Netomi, enterprise AI agents, GPT-4.1, GPT-5.2, scaling
