Niu Ge's Picks · Past Six Months

1

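🤖 AI · Large Models arXiv AI 2026-05-19

Zero-Shot Goal Recognition with Large Language Models

Large language models show abductive reasoning grounded in world knowledge on zero-shot goal recognition, outperforming symbolic planning methods.

arXiv:2605.15333v1 Announce Type: new Abstract: Large language models have recently reached near-parity with classical planners on well-known planning…

zero-shot goal recognition, large language models, world knowledge, abductive reasoning, planning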

2

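📝 Deep Tech arXiv AI 2026-05-19

RTL-BenchMT: Dynamic Maintenance of RTL Generation Benchmark Through Agent-Assisted Analysis and Revision

Benchmarks for LLM-assisted RTL generation urgently need dynamic maintenance. This paper proposes RTL-BenchMT, a framework built on agent-assisted analysis and revision, to address defective cases in existing benchmarks.

arXiv:2605.15537v1 Announce Type: new Abstract: This paper introduces RTL-BenchMT, an agentic framework for dynamically maintaining RTL generation ben…

RTL generation, benchmark maintenance, large language models, EDA, agents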

3

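🤖 AI · Large Models arXiv AI 2026-05-19

DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

A new training-free method for grounding GUI elements: dynamic region search makes it more efficient to identify instruction-relevant regions on high-resolution screens.

arXiv:2605.15542v1 Announce Type: new Abstract: GUI agents powered by Multimodal Large Language Models (MLLMs) have demonstrated impressive capability…

gui ground, multimodal, dynamic region search, training-free, screen understanding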

4

📝 Deep Tech arXiv AI 2026-05-19

STAR: A Stage-attributed Triage and Repair framework for RCA Agents in Microservices

Are LLM-based RCA agents error-prone? The STAR framework improves the reliability of microservice fault diagnosis through stage-attributed triage and repair.

arXiv:2605.15581v1 Announce Type: new Abstract: LLM-based root cause analysis (RCA) agents have recently emerged as a promising paradigm for incident …

STAR, LLM, root cause analysis, microservices, AIOps

5

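📝 Deep Tech arXiv AI 2026-05-19

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

LLM agents autonomously design foundation model architectures. The two frameworks, AIRA-Compose and AIRA-Design, work toward recursive self-improvement and move beyond the constraints of the standard Transformer.

arXiv:2605.15871v1 Announce Type: new Abstract: Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models …

LLM agents, neural architecture search, autonomous design, aira-compo, aira-desig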

6

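🤖 AI · Large Models arXiv AI 2026-05-19

Reasoners or Translators? Contamination-aware Evaluation and Neuro-Symbolic Robustness in Tax Law

New research: LLM results on tax-law reasoning are exposed to data contamination, so do not mistake apparent understanding for the real thing.

arXiv:2605.16052v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) have significantly enhanced automated legal reasoning.…

large language models, legal reasoning, data contamination, neuro-symbolic, robustness, tax law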

7

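📝 Deep Tech arXiv AI 2026-05-19

Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems

Formal methods combined with LLMs offer a full-lifecycle governance approach to AI system compliance, covering auditing, monitoring, and intervention.

arXiv:2605.16198v1 Announce Type: new Abstract: We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and…

formal methods, large language models, AI governance, auditing and monitoring, compliance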

8

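🤖 AI · Large Models arXiv AI 2026-05-19

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

A fully open-source, auditable pipeline for clinical LLMs that addresses the black-box problem in medical AI, with data provenance and the training process fully transparent.

arXiv:2605.16215v1 Announce Type: new Abstract: Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, …

clinical LLM, open source, auditable pipeline, data provenance, clinical decision support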

9

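📝 Deep Tech arXiv AI 2026-05-19

AgentStop: Terminating Local AI Agents Early to Save Energy in Consumer Devices

Proposes AgentStop, a method that terminates local AI agents early on consumer devices to save energy while balancing privacy, cost, and energy efficiency.

arXiv:2605.15206v1 Announce Type: cross Abstract: Autonomous agents powered by large language models (LLMs) are increasingly used to automate complex,…

AI agents, local deployment, energy-efficiency optimization, large language models, early termination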

10

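📝 Deep Tech arXiv AI 2026-05-19

Quantization Undoes Alignment: Bias Emergence in Compressed LLMs Across Models and Precision Levels

The study finds that compressing large models through quantization can undo alignment and cause bias to emerge, with marked effects across precision levels.

arXiv:2605.15208v1 Announce Type: cross Abstract: Large Language Models are routinely compressed via post-training quantization to reduce inference co…

quantization, alignment, bias, large model compression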

11

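🤖 AI · Large Models arXiv AI 2026-05-19

An LLM-RAG Approach for Healthy Eating Index-Informed Personalized Food Recommendations

Combines a large language model with retrieval augmentation and the Healthy Eating Index to recommend personalized meals, applying AI to nutrition science.

arXiv:2605.15213v1 Announce Type: cross Abstract: Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (…

LLM, RAG, healthy eating, personalized recommendation, food recommender systems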

12

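📝 Deep Tech arXiv AI 2026-05-19

Effective Harness Engineering for Algorithm Discovery with Coding Agents

Examines how the design of the execution infrastructure affects the success of algorithm discovery when LLMs are combined with evolutionary search, and identifies three major engineering design issues.

arXiv:2605.15221v1 Announce Type: cross Abstract: AlphaEvolve and FunSearch have demonstrated the potential of combining large language models (LLMs) …

LLM, evolutionary search, algorithm discovery, coding agents, harness en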

13

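📝 Deep Tech arXiv AI 2026-05-19

GenAI-Driven Approach to RISC-V Supply Chain Exploration

Uses multimodal large models to analyze heterogeneous RISC-V supply chain data, linking visual and textual sources in a new approach to chip provenance.

arXiv:2605.15223v1 Announce Type: cross Abstract: This paper presents an LLM-empowered workflow for RISC-V supply chain analysis, integrating Vision-L…

RISC-V, supply chain analysis, LLM, VLM, model-driven engineering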

14

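📝 Deep Tech arXiv AI 2026-05-19

Do Biological Structural Guarantees Earn Their Complexity?

Biologically inspired AI frameworks claim that structural guarantees make them more reliable. This paper uses three in-depth benchmarks to test empirically whether they outperform naive alternatives.

arXiv:2605.15225v1 Announce Type: cross Abstract: Biologically-inspired AI agent frameworks claim reliability benefits through structural guarantees a…

biologically inspired AI, structural guarantees, gene regulatory networks, immune system, metabolic control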

15

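📝 Deep Tech arXiv AI 2026-05-19

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

The first benchmark to evaluate AI agents on navigation, localization, EDA verification, and repair in real hardware engineering, exposing the challenges of transferring software-engineering AI to hardware.

arXiv:2605.15226v1 Announce Type: cross Abstract: We ask whether agentic AI systems built for software engineering transfer to realistic hardware engi…

AI agent, hardware engineering, benchmarking, EDA, code repair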

16

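📝 Deep Tech arXiv AI 2026-05-19

A3D: Agentic AI flow for autonomous Accelerator Design

Uses AI agents to design hardware accelerators automatically, easing the labor-intensive bottleneck of traditional high-level synthesis flows and offering a new approach to efficient chip design.

arXiv:2605.15237v1 Announce Type: cross Abstract: Accelerating applications through the design of hardware accelerators can significantly enhance syst…

AI agents, hardware accelerators, automated design, high-level synthesis, academic paper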

17

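📝 Deep Tech arXiv AI 2026-05-19

GESD: Beyond Outcome-Oriented Fairness

Goes beyond traditional outcome fairness by proposing GESD to measure the stability of explanations at the process level, revealing a new dimension of algorithmic bias.

arXiv:2605.15295v1 Announce Type: cross Abstract: Machine learning (ML) algorithms are increasingly deployed in high-stakes decision-making domains su…

GESD, fairness metrics, machine learning, procedural fairness, explanation stability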

18

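🤖 AI · Large Models arXiv AI 2026-05-19

PhysBrain 1.0 Technical Report

A new method that extracts physical-commonsense supervision from human egocentric video to help robots learn a broader physical understanding.

arXiv:2605.15298v1 Announce Type: cross Abstract: Vision-language-action models have advanced rapidly, but robot trajectories alone provide limited co…

PhysBrain, robot learning, egocentric video, physical commonsense, vision-language-action models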

19

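🤖 AI · Large Models arXiv AI 2026-05-19

Hidden in Memory: Sleeper Memory Poisoning in LLM Agents

The persistent memory of LLM agents can be poisoned by malicious content, revealing the risk of a new class of sleeper memory poisoning attacks.

arXiv:2605.15338v1 Announce Type: cross Abstract: Large language models are increasingly augmented with persistent memory, allowing assistants to stor…

LLM, memory poisoning, persistent memory, security risk, adversarial attacks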

20

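📝 Deep Tech arXiv AI 2026-05-19

LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design

Proposes LEAP, a trajectory-level evaluation framework that for the first time quantifies how LLMs learn iteratively during scientific design, rather than looking only at outcome snapshots.

arXiv:2605.15341v1 Announce Type: cross Abstract: LLMs are increasingly deployed in autonomous laboratories, under the assumption that their domain pr…

LLM evaluation, scientific design, iterative learning, trajectory evaluation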
