Zero-Shot Goal Recognition with Large Language Models
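Large language models show abductive reasoning grounded in world knowledge on zero-shot goal recognition, outperforming symbolic planning methods.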
arXiv:2605.15333v1 Announce Type: new Abstract: Large language models have recently reached near-parity with classical planners on well-known planning…
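Benchmarks for LLM-assisted RTL generation need dynamic maintenance; this paper proposes RTL-BenchMT, a framework built on agentic analysis and revision, to address defective cases in existing benchmarks.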
arXiv:2605.15537v1 Announce Type: new Abstract: This paper introduces RTL-BenchMT, an agentic framework for dynamically maintaining RTL generation ben…
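A new training-free method for GUI element grounding: dynamic region search makes it more efficient to identify instruction-relevant regions on high-resolution screens.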
arXiv:2605.15542v1 Announce Type: new Abstract: GUI agents powered by Multimodal Large Language Models (MLLMs) have demonstrated impressive capability…
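LLM-based RCA agents are error-prone? The STAR framework improves the reliability of microservice fault diagnosis through stage-wise triage and repair.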
arXiv:2605.15581v1 Announce Type: new Abstract: LLM-based root cause analysis (RCA) agents have recently emerged as a promising paradigm for incident …
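LLM agents autonomously design foundation model architectures: the twin frameworks AIRA-Compose and AIRA-Design pursue recursive self-improvement and move beyond the constraints of the standard Transformer.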
arXiv:2605.15871v1 Announce Type: new Abstract: Toward recursive self-improvement, we investigate LLM agents autonomously designing foundation models …
New research: LLMs face data contamination risks in tax-law reasoning, so don't be fooled by "fake understanding"!
arXiv:2605.16052v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) have significantly enhanced automated legal reasoning.…
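Combines formal methods with LLMs to provide full-lifecycle governance for AI system compliance, covering auditing, monitoring, and intervention.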
arXiv:2605.16198v1 Announce Type: new Abstract: We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and…
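A fully open-source, auditable clinical LLM pipeline that addresses the black-box problem in medical AI, with fully transparent data provenance and training process.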
arXiv:2605.16215v1 Announce Type: new Abstract: Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, …
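Proposes AgentStop, a method that terminates local AI agents early on consumer devices to save energy, balancing privacy, cost, and energy efficiency.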
arXiv:2605.15206v1 Announce Type: cross Abstract: Autonomous agents powered by large language models (LLMs) are increasingly used to automate complex,…
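Study shows that quantizing large models can break alignment and cause bias to emerge, with markedly different effects across precision levels.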
arXiv:2605.15208v1 Announce Type: cross Abstract: Large Language Models are routinely compressed via post-training quantization to reduce inference co…
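Uses LLMs with retrieval augmentation and the Healthy Eating Index to recommend personalized meals, putting AI to work for nutrition science.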
arXiv:2605.15213v1 Announce Type: cross Abstract: Diet quality is a leading determinant of chronic disease risk. Advances in artificial intelligence (…
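Explores how the design of execution infrastructure critically affects the success of algorithm discovery when LLMs are combined with evolutionary search, and identifies three key engineering design questions.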
arXiv:2605.15221v1 Announce Type: cross Abstract: AlphaEvolve and FunSearch have demonstrated the potential of combining large language models (LLMs) …
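Uses multimodal large models to analyze heterogeneous RISC-V supply chain data, a new paradigm for chip provenance that bridges vision and text.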
arXiv:2605.15223v1 Announce Type: cross Abstract: This paper presents an LLM-empowered workflow for RISC-V supply chain analysis, integrating Vision-L…
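Biologically inspired AI frameworks claim that structural guarantees make them more reliable; this paper empirically tests, on three in-depth benchmarks, whether they outperform naive alternatives.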
arXiv:2605.15225v1 Announce Type: cross Abstract: Biologically-inspired AI agent frameworks claim reliability benefits through structural guarantees a…
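The first benchmark to evaluate AI agents on navigation, localization, EDA verification, and repair in real-world hardware engineering, revealing the challenges of transferring software-engineering AI to hardware.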
arXiv:2605.15226v1 Announce Type: cross Abstract: We ask whether agentic AI systems built for software engineering transfer to realistic hardware engi…
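Uses AI agents to design hardware accelerators automatically, overcoming the labor-intensive bottleneck of traditional high-level synthesis flows and pointing to a new approach for efficient chip design.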
arXiv:2605.15237v1 Announce Type: cross Abstract: Accelerating applications through the design of hardware accelerators can significantly enhance syst…
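Goes beyond traditional outcome fairness by proposing GESD, a metric for the stability of process-level explanations, and reveals a new dimension of algorithmic bias.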
arXiv:2605.15295v1 Announce Type: cross Abstract: Machine learning (ML) algorithms are increasingly deployed in high-stakes decision-making domains su…
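A new method that extracts physical commonsense supervision from human egocentric video to help robots learn a broader physical understanding.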
arXiv:2605.15298v1 Announce Type: cross Abstract: Vision-language-action models have advanced rapidly, but robot trajectories alone provide limited co…
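Persistent memory in LLM agents can be contaminated by malicious content, revealing the risk of a new kind of sleeper memory poisoning attack.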
arXiv:2605.15338v1 Announce Type: cross Abstract: Large language models are increasingly augmented with persistent memory, allowing assistants to stor…
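Proposes LEAP, a trajectory-level evaluation framework that is the first to quantify the iterative learning process of LLMs in scientific design rather than looking only at snapshots of outcomes.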
arXiv:2605.15341v1 Announce Type: cross Abstract: LLMs are increasingly deployed in autonomous laboratories, under the assumption that their domain pr…