Roundtables: Can AI Learn to Understand the World?
Can AI Truly Understand the World? An MIT Roundtable Focuses on 2026's New Breakthroughs in World Models
AI companies want to build systems that understand the external world and overcome the limitations of LLMs. Recen…
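An in-depth, practical guide to the four levels of customer understanding, using the semantic contrast between "possibly" and "most likely" to expose language traps and help you pin down user needs precisely.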
What people say, feel, think, and do are often very different things. To understand the underlying reasons for user behavior, it helps to look beyond …
Chain-of-thought reasoning for 3D point clouds: PointLLM-R makes large models' spatial understanding more precise
arXiv:2605.22013v1 Announce Type: new Abstract: Understanding 3D point clouds through language remains a fundamental challenge in computer graphics an…
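How do language models understand the subtle difference between "slightly" and "substantially"? This study uses a carefully constructed scale of degree words to test LLMs' semantic consistency in numerical actions.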
arXiv:2605.21827v1 Announce Type: new Abstract: Do language models preserve the ordinal meaning of intensity words when those words must produce numer…
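This paper explores how skill specifications can help users accurately understand the boundaries of LLM agents, based on an analysis of 878 cybersecurity skills.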
arXiv:2605.19362v1 Announce Type: cross Abstract: Users often interpret and select agent skills through their SKILL.md specifications. To pro…
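Proposes a new "data probe" paradigm that reveals, at the causal level, how data affects large-model performance; a must-read for AI research.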
arXiv:2605.18801v1 Announce Type: cross Abstract: Data is fundamental to large language models (LLMs). However, understanding of what makes certain da…
An ACL 2025 paper proposes the Sparse-to-Dense method, achieving lossless acceleration of video understanding in LLMs, billed as a "free lunch".
arXiv:2505.19155v2 Announce Type: replace-cross Abstract: Due to the auto-regressive nature of current video large language models (Video-LLMs), the i…
Results of the Intuitive Surgical challenges, 2022-2025, focusing on frontier progress in surgical tool localization and video understanding
arXiv:2305.07152v4 Announce Type: replace Abstract: Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is c…
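Proposes an efficient visual encoder that addresses the visual token explosion in long videos for Video LLMs and breaks the frame-scaling bottleneck.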
arXiv:2605.17260v1 Announce Type: new Abstract: The fundamental challenge in scaling Video Large Language Models (Video LLMs) to long-form video lies …
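Rethinks the Zoom-IN approach in high-resolution multimodal large models and proposes the Hierarchical Decoupling framework, significantly improving visual understanding performance.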
arXiv:2510.00054v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding tas…
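A study finds that the audio understanding of video MLLMs actually relies on visual cues, revealing a model hallucination problem and challenging multimodal authenticity.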
arXiv:2605.16403v1 Announce Type: new Abstract: Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in vide…
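Compares human and LLM reasoning on presupposition projection in conditionals, revealing differences and limitations in models' semantic understanding.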
arXiv:2605.18352v1 Announce Type: new Abstract: Presupposition projection in conditionals is central to theories of meaning and pragmatics, yet it rem…
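A new advance in multimodal large models: a self-distillation strategy teaches AI to capture visual details, significantly improving fine-grained understanding.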
arXiv:2605.18740v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, wher…
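Technical report on the runner-up solution in the EgoVis 2026 CASTLE challenge, detailing MARS, a new multimodal scene-understanding method; a must-read for going deeper in visual AI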
arXiv:2605.18176v1 Announce Type: new Abstract: This report presents MARS, short for Multimodal Agentic Reasoning with Source selection, our system fo…
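Explores how a large monorepo can adapt to AI tools; the core idea is to make code more consistent and understandable, so that AI-ready equals human-ready.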
The org is pushing for AI-readiness! We run a fairly large monorepo, part of that is a shared web-platform. Our team is struggling with the inbound co…
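Studies the error-propagation mechanism of ASR-LLM cascades in Korean spoken question answering, providing a key reference for improving multilingual voice interaction.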
arXiv:2605.17443v1 Announce Type: new Abstract: We analyze how automatic speech recognition (ASR) errors propagate through ASR-LLM cascades in Korean …
A breakthrough in spatial intelligence for multimodal large models, giving AI stronger visual perception and reasoning abilities.
arXiv:2505.23747v2 Announce Type: replace-cross Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced …
AI-powered deep lyric analysis that helps you discover the poetry, metaphors, and stories behind songs
This is a submission for the Gemma 4 Challenge: Build with Gemma 4 What I Built LyricLens is an AI-powered lyrical forensics tool that helps listeners…
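A new method lets vision-language models break through the bottleneck in dense 3D geometric perception and achieve efficient depth estimation.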
arXiv:2605.15876v1 Announce Type: new Abstract: Vision-Language Models (VLMs) excel at 2D tasks such as grounding and captioning, yet remain limited i…
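Reveals the truth about English bias in large models and shows that the cost advantage of continued pretraining does not exist; language-specific investment may become inevitable.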
arXiv:2605.15613v1 Announce Type: new Abstract: Through an analysis of sequences generated by open-weight large language models (LLMs), we demonstrate…