Niuge Picks (牛哥精选) · Past six months

1

🤖 AI · Large Models arXiv NLP 2026-07-15

Modeling Story Expectations: A Generative Framework using LLMs

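Uses LLMs to model story expectations and proposes a generative framework, opening a new direction for narrative understanding and text generation.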

arXiv:2412.15239v4 Announce Type: replace Abstract: Consumers' engagement with stories is shaped by their expectations about what will happen next, ye…

llm, story generation, narrative understanding, generative framework, natural language processing

2

🤖 AI · Large Models arXiv AI 2026-07-14

Do Video-LLMs Actually Watch? Diagnosing Character-Tracking Failures in Long-Form Video

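Diagnoses Video-LLM failures at tracking characters in long videos, revealing that today's strongest open-source models do not truly "watch" the video.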

arXiv:2607.11078v1 Announce Type: cross Abstract: Can a Video Large Language Model (Video-LLM) follow one person through a long video, keeping track o…

video-llm, vision large language models, long video understanding, object tracking, failure diagnosis

3

🤖 AI · Large Models ITHome 2026-07-13

SenseTime releases and open-sources SenseNova-Vision, a unified vision model for understanding and generation, with capabilities surpassing Vision Banana

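SenseTime open-sources the unified vision model SenseNova-Vision, which outperforms Vision Banana across all tasks and natively unifies visual understanding and generation.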

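ITHome, July 13: SenseTime today released and fully open-sourced SenseNova-Vision, a unified vision model for understanding and generation, which it describes as an important visual-capability upgrade to its SenseNova model family. According to SenseTime, earlier "unified vision" efforts in the industry mostly packaged several expert models for detection, segmentation, depth prediction and the like, and so remained fragmented at their core. SenseNov…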

SenseTime release, unified vision model for understanding and generation, surpassing capabilities, SenseTime

4

🤖 AI · Large Models Dev.to 2026-07-13

Building an AI Agent Application: My Experiment with Intelligent Workflows 🚀

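The author moves beyond a simple LLM interface and explores hands-on how AI agents understand context, reason about tasks, and power intelligent workflows.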

I’m excited to share one of my recent projects — an AI agent application I built to explore how intelligent systems can move beyond simple chat intera…

ai agent, intelligent workflows, llm, application building, context understanding

5

📝 In-Depth Tech arXiv AI 2026-07-09

Does AI Understand Imaging? A Systematic Benchmark of Agentic AI for Computational Imaging Tasks

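A systematic benchmark reveals how agentic AI actually performs on computational imaging tasks and where its understanding falls short.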

arXiv:2607.07189v1 Announce Type: new Abstract: Vision-language models (VLMs) and agentic AI have shown strong performance on semantic visual tasks, b…

agentic ai, computational imaging, benchmark, image understanding, ai capability evaluation

6

🤖 AI · Large Models arXiv NLP 2026-07-08

EgoDyn-Bench: Evaluating Ego-Motion Understanding in Vision-Centric Foundation Models for Autonomous Driving

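The new benchmark EgoDyn-Bench specifically tests ego-motion understanding in vision foundation models for autonomous driving, diagnosing weaknesses in order to improve safety.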

arXiv:2604.22851v2 Announce Type: replace-cross Abstract: While Vision-Language Models (VLMs) have advanced high-level reasoning in autonomous driving…

autonomous driving, foundation models, ego-motion understanding, visual perception, evaluation benchmark

7

🤖 AI · Large Models arXiv Machine Learning 2026-07-07

Agentic Very Long Video Understanding

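Proposes an agentic approach to the problem of very long video understanding; the paper is systematic and comprehensive, running 29 pages with 8 figures and tables.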

arXiv:2601.18157v3 Announce Type: replace-cross Abstract: The advent of always-on personal AI assistants, enabled by all-day wearable devices such as …

long video understanding, agents, video understanding, paper, arxiv

8

🤖 AI · Large Models arXiv AI 2026-07-07

K9-Bench: Evaluating Multimodal LLMs on Canine-Centric Videos

The first evaluation benchmark for multimodal LLMs on dog videos, testing how well models really understand canines.

arXiv:2607.02680v1 Announce Type: cross Abstract: MLLMs have shown strong zero-shot capabilities across diverse inputs such as across images, video, a…

k9-bench, multimodal large models, evaluation benchmark, canine videos, ai understanding

9

📝 In-Depth Tech arXiv Computer Vision 2026-07-07

VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs

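A new benchmark for evaluating visual knowledge understanding in multimodal large models, revealing gaps in AI's grasp of physical and social common sense.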

arXiv:2511.20272v2 Announce Type: replace Abstract: While Multimodal Large Language Models (MLLMs) have become adept at recognizing objects, they ofte…

multimodal large models, visual knowledge, evaluation benchmark, commonsense understanding, mllm

10

🤖 AI · Large Models arXiv AI 2026-07-07

Query2Diagram: Answering Developer Queries with UML Diagrams

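Uses large models to automatically turn developer queries into UML diagrams, giving precise answers to questions about code structure and logic and making debugging and comprehension more efficient.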

arXiv:2604.23816v2 Announce Type: replace-cross Abstract: Software documentation frequently becomes outdated or fails to exist entirely, yet developer…

query2diag, uml diagrams, large models, developer queries, code understanding

11

🔓 Open Source Hacker News LLM 2026-07-03

Claude-real-video - any LLM can watch a video

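An open-source tool that lets Claude and other LLMs actually "watch" video: beyond extracting frames, it also uses the audio to analyze dynamic content.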

Article URL: https://github.com/HUANGCHIHHUNGLeo/claude-real-video Comments URL: https://news.ycombinator.com/item?id=48766005 Points: 3 # Comments: 0

claude, video understanding, llm, open-source tool, frame extraction

12

💰 Business & Tech 36Kr 2026-07-02

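36Kr exclusive | Former DJI scientist's startup closes four funding rounds totaling hundreds of millions within six months, backed by Glory Ventures, Jinqiu Capital and others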

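Former DJI scientist Zhang Fu founds a startup that has raised four rounds totaling hundreds of millions in six months, using next-generation spatial understanding technology to let aircraft "operate on" the world.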

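By Ou Xue | Edited by Yuan Silai. Yingke (硬氪) has learned that SPARO (硅羽科技), which focuses on general aerial intelligence, has closed four consecutive funding rounds totaling hundreds of millions of yuan within six months. Following Glory Ventures in the seed round, Jinqiu Capital, Alibaba, Hony Capital, GLP Hidden Hill Capital, and Yunshi Capital have come in one after another. The current funding will mainly go toward building out the team in key roles, advancing commercialization and scaled delivery of the product line, and accelerating technology…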

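36Kr exclusive, former DJI scientist startup, consecutive rounds within six months, four rounds of hundreds of millions in funding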

13

🤖 AI · Large Models arXiv Computer Vision 2026-07-01

M3CoTBench: Benchmark Chain-of-Thought of MLLMs in Medical Image Understanding

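The first chain-of-thought benchmark designed specifically for medical image understanding, revealing reasoning flaws in multimodal large models.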

arXiv:2601.08758v4 Announce Type: replace-cross Abstract: Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models by …

m3cotbench, medical image understanding, chain-of-thought, multimodal large models, benchmark

14

🤖 AI · Large Models arXiv Computer Vision 2026-06-30

BrepLLM: Enabling Large Language Models to Understand Boundary Representations

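Enables large language models to understand 3D CAD boundary representations for the first time, breaking down the barrier between natural language and geometric modeling.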

arXiv:2512.16413v2 Announce Type: replace Abstract: Current token-sequence-based Large Language Models (LLMs) struggle to directly process 3D Boundary…

brepllm, large language models, boundary representation, 3d cad, geometric understanding

15

📝 In-Depth Tech arXiv Computer Vision 2026-06-30

MotionAtlas: Detailed Region Captioning for Motion-Centric Videos

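A new ECCV 2026 paper: MotionAtlas tackles detailed region captioning for motion-centric videos, giving video understanding a finer-grained level of detail.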

arXiv:2606.29531v1 Announce Type: new Abstract: We propose MotionAtlas, a system for detailed captioning of motion-centric videos, comprising (1) a de…

motionatla, video understanding, region captioning, motion analysis, eccv 2026

16

🤖 AI · Large Models arXiv Computer Vision 2026-06-30

Enhancing Part-Level Point Grounding for Any Open-Source MLLMs

Proposes a new method that strengthens part-level point grounding in any open-source multimodal large model without additional training.

arXiv:2606.29267v1 Announce Type: new Abstract: Visual grounding aims to associate free-form textual queries with specific regions in an image. While …

point grounding, multimodal large models, part-level understanding, open-sourc, model enhancement

17

🤖 AI · Large Models arXiv NLP 2026-06-30

Poller: Are LLMs Suitable for Evaluating the Poetry Understanding Task?

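Studies whether LLMs are up to the task of evaluating poetry understanding, and proposes Poller for the first time to explore the feasibility of AI in literary evaluation.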

arXiv:2606.30556v1 Announce Type: new Abstract: Traditional automatic evaluation methods have been shown to be unsuitable for modern Chinese poetry be…

llm evaluation, poetry understanding, automatic evaluation, poller, literary evaluation

18

🤖 AI · Large Models arXiv NLP 2026-06-30

A Hybrid Framework for Song Lyric Annotation Based on Human-LLM Alignment

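Human judgment and large AI models work together to break through the semantic-understanding bottleneck in song lyric annotation.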

arXiv:2606.29273v1 Announce Type: new Abstract: Emotion recognition of song lyrics is a challenging task since lyrics may not necessarily align with t…

lyric annotation, hybrid framework, human-machine alignment, llm, knowledge graph

19

📝 In-Depth Tech arXiv NLP 2026-06-30

Latent Bridges for Multi-Table Question Answering

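For natural-language question answering over multiple tables, proposes a "latent bridges" method that links tables implicitly to enable efficient reasoning.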

arXiv:2606.28916v1 Announce Type: new Abstract: We introduce GRAB, a constructor-encoder-bridge pipeline for table question answering. Our method lift…

multi-table question answering, latent bridges, table reasoning, relation understanding, deep learning

20

📝 In-Depth Tech arXiv AI 2026-06-29

JD Oxygen AI Item Center (Oxygen AIIC) V1: An Industrial-Scale LLM/VLM-Centric Solution for Item Understanding, Management, and Applications

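JD.com releases Oxygen AIIC V1, an industrial-scale item understanding system built on LLMs/VLMs that handles billions of SKUs and improves e-commerce operating efficiency and user experience.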

arXiv:2606.28070v1 Announce Type: new Abstract: JD.com, one of the world's largest e-commerce platforms, serves over 700 million active users and mill…

JD.com, llm, vlm, item understanding, industrial-scale
