牛哥精选 · All

1

📝 In-Depth Tech · arXiv NLP · 2026-07-07

Token-level Response-visual Attention Guidance for Multimodal LLMs Knowledge Distillation

Tackles the challenge of compressing multimodal LLMs by proposing token-level response-visual attention guidance to improve distillation.

arXiv:2607.02593v1 Announce Type: cross Abstract: While knowledge distillation (KD) is widely adopted for training lightweight models by leveraging su…

knowledge distillation · multimodal LLMs · attention guidance · model compression · token-level

2

📝 In-Depth Tech · arXiv Computer Vision · 2026-07-02

Decompose, Compare, and Decide: Multimodal LLMs are Implicit Few-Shot Learners

Multimodal LLMs harbor an implicit few-shot learning ability; the DCD framework decomposes, compares, and decides.

arXiv:2607.00125v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable abilities when analyzing images,…

multimodal LLMs · few-shot learning · implicit learning · decompose, compare, decide

3

📝 In-Depth Tech · arXiv AI · 2026-07-01

Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents

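Lets computer-use agents learn from their own failures and improve at inference time, easing the bottleneck of high-quality trajectory data.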

arXiv:2606.31270v1 Announce Type: cross Abstract: Computer-use agents, which leverage multimodal large language models (MLLMs) to operate computers an…

computer-use agents · multimodal LLMs · inference-time self-improvement · learning from failure · trajectory collection

4

📝 In-Depth Tech · arXiv AI · 2026-07-01

ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs

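Proposes attention dynamics alignment with preference tuning, addressing multimodal hallucination through the evolution of internal attention.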

arXiv:2606.31054v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are critically hampered by hallucination, generating conten…

multimodal LLMs · hallucination mitigation · attention alignment · preference tuning · cross-modal attention

5

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-30

Personalizing MLLMs via Reinforced Multimodal Reference Game

Personalizes MLLMs through a reinforced multimodal reference game; the paper has been accepted to ECCV 2026.

arXiv:2606.28845v1 Announce Type: new Abstract: Personalizing Multimodal Large Language Models (MLLMs) aims to recognize users' unique concepts from v…

multimodal LLMs · personalization · reinforcement learning · reference game · ECCV 2026

6

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-26

CORTEX: A Structured Reasoning Benchmark for Trustworthy 3D Chest CT MLLMs

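CORTEX is a new benchmark for multimodal LLMs on 3D chest CT. It evaluates structured reasoning rather than only the final answer, so that diagnoses can be traced back to evidence and trusted more readily.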

arXiv:2606.27264v1 Announce Type: new Abstract: Reasoning in multimodal large language models (MLLMs) has shown strong promise in medical imaging. How…

medical imaging · multimodal LLMs · 3D CT · reasoning benchmark · trustworthy AI

7

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-25

Are We There Yet? Exploring the Capabilities of MLLMs in Assistive AI Applications

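A systematic evaluation of what multimodal LLMs can actually do in assistive AI applications (such as visual and auditory assistance), showing current progress and limitations.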

arXiv:2606.25084v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have redefined visual understanding by combining vision encod…

multimodal LLMs · assistive AI · capability evaluation · applied research

8

📝 In-Depth Tech · arXiv Machine Learning · 2026-06-25

Curvature-Guided Mixing for MLLM Adaptation

A theoretically grounded curvature-guided mixing method that mitigates catastrophic forgetting when fine-tuning multimodal LLMs.

arXiv:2606.24963v1 Announce Type: cross Abstract: Fine-tuning Multimodal Large Language Models (MLLMs) on specialized tasks often leads to catastrophi…

curvature · MLLM · fine-tuning · catastrophic forgetting · model merging

9

🤖 AI · Large Models · arXiv AI · 2026-06-23

AIR: Adaptive Interleaved Reasoning with Code in MLLMs

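Proposes an adaptive method for interleaving code with reasoning, making multimodal LLMs more flexible and efficient in mixed reasoning.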

arXiv:2606.23678v1 Announce Type: cross Abstract: Following the paradigm shift initiated by OpenAI o3, interleaved reasoning with code to enhance mult…

multimodal LLMs · adaptive reasoning · code reasoning · interleaved reasoning · MLLM

10

📝 In-Depth Tech · arXiv AI · 2026-06-23

Is Our Benchmark Enough? An Analysis of Continual Learning for MLLMs

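Are current continual-learning benchmarks for multimodal LLMs enough? This ICML 2026 Workshop paper analyzes the shortcomings of existing evaluation protocols.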

arXiv:2606.20961v1 Announce Type: cross Abstract: Continual adaptation is essential for multimodal large language models (MLLMs) deployed across evolv…

MLLMs · continual learning · benchmark evaluation · multimodal LLMs · continual adaptation

11

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-23

Faithful Grounded Visual Reasoning via Learned Proxy-Tokens

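Proposes learned proxy tokens for faithful grounded visual reasoning, addressing the black-box nature of multimodal LLMs in visual question answering and improving interpretability.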

arXiv:2606.23354v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in Visual Question Answering…

multimodal LLMs · visual question answering · interpretability · GVR · proxy tokens

12

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-23

OmniSpace: Efficient Geometry Awareness for Autonomous Vehicles MLLMs

Efficient geometry awareness for autonomous-driving MLLMs, helping large models better understand spatial structure.

arXiv:2606.22617v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable performance on 2D visual tasks, yet …

autonomous driving · multimodal LLMs · geometry awareness · efficiency optimization · OmniSpace

13

🤖 AI · Large Models · arXiv AI · 2026-06-19

Evaluating and Enhancing Negation Comprehension in Remote Sensing MLLMs

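Remote sensing MLLMs show a marked weakness in understanding negation. This study is described as the first to evaluate the problem systematically, and it proposes enhancement strategies.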

arXiv:2606.20177v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in various Remote Sens…

negation comprehension · remote sensing · multimodal LLMs · evaluation · enhancement

14

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-16

Multimodal LLM-Empowered Re-Ranking for Generalizable Person Re-Identification

How can multimodal LLMs improve the generalization of person re-identification? This paper proposes a new re-ranking framework.

arXiv:2606.16161v1 Announce Type: new Abstract: Domain Generalizable (DG) person re-identification (Re-ID) has attracted growing research interest due…

multimodal LLMs · person re-identification · re-ranking · generalization · computer vision

15

📝 In-Depth Tech · arXiv AI · 2026-06-15

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

Accepted to USENIX Security '26: a systematic evaluation of, and defense against, multimodal LLMs that solve CAPTCHAs.

arXiv:2512.02318v4 Announce Type: replace-cross Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guara…

multimodal LLMs · CAPTCHA security · adversarial defense · CAPTCHA solving · USENIX Sec

16

📝 In-Depth Tech · arXiv AI · 2026-06-10

Beyond APIs: Probing the Limits of MLLMs in Physical Tool Use

From calling APIs to handling physical tools: this paper maps the capability limits and core challenges of MLLMs.

arXiv:2606.10803v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) excel at utilizing digital APIs and increasingly serve as t…

MLLMs · physical tool use · embodied AI · capability limits · multimodal LLMs

17

🤖 AI · Large Models · arXiv AI · 2026-06-10

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

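An ICML 2026 paper that uses uncertainty-aware subspace rectification to make multimodal LLM decoding more trustworthy and to mitigate manifold departure.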

arXiv:2606.09859v1 Announce Type: cross Abstract: MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attrib…

multimodal LLMs · uncertainty quantification · manifold learning · trustworthy decoding · subspace rectification

18

📝 In-Depth Tech · arXiv AI · 2026-06-10

From Senses to Decisions: The Information Flow of Auditory and Visual Perception in Multimodal LLMs

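Multimodal LLMs can listen and see, but how do audio and visual signals flow internally and ultimately shape the answer? This paper traces those information pathways.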

arXiv:2606.10147v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) can listen and see, but how do audio and visual signals actua…

multimodal LLMs · audio perception · visual perception · information flow · internal mechanisms

19

📝 In-Depth Tech · arXiv Computer Vision · 2026-06-09

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

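AVI-Bench, described as the first audio-visual benchmark designed for omni-MLLMs, aims to advance human-like understanding of combined audio and video information.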

arXiv:2606.07643v1 Announce Type: new Abstract: Recent advances in Omni-Multimodal Large Language Models (Omni-MLLMs) have enabled strong integration …

audio-visual intelligence · multimodal LLMs · benchmark · ICML 2026 · fusion understanding

20

🤖 AI · Large Models · arXiv Computer Vision · 2026-06-09

Reason Twice: Segmentation via Candidate Discovery and Comparative Reasoning

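A new multimodal LLM framework: discover candidates first, then reason comparatively, to handle image segmentation under complex queries.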

arXiv:2606.09303v1 Announce Type: new Abstract: The rapid development of pretrained foundation models has enabled more general image segmentation. Mul…

image segmentation · multimodal LLMs · candidate discovery · comparative reasoning · visual understanding
