Niu Ge's Picks · This Month

1

📝 Deep Tech · arXiv · Machine Learning · 2026-05-25

Millimeter-wave Imaging for Anthropometric Body Measurement

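Applies millimeter-wave imaging to anthropometric body measurement: a new approach to contactless, high-precision imaging from recent arXiv research.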

arXiv:2605.23064v1 Announce Type: cross Abstract: Body shape and circumferences are clinically informative biomarkers for risk stratification, includi…

millimeter-wave imaging · body measurement · computer vision · signal processing · deep learning

2

🤖 AI · Large Models · arXiv · Computer Vision · 2026-05-22

VChain: Chain-of-Visual-Thought for Reasoning in Video Generation

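A new paradigm for video generation: VChain uses a chain of visual thought to improve reasoning. It won a Best Paper Award at an ICCV 2025 Workshop and is worth a look.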

arXiv:2510.05094v2 Announce Type: replace Abstract: Recent video generation models can produce smooth and visually appealing clips, but they often str…

video generation · chain of visual thought · reasoning · multimodal · computer vision

3

📝 Deep Tech · arXiv · Computer Vision · 2026-05-21

FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous ensemble in Fine-Grained Fruit Recognition

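Brings multimodal large language models into fine-grained fruit recognition, using them as arbiters over a heterogeneous ensemble to substantially improve accuracy.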

arXiv:2605.20892v1 Announce Type: new Abstract: Fine-grained fruit classification is a critical yet challenging task in agricultural computer vision, …

fine-grained recognition · mllm · ensemble learning · computer vision · fruit recognition

4

🤖 AI · Large Models · arXiv · Computer Vision · 2026-05-21

TextSculptor: Training and Benchmarking Scene Text Editing

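TextSculptor, a new framework for scene text editing, contributes both a training approach and a benchmark, advancing AI-based editing of text in images.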

arXiv:2605.21090v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) and diffusion-based generative models have…

textsculpt · scene text editing · benchmarking · deep learning · computer vision

5

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement

Accepted at IJCAI 2026. Uses intrinsic illumination priors to advance low-light image enhancement, with strong reported results.

arXiv:2605.19982v1 Announce Type: new Abstract: Low-Light Image Enhancement (LLIE) has long been a challenging problem in low-level vision, as insuffi…

low-light image enhancement · intrinsic illumination priors · computer vision · ijcai 2026 · image processing

6

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

Intuitive Surgical SurgToolLoc and SurgVU Challenges Results: 2022-2025

Results of the Intuitive Surgical challenges for 2022-2025, covering recent progress in surgical tool localization and surgical video understanding.

arXiv:2305.07152v4 Announce Type: replace Abstract: Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is c…

intuitive · surgical tool localization · video understanding · challenge · medical ai

7

📝 Deep Tech · arXiv · AI · 2026-05-20

ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

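Proposes ICED, which performs concept-level unlearning in vision-language models through interpretable concept decomposition, aiming to remove targeted knowledge precisely without affecting unrelated semantics.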

arXiv:2605.14309v1 Announce Type: cross Abstract: Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance …

machine unlearning · concept-level unlearning · interpretable concept decomposition · vision-language models · computer vision

8

🤖 AI · Large Models · arXiv · Machine Learning · 2026-05-20

PhyWorld: Physics-Faithful World Model for Video Generation

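Injects physical laws into video generation so the model respects real-world behavior such as gravity and collisions, pushing world models forward.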

arXiv:2605.19242v1 Announce Type: cross Abstract: World simulators can provide safe and scalable environments for training Physical AI systems before …

physics fidelity · world model · video generation · ai research · computer vision

9

📝 Deep Tech · arXiv · Machine Learning · 2026-05-20

A More Word-like Image Tokenization for MLLMs

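Proposes a way to make image tokens behave more like words in text, improving how visual and language inputs are fused in multimodal large language models.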

arXiv:2605.17954v1 Announce Type: cross Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a …

multimodal large language models · image tokenization · tokenizati · visual-semantic alignment · computer vision

10

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

Fast 4D Mesh Generation by Spatio-Temporal Attention Chains

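A new method that uses spatio-temporal attention chains for fast 4D mesh generation, substantially improving the efficiency of dynamic 3D scene modeling.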

arXiv:2605.19786v1 Announce Type: new Abstract: 4D mesh generation has recently emerged as a powerful paradigm for recovering dynamic 3D structure fro…

4d mesh generation · spatio-temporal attention · dynamic scene reconstruction · deep learning · computer graphics

11

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

See Silhouettes in Motion with Neuromorphic Vision

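Uses neuromorphic vision to capture the silhouettes of moving quasi-bimodal objects such as text, road signs, and barcodes, aiming to improve both the accuracy and the speed of dynamic recognition.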

arXiv:2605.17984v1 Announce Type: cross Abstract: Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily …

neuromorphic vision · dynamic silhouettes · motion recognition · computer vision · arxiv paper

12

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

Leveraging Latent Visual Reasoning in Silence

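Shows that latent visual tokens are not essential in multimodal reasoning: replacing them with random noise does not hurt performance, challenging a common assumption.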

arXiv:2605.18641v1 Announce Type: new Abstract: Latent visual reasoning involves visual evidence more directly in multimodal reasoning by inserting co…

multimodal reasoning · latent visual reasoning · computer vision · large models · reasoning efficiency

13

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

Training-Free Occluded Text Rendering via Glyph Priors and Attention-Guided Semantic Blending

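Uses glyph priors and attention-guided semantic blending to render occluded text in images at high quality without any training; a notably novel approach.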

arXiv:2605.16810v1 Announce Type: new Abstract: We present a training-free framework for occluded text rendering with a pretrained FLUX.1-dev backbone…

training-free · occluded text rendering · glyph priors · attention-guided semantic blending · computer vision

14

📝 Deep Tech · arXiv · Computer Vision · 2026-05-20

When Vision Speaks for Sound

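Finds that the apparent audio understanding of video MLLMs actually relies on visual cues, exposing a form of hallucination and calling into question how genuinely multimodal these models are.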

arXiv:2605.16403v1 Announce Type: new Abstract: Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in vide…

multimodal large language models · audio understanding · vision-driven · multimodal large models · visual dependence

15

📝 Deep Tech · arXiv · Machine Learning · 2026-05-20

Principal Component Analysis for Lunar Crater Detection

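Uses principal component analysis to detect lunar craters efficiently, offering a new direction for recognition in planetary imagery.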

arXiv:2605.17125v1 Announce Type: cross Abstract: Optical navigation is a critical component for lunar orbiter and lander missions. Image-based crater…

principal component analysis · lunar crater detection · computer vision · pattern recognition · astronomical image analysis

16

📝 Deep Tech · arXiv · Machine Learning · 2026-05-20

Inducing Spatial Locality in Vision Transformers through the Training Protocol

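Changes the training protocol rather than the architecture to induce spatial locality in the early layers of Vision Transformers, improving performance and efficiency.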

arXiv:2605.16390v1 Announce Type: cross Abstract: We investigate whether the training protocol can induce spatial locality in the early layers of a Vi…

vision tra · spatial locality · training protocol · computer vision · deep learning

17

📝 Deep Tech · arXiv · AI · 2026-05-20

ATLAS: Agentic or Latent Visual Reasoning? One Word is Enough for Both

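Proposes ATLAS, which uses a single word to unify agentic and latent visual reasoning, removing the computational bottleneck of generating intermediate visual states.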

arXiv:2605.15198v1 Announce Type: cross Abstract: Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising dire…

atlas · visual reasoning · agentic reasoning · latent reasoning · computer vision

18

📝 Deep Tech · arXiv · AI · 2026-05-20

VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction

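Uses residual field prediction for feed-forward, native 3D scene editing: complex edits take a single forward pass, with no per-scene optimization.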

arXiv:2605.15186v1 Announce Type: cross Abstract: High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward archite…

3d scene editing · residual field prediction · feed-forward architecture · computer vision · scene reconstruction

19

📝 Deep Tech · arXiv · AI · 2026-05-20

Quantitative Video World Model Evaluation for Geometric-Consistency

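Proposes PDI-Bench, a framework for quantitatively evaluating the geometric consistency of generative video models, addressing the hard problem of judging whether the 3D structure they produce is plausible.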

arXiv:2605.15185v1 Announce Type: cross Abstract: Generative video models are increasingly studied as implicit world models, yet evaluating whether th…

video world models · geometric consistency · evaluation framework · computer vision · generative models

20

📝 Deep Tech · arXiv · AI · 2026-05-20

EverAnimate: Minute-Scale Human Animation via Latent Flow Restoration

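Tackles minute-scale human animation, using latent flow restoration to keep visual appearance and identity consistent over long horizons.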

arXiv:2605.15042v1 Announce Type: cross Abstract: We propose EverAnimate, an efficient post-training method for long-horizon animated video generation…

human animation · video generation · latent flow restoration · long-horizon animation · identity preservation
