Millimeter-wave Imaging for Anthropometric Body Measurement
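Millimeter-wave technology enables anthropometric body measurement: a new contactless, high-precision imaging approach from recent arXiv research.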
arXiv:2605.23064v1 Announce Type: cross Abstract: Body shape and circumferences are clinically informative biomarkers for risk stratification, includi…
A new paradigm for video generation: VChain uses a visual chain of thought to improve reasoning, and won a Best Paper Award at an ICCV 2025 Workshop.
arXiv:2510.05094v2 Announce Type: replace Abstract: Recent video generation models can produce smooth and visually appealing clips, but they often str…
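Brings multimodal large language models to fine-grained fruit recognition, using an arbitration mechanism for heterogeneous ensembling to substantially improve accuracy.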
arXiv:2605.20892v1 Announce Type: new Abstract: Fine-grained fruit classification is a critical yet challenging task in agricultural computer vision, …
TextSculptor, a new framework for scene text editing, advances both training and benchmarking for AI text processing.
arXiv:2605.21090v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) and diffusion-based generative models have…
Accepted at IJCAI 2026: uses intrinsic illumination priors to advance low-light image enhancement, with notable results.
arXiv:2605.19982v1 Announce Type: new Abstract: Low-Light Image Enhancement (LLIE) has long been a challenging problem in low-level vision, as insuffi…
Results of the Intuitive Surgical challenges (2022-2025) are out, focusing on recent advances in surgical tool localization and video understanding.
arXiv:2305.07152v4 Announce Type: replace Abstract: Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is c…
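Proposes ICED, which achieves concept-level unlearning in vision-language models through interpretable concept decomposition, precisely removing target knowledge without affecting unrelated semantics.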
arXiv:2605.14309v1 Announce Type: cross Abstract: Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance …
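Injects physical laws into video generation so that models capture real-world behavior such as gravity and collisions, moving world models forward.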
arXiv:2605.19242v1 Announce Type: cross Abstract: World simulators can provide safe and scalable environments for training Physical AI systems before …
Proposes a new method that brings image tokenization closer to text semantics, improving fusion in multimodal large language models.
arXiv:2605.17954v1 Announce Type: cross Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a …
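A new method for high-speed 4D mesh generation via chained spatiotemporal attention, substantially improving the efficiency of dynamic 3D scene modeling.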
arXiv:2605.19786v1 Announce Type: new Abstract: 4D mesh generation has recently emerged as a powerful paradigm for recovering dynamic 3D structure fro…
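New research uses neuromorphic vision to efficiently capture the silhouettes of moving people, improving the accuracy and speed of dynamic recognition.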
arXiv:2605.17984v1 Announce Type: cross Abstract: Quasi-bimodal objects, such as text, road signs, and barcodes, play a basic yet vital role in daily …
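The paper shows that latent visual tokens are not essential in multimodal reasoning: replacing them with random noise does not hurt performance, which challenges prevailing assumptions.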
arXiv:2605.18641v1 Announce Type: new Abstract: Latent visual reasoning involves visual evidence more directly in multimodal reasoning by inserting co…
Uses glyph priors and attention-guided semantic blending to render occluded text in images at high quality without any training.
arXiv:2605.16810v1 Announce Type: new Abstract: We present a training-free framework for occluded text rendering with a pretrained FLUX.1-dev backbone…
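Finds that the apparent audio understanding of video MLLMs actually relies on visual cues, exposing a hallucination problem and calling multimodal faithfulness into question.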
arXiv:2605.16403v1 Announce Type: new Abstract: Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in vide…
Uses principal component analysis to detect lunar craters efficiently, offering a new approach for celestial image recognition.
arXiv:2605.17125v1 Announce Type: cross Abstract: Optical navigation is a critical component for lunar orbiter and lander missions. Image-based crater…
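By changing the training protocol rather than the architecture, helps Vision Transformers make better use of spatial locality, improving performance and efficiency.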
arXiv:2605.16390v1 Announce Type: cross Abstract: We investigate whether the training protocol can induce spatial locality in the early layers of a Vi…
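Proposes ATLAS, which uses a single word to unify agentic and latent visual reasoning, overcoming the computational bottleneck of intermediate states.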
arXiv:2605.15198v1 Announce Type: cross Abstract: Visual reasoning, often interleaved with intermediate visual states, has emerged as a promising dire…
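Residual field prediction enables feed-forward, native 3D scene editing: complex edits complete in a single forward pass, with no per-scene optimization.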
arXiv:2605.15186v1 Announce Type: cross Abstract: High-quality 3D scene reconstruction has recently advanced toward generalizable feed-forward archite…
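Proposes PDI-Bench, a framework for quantitatively evaluating the geometric consistency of generative video models, targeting the problem of 3D structural plausibility.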
arXiv:2605.15185v1 Announce Type: cross Abstract: Generative video models are increasingly studied as implicit world models, yet evaluating whether th…
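Tackles minute-scale human animation generation, using latent flow restoration to preserve long-horizon visual and identity consistency.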
arXiv:2605.15042v1 Announce Type: cross Abstract: We propose EverAnimate, an efficient post-training method for long-horizon animated video generation…