Vision Inference Former: Sustaining Visual Consistency in Multimodal Large Language Models
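Proposes the Vision Inference Former method to address the difficulty of maintaining visual consistency in multimodal large language models, offering a new paradigm for vision-language fusion.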
arXiv:2605.18160v1 Announce Type: new Abstract: In recent years, multimodal large language models (MLLMs) have achieved remarkable progress, primarily…