Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
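A new advance for multimodal large models: an on-policy self-distillation strategy teaches the model to capture visual details, significantly improving its fine-grained understanding.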
arXiv:2605.18740v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, wher…