RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
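Proposes retrieval- and ranking-augmented multimodal large language models (MLLMs) that target CLIP's weakness in fine-grained visual recognition.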
arXiv:2403.13805v2 Announce Type: replace-cross Abstract: CLIP (Contrastive Language-Image Pre-training) uses contrastive learning from noise image-te…
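The abstract's one technical claim here is that CLIP learns contrastively from paired images and texts. Below is a minimal sketch of the symmetric contrastive (InfoNCE-style) objective commonly associated with CLIP. It assumes the image and text embeddings for a batch are already computed and non-zero, that row i of each list forms a matching pair, and that the temperature is fixed (0.07 is a common choice). The function names are hypothetical. This is an illustration of the general technique, not code from the RAR paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy(rows: List[List[float]]) -> float:
    # Mean cross-entropy where the correct class for row i is index i.
    total = 0.0
    for i, row in enumerate(rows):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(rows)


def clip_contrastive_loss(image_emb: List[Vector], text_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    Hypothetical helper: image_emb[i] and text_emb[i] are assumed to be a
    positive pair; every other pairing in the batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction uses rows; text-to-image uses columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    return 0.5 * (_cross_entropy(logits) + _cross_entropy(columns))
```

With correctly matched pairs the loss is close to zero; shuffling the texts so that each image is paired with the wrong caption drives it up. That gap is the signal a contrastive objective trains on.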