Your CLIP has 164 dimensions of noise: Exploring the embeddings covariance eigenspectrum of contrastively pretrained vision-language transformers
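The embedding space of CLIP models is found to contain 164 noise dimensions; the covariance eigenspectrum reveals structural anomalies in vision-language models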
arXiv:2605.14893v1 Announce Type: cross Abstract: Contrastively pre-trained Vision-Language Models (VLMs) serve as powerful feature extractors. Yet, t…