Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval
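How can multimodal large models bring audio-text retrieval closer to real-world search? This paper uses Omni-Embed-Audio to challenge the limitations of traditional CLAP.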
arXiv:2604.18360v2. Abstract: Audio-text retrieval systems based on Contrastive Language-Audio Pretraining (CLAP) achieve …