牛哥精选 (Niu Ge Picks) · All


🤖 AI·Large Models · arXiv Computer Vision · 2026-06-23

Do Modern Video-LLMs Need to Listen? A Benchmark Audit and Scalable Remedy

Do video large language models actually need audio? This study, accepted at Interspeech 2026, presents a benchmark audit and a scalable remedy.

arXiv:2509.17901v4 Announce Type: replace Abstract: Speech and audio encoders developed over years of community effort are routinely excluded from vid…

video multimodal LLMs · audio fusion · benchmarking · scalable remedy

🤖 AI·Large Models · arXiv AI · 2026-06-10

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

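Presented as the first work to integrate spatial audio understanding into multimodal large language models, using First-Order Ambisonics (FOA) encoding rather than monaural audio input.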

arXiv:2606.10738v1 Announce Type: cross Abstract: Recent multimodal large language models mainly process audio as monaural signals, thereby discarding…

spatial audio · multimodal LLMs · FOA encoding · audio understanding · arXiv paper

🤖 AI·Large Models · arXiv AI · 2026-06-10

AuRA: Internalizing Audio Understanding into LLMs as LoRA

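Uses LoRA so that an LLM understands audio directly, internalizing the cross-modal capability efficiently instead of relying on a cascaded pipeline; described as a new open-source research advance.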

arXiv:2606.11033v1 Announce Type: cross Abstract: Recent efforts to extend large language models (LLMs) to speech inputs typically rely on cascaded AS…

LLM · LoRA · audio understanding · multimodal model fine-tuning

🤖 AI·Large Models · arXiv AI · 2026-06-02

MOSS-Audio Technical Report

Technical report on the audio capabilities of the MOSS model, describing its advances in audio understanding and generation.

arXiv:2606.01802v1 Announce Type: cross Abstract: MOSS-Audio is a unified audio-language model for speech, environmental sound, and music understandin…

MOSS · audio model · technical report · multimodal audio understanding

📝 In-Depth Tech · arXiv Computer Vision · 2026-05-20

When Vision Speaks for Sound

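Finds that the apparent audio understanding of video MLLMs actually relies on visual cues, exposing a hallucination problem and calling into question how genuinely multimodal these models are.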

arXiv:2605.16403v1 Announce Type: new Abstract: Despite rapid progress in video-capable MLLMs, we find that their apparent audio understanding in vide…

multimodal LLMs · audio understanding · vision-driven · visual dependence in multimodal models
