SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
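Combines a video's audio cues with its shot structure, using an audio-enhanced multimodal large language model to precisely localize target moments and improve retrieval depth and accuracy.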
arXiv:2511.14143v2 (announce type: replace)
Abstract: Video Moment Retrieval is a task in video understanding that aims to localize a specific temporal …