A More Word-like Image Tokenization for MLLMs
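Brings image tokenization closer to text semantics, proposing a new method that improves modality fusion in multimodal large language models.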
arXiv:2605.17954v1 Announce Type: cross Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a …
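Reframes GUI critique from binary judgment to continuous semantic alignment, improving candidate ranking in test-time scaling for agents.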
arXiv:2605.14311v1 Announce Type: cross Abstract: Test-Time Scaling (TTS), which samples multiple candidate actions and ranks them via a Critic Model,…