PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs
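Breaking through the fine-grained understanding bottleneck of MLLMs with procedurally generated tasks: the PGT framework kills two birds with one stone.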
arXiv:2605.23883v1 (Announce Type: cross)
Abstract: Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle…