On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
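The first systematic study of the robustness and chain-of-thought consistency of RL-finetuned VLMs, revealing the root causes of their fragility.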
arXiv:2602.12506v3 (replacement). Abstract: Reinforcement learning (RL) finetuning has become a key technique for enhancing large language mod…