Learn Where Outcomes Diverge: Efficient VLA RL via Probabilistic Chunk Masking
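Proposes a probabilistic chunk masking mechanism that targets the computational bottleneck of RL post-training for VLA policies, significantly improving efficiency.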
arXiv:2605.16154v1 Announce Type: new Abstract: Reinforcement learning (RL) allows vision-language-action (VLA) policies to generalize beyond their tr…