Vector Policy Optimization: Training for Diversity Improves Test-Time Search
A new paradigm for policy optimization: injecting diversity during training significantly improves test-time search performance
arXiv:2605.22817v1 (Announce Type: cross)
Abstract: Language models must now generalize out of the box to novel environments and work inside inference-s…