COOPO: Cyclic Offline-Online Policy Optimization Algorithm
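The COOPO algorithm proposes cyclic offline-online policy optimization to address distributional shift and catastrophic forgetting, offering a new advance for hybrid offline-online reinforcement learning.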
arXiv:2605.18675v1 Announce Type: new Abstract: Offline reinforcement learning struggles with distributional shift and constrained performance due to …