Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
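Rethinks how reinforcement learning is evaluated, questioning whether benchmarks can truly expose the failures of RL methods and prompting a closer look at evaluation standards.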
arXiv:2510.10541v2 Announce Type: replace Abstract: Current benchmarks are inadequate for evaluating progress in reinforcement learning (RL) for large…