1
ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning
Automatically generates evaluation rubrics, offering a scalable approach to reinforcement learning training of large models.
arXiv:2605.23454v1 · Announce Type: new
Abstract: Rubric-based rewards offer a promising way to extend reinforcement learning (RL) for large language mo…