AMARIS: A Memory-Augmented Rubric Improvement System for Rubric-Based Reinforcement Learning
Proposes a memory-augmented rubric improvement system that makes rubric-based reinforcement learning more effective.
arXiv:2605.18592v1 (Announce Type: new)
Abstract: Rubric-based reward shaping is an effective method for fine-tuning LLMs via RL, where structured rubri…