CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models
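Uses reward models to overcome the limitations of test cases, enabling scalable reinforcement learning for code LLMs during both training and test-time inference.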
arXiv:2602.17684v2 (Announce Type: replace)
Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large lan…