Process Rewards with Learned Reliability
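Proposes BetaPRM, which not only predicts the step-level success probability but also tells the user how reliable that prediction is, improving reasoning feedback.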
arXiv:2605.15529v1 Announce Type: cross Abstract: Process Reward Models (PRMs) provide step-level feedback for reasoning, but current PRMs usually out…