Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning
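Without any additional training, off-the-shelf large language models can score the steps of a mathematical reasoning process, with performance comparable to dedicated process reward models (PRMs).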
arXiv:2606.01682v1 Announce Type: cross Abstract: Selecting the best response from multiple small-model samples using a stronger scorer is a simple in…
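The selection setup named in the abstract (pick the best of several sampled responses using a scorer) can be sketched as follows. This is a minimal illustration, not the paper's method: `best_of_n` and `toy_score` are hypothetical names, and the toy scorer is only a stand-in for a call to a stronger model, whose prompt and score aggregation the truncated abstract does not specify.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (first one wins on ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_score(response: str) -> float:
    """Hypothetical stand-in scorer; a real one would query a stronger LLM."""
    return 1.0 if response.strip().endswith("4") else 0.0


# Assumed example data: three sampled answers to "what is 2 + 2?"
samples = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
best = best_of_n(samples, toy_score)
print(best)
```

Passing the scorer as a plain callable keeps the selection logic independent of how scores are produced, so a model-backed scorer could be swapped in without changing `best_of_n`.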