Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
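Moves beyond the limits of binary rewards so that language models learn to express their own uncertainty while reasoning, improving interpretability and reliability.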
arXiv:2507.16806v2 Announce Type: replace-cross Abstract: When language models (LMs) are trained via reinforcement learning (RL) to generate natural l…