Calibrating LLMs with Semantic-level Reward
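Replaces binary feedback with semantic-level rewards so that LLMs learn to express their true uncertainty, improving reliability in safety-critical settings.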
arXiv:2605.15588v1 Announce Type: cross Abstract: As large language models (LLMs) are deployed in consequential settings such as medical question answ…