Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance
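ICLR 2026 top-tier conference paper: uses information-theoretic guidance to eliminate inductive bias in reward models, providing a more objective evaluation basis for reinforcement learning alignment.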
arXiv:2512.23461v2. Abstract: Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align la…