EvilGenie: A Reward Hacking Benchmark
The first benchmark dedicated to reward hacking, evaluating large models' capacity for reward gaming and the associated alignment risks.
arXiv:2511.21654v2 Announce Type: replace Abstract: We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems…