Pessimistic Risk-Aware Policy Learning in Contextual Bandits
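A new framework for risk-aware policy learning from offline data, which uses the pessimism principle to improve decision quality in high-risk scenarios.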
arXiv:2605.15620v1 Announce Type: cross Abstract: We study risk-aware offline policy learning, aiming to learn a decision rule from logged data that i…