Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning
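The paper proposes Kernelized Advantage Estimation, a method that improves LLM reasoning from a nonparametric-statistics perspective and offers a new direction for reinforcement learning.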
arXiv:2604.28005v2 (Announce Type: replace)
Abstract: Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning…