DISA: Offline Importance Sampling for Distribution-Matching LLM-RL
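Uses offline importance sampling to achieve distribution matching, offering a new approach to LLM reinforcement learning; the paper's technical details are solid.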
arXiv:2605.17295v1 Announce Type: new Abstract: Modern reasoning agents are increasingly evaluated on their ability to generate multiple valid solutio…