Reinforcement Learning for Diffusion LLMs with Entropy-Guided Step Selection and Stepwise Advantages
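Diffusion language models meet reinforcement learning: entropy-guided step selection and stepwise advantages address the challenges of post-training.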
arXiv:2603.12554v2 (announce type: replace-cross)
Abstract: Reinforcement learning (RL) has been effective for post-training autoregressive (AR) languag…