Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
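A new method that combines supervised and reinforcement fine-tuning, using prefix sampling to balance imitation learning with exploration and improve LLM post-training.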
arXiv:2507.01679v3 Announce Type: replace-cross Abstract: Existing LLM post-training techniques are broadly categorized into supervised fine-tuning (…