$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data
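Proposes the $f$-trajectory balance loss family, which unifies on- and off-policy training of GFlowNets and LLMs; its gradients correspond to the KL divergence, and it is low-variance and efficient.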
arXiv:2605.15417v1 Announce Type: cross Abstract: In GFlowNets and variational inference, it has been shown that the mean square error between target …