Weak-to-Strong Elicitation via Mismatched Wrong Drafts
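Injects a weaker model's wrong drafts to elicit capabilities in a strong model that on-policy RL struggles to reach; a distinctive research angle.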
arXiv:2605.17314v1 Announce Type: cross Abstract: We consider whether off-policy experience from a smaller, weaker model can elicit capability in a st…