1
Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
用迭代奖励引导后训练,让表格语言模型也能自我进化、持续提升性能。
arXiv:2604.18966v2 Announce Type: replace Abstract: Tabular language models can generate synthetic tables by modeling rows as token sequences, but the…