Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
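Geometry-guided hidden-state replacement identifies where diffusion is best inserted into a language model; the novel DiHAL method improves performance.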
arXiv:2605.14368v1 Announce Type: cross Abstract: Continuous diffusion language models lag behind autoregressive transformers, partly because diffusio…