Pretraining Large Language Models with MXFP4 on Native FP4 Hardware
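The study identifies the root cause of divergence in full-FP4 training of large models by progressively enabling MXFP4 quantization in controlled experiments, offering a new approach to efficient low-precision training.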
arXiv:2605.09825v3 Announce Type: replace-cross Abstract: Why does full-pipeline FP4 training of large language models often diverge, even when forwar…
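
For readers unfamiliar with the format, the following is a minimal sketch of MXFP4 "fake quantization" (quantize, then dequantize) in plain Python. It assumes the general MXFP4 layout from the OCP Microscaling specification: blocks of 32 elements, one shared power-of-two scale per block, and FP4 E2M1 element values. The truncated abstract does not describe the paper's actual recipe, so the function names `quantize_block` and `quantize_mxfp4`, the tie-breaking rule, and the scale selection here are illustrative assumptions, not the authors' method.

```python
import math

# Magnitudes representable by FP4 E2M1 (the sign bit is handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2     # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32   # elements that share one power-of-two scale in MXFP4


def quantize_block(block):
    """Fake-quantize one block; return (shared_scale, dequantized_values)."""
    amax = max((abs(v) for v in block), default=0.0)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    scale = 2.0 ** ((e - 1) - E2M1_EMAX)
    out = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_GRID[-1])  # saturate at 6
        # Nearest grid point; ties go to the smaller magnitude (a simplification,
        # hardware typically rounds ties to even).
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        out.append(math.copysign(nearest, v) * scale)
    return scale, out


def quantize_mxfp4(values):
    """Apply quantize_block to consecutive blocks of BLOCK_SIZE elements."""
    result = []
    for start in range(0, len(values), BLOCK_SIZE):
        _, deq = quantize_block(values[start:start + BLOCK_SIZE])
        result.extend(deq)
    return result
```

Because every element in a block shares one scale, a single large outlier coarsens the grid for the other 31 values. This is a general property of block-scaled formats and is not a claim about the specific divergence mechanism the paper reports.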