Does Weight Decay Enhance Training Stability?
An in-depth look at the real effect of weight decay on training stability, challenging the conventional view of regularization.
arXiv:2605.16622v1 Announce Type: new Abstract: In modern deep learning, weight decay is often credited with "stabilizing" training dynamics, divergin…
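A study traces the root causes of divergence in full-FP4 training of large models by enabling MXFP4 quantization step by step in controlled experiments, offering new ideas for efficient low-precision training.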
arXiv:2605.09825v3 Announce Type: replace-cross Abstract: Why does full-pipeline FP4 training of large language models often diverge, even when forwar…