Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
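Uncovers the cause of "slingshot loss spikes" in low-precision training, distinguishes genuine "grokking" from anomalous "glitching", and offers a new perspective on numerical optimization.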
arXiv:2605.06152v3
Announce Type: replace
Abstract: Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a pheno…