AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping
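An adaptive gradient clipping method accepted at ICML that effectively improves LLM pretraining stability, marking a new advance in AI training optimization.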
arXiv:2502.11034v3

Abstract: Loss spikes remain a persistent obstacle in large-scale language model pretraining. While previous…
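The abstract is truncated here, so AdaGC's exact update rule is not stated. As a rough illustration of what "adaptive" gradient clipping can mean, here is a generic sketch. It keeps a per-tensor exponential moving average (EMA) of gradient norms and rescales any gradient whose norm exceeds a multiple of that average. The class `AdaptiveClipper`, the parameters `beta` and `lam`, and their default values are hypothetical and illustrative. This is not the paper's algorithm.

```python
import math
from typing import Dict, List


class AdaptiveClipper:
    """Hypothetical per-tensor adaptive gradient clipper (illustration only).

    Each named tensor keeps an EMA of its gradient norm. A new gradient whose
    norm exceeds `lam` times that EMA is rescaled down to the threshold.
    """

    def __init__(self, beta: float = 0.98, lam: float = 1.05) -> None:
        self.beta = beta  # EMA decay (assumed, illustrative value)
        self.lam = lam    # relative threshold multiplier (assumed, illustrative value)
        self.ema: Dict[str, float] = {}

    def clip(self, name: str, grad: List[float]) -> List[float]:
        norm = math.sqrt(sum(g * g for g in grad))
        if name not in self.ema:
            # No history for this tensor yet: accept the gradient as-is
            # and seed the EMA with its norm.
            self.ema[name] = norm
            return list(grad)
        threshold = self.lam * self.ema[name]
        scale = 1.0
        # Skip clipping while the EMA is zero, so that a zero seed
        # cannot permanently zero out this tensor's gradients.
        if threshold > 0.0 and norm > threshold:
            scale = threshold / norm
        clipped = [g * scale for g in grad]
        # Update the EMA with the post-clipping norm, so a single spike
        # cannot inflate future thresholds.
        self.ema[name] = self.beta * self.ema[name] + (1 - self.beta) * (norm * scale)
        return clipped


if __name__ == "__main__":
    clipper = AdaptiveClipper(beta=0.9, lam=2.0)
    print(clipper.clip("w", [3.0, 4.0]))    # norm 5 seeds the EMA; returned unchanged
    print(clipper.clip("w", [30.0, 40.0]))  # norm 50 > 2 * 5, rescaled to norm 10
```

The second gradient (norm 50) exceeds the threshold 2 × 5 = 10, so it is scaled by 0.2 to roughly [6.0, 8.0]. The EMA then moves to 0.9 × 5 + 0.1 × 10 = 5.5. Feeding the clipped norm rather than the raw norm into the EMA is one way to keep a loss spike from loosening the clipping threshold for later steps.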