One LR Doesn't Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs
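Moving beyond a single uniform learning rate: heavy-tailed distributions guide layer-wise adaptive learning rates for LLMs, substantially improving training efficiency and model performance.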
arXiv:2605.22297v1 (Announce Type: cross)
Abstract: Learning rate configuration is a fundamental aspect of modern deep learning. The prevailing practice…
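The abstract is cut off here, so the paper's actual heavy-tail metric and learning-rate rule are not visible. The following is only a minimal sketch of the general idea, under the assumption that "heavy-tail guided" means measuring how heavy-tailed each layer's weight spectrum is and scaling that layer's learning rate accordingly. Everything specific in it is an assumption, not the authors' method: the Hill estimator for the tail exponent alpha, the "top half of the spectrum" tail size, the linear mapping, the 0.5x to 1.5x range, and the direction (heavier tail, i.e. smaller alpha, gets a smaller rate). The function names `hill_alpha` and `layerwise_lrs` are hypothetical.

```python
import numpy as np


def hill_alpha(weight, k=None):
    """Hill-estimator sketch of the power-law exponent of a layer's eigenvalue tail.

    Eigenvalues are those of W^T W (the squared singular values of W).
    `k` (how many top eigenvalues count as the tail) defaults to half the
    spectrum; that default is a hypothetical choice.
    """
    eigs = np.linalg.svd(np.asarray(weight, dtype=float), compute_uv=False) ** 2
    eigs = np.sort(eigs[eigs > 0])[::-1]  # positive eigenvalues, descending
    n = len(eigs)
    if n < 2:
        return float("inf")  # too few eigenvalues to estimate a tail
    if k is None:
        k = max(1, n // 2)
    k = min(k, n - 1)  # keep eigs[k] available as the tail threshold
    total = np.log(eigs[:k] / eigs[k]).sum()
    if total <= 0:
        return float("inf")  # flat top of spectrum: no measurable tail
    # k / total estimates the survival-function tail index; +1 gives the density exponent.
    return 1.0 + k / total


def layerwise_lrs(weights, base_lr, scale_min=0.5, scale_max=1.5):
    """Map per-layer tail exponents to per-layer learning rates (hypothetical heuristic).

    The layer with the smallest alpha (heaviest tail) gets scale_min * base_lr,
    the layer with the largest alpha gets scale_max * base_lr, linearly in between.
    """
    alphas = np.array([hill_alpha(w) for w in weights])
    lo, hi = alphas.min(), alphas.max()
    if not np.isfinite(hi) or hi == lo:
        # Degenerate spectra or identical exponents: fall back to a uniform rate.
        return [base_lr] * len(weights)
    t = (alphas - lo) / (hi - lo)
    return [float(base_lr * s) for s in scale_min + (scale_max - scale_min) * t]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two toy "layers": Gaussian entries vs. heavy-tailed Student-t entries.
    layers = [rng.standard_normal((64, 64)), rng.standard_t(df=3, size=(64, 64))]
    print(layerwise_lrs(layers, base_lr=3e-4))
```

In PyTorch, for instance, each returned rate could be supplied as the `lr` of a separate parameter group when constructing an optimizer. A real LLM recipe would also have to decide how often to re-estimate alpha during training and how to treat non-matrix parameters, which this sketch ignores.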