Memory-Efficient LLM Pretraining via Minimalist Optimizer Design
Proposes a minimalist optimizer design that substantially reduces the memory footprint of LLM pretraining; accepted at ICML 2026.
arXiv:2506.16659v3 Announce Type: replace-cross Abstract: Training large language models (LLMs) relies on adaptive optimizers such as Adam, which intr…