From SGD to Muon: Adaptive Optimization via Schatten-p Norms
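An in-depth look at how optimizers evolved from SGD to Muon, using Schatten-p norms to unify matrix-wise geometric constraints and offering AI researchers a new theoretical perspective.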
arXiv:2605.19781v1

Abstract: Modern optimizers, like Muon, impose matrix-wise geometry constraints on their updates. These matrix-w…
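The abstract is cut off here, so the snippet below is not the paper's algorithm. It is a minimal sketch of one common reading of "matrix-wise geometry constraints": steepest descent under a Schatten-p norm, i.e. the l_p norm of a matrix's singular values. For a gradient G = U diag(s) V^T and dual exponent q (1/p + 1/q = 1), the unit-norm direction D that maximizes the Frobenius inner product <G, D> keeps U and V and rescales each singular value to s_i^(q-1) / ||s||_q^(q-1). Setting p = 2 recovers the normalized gradient (an SGD-like step), and p = infinity recovers the orthogonalized update U V^T, which Muon approximates with Newton-Schulz iterations instead of an SVD. The function names `schatten_norm` and `schatten_steepest_direction` are hypothetical, and the sketch assumes NumPy.

```python
import numpy as np


def schatten_norm(M: np.ndarray, p: float) -> float:
    """Schatten-p norm: the l_p norm of the singular values of M (p may be np.inf)."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.linalg.norm(s, ord=p))


def schatten_steepest_direction(G: np.ndarray, p: float) -> np.ndarray:
    """Hypothetical helper (a sketch, not the paper's method).

    Returns the matrix D with Schatten-p norm 1 that maximizes <G, D>.
    With G = U diag(s) V^T and dual exponent q = p / (p - 1), the maximizer is
    D = U diag(s**(q-1) / ||s||_q**(q-1)) V^T. Valid for 1 < p <= inf.
    """
    if p <= 1:
        raise ValueError("this sketch only handles 1 < p <= inf")
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    if np.isinf(p):
        # q = 1: every strictly positive singular value maps to 1, giving U V^T (Muon-like).
        scaled = (s > 0).astype(G.dtype)
    else:
        q = p / (p - 1.0)  # dual exponent, 1/p + 1/q = 1
        scaled = s ** (q - 1.0) / np.linalg.norm(s, ord=q) ** (q - 1.0)
    # U * scaled multiplies column i of U by scaled[i], i.e. U @ diag(scaled).
    return (U * scaled) @ Vt


rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

# p = 2: the direction is the Frobenius-normalized gradient (SGD-like).
print(np.allclose(schatten_steepest_direction(G, 2.0), G / np.linalg.norm(G)))  # True

# p = inf: the direction has orthonormal columns, i.e. it equals U V^T (Muon-like).
D_inf = schatten_steepest_direction(G, np.inf)
print(np.allclose(D_inf.T @ D_inf, np.eye(3)))  # True
```

Intermediate values of p interpolate between these two extremes: larger p flattens the singular-value spectrum of the update more aggressively, which is the sense in which a single Schatten-p family can place SGD-style and Muon-style updates on one axis.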