Inverse Depth Scaling From Most Layers Being Similar
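Overturning the conventional wisdom on depth scaling: similarity between layers leads to an inverse scaling phenomenon in which "deeper is worse," according to a new ICML 2026 finding.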
arXiv:2602.05970v2 Announce Type: replace Abstract: Neural scaling laws relate loss to model size in large language models (LLMs), yet depth and width…