Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
Shows how sparsity in the FFN architecture reshapes attention computation and affects the learning mechanisms of small Transformer models.
arXiv:2605.09403v2 Announce Type: replace-cross Abstract: Architectural choices inside the Transformer feedforward network (FFN) block do not merely a…