Krause Synchronization Transformers
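Proposes Krause Attention, a mechanism that addresses the synchronization dynamics and representation collapse caused by global softmax normalization in Transformers.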
arXiv:2602.11534v3 (Announce Type: replace-cross)
Abstract: Self-attention in Transformers relies on globally normalized softmax weights, causing all to…
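To make "globally normalized softmax weights" concrete, here is a minimal sketch of standard dot-product self-attention, not of Krause Attention itself (the truncated abstract does not describe that mechanism). It assumes a single head with identity query/key/value projections and no residual connection or MLP; all function names are hypothetical. Because each row of weights is normalized over every token, each update replaces a token with a convex combination of all tokens, so the largest pairwise distance between tokens can only shrink. This is a toy picture of the synchronization and collapse the summary refers to.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x):
    """Globally normalized dot-product attention weights for tokens x.

    Assumption: queries = keys = x (identity projections). The scores in
    row i are normalized over all n tokens at once.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    return [
        softmax([scale * sum(a * b for a, b in zip(xi, xj)) for xj in x])
        for xi in x
    ]


def attention_step(x):
    """One update with values = x, no residual: x_i <- sum_j A_ij x_j."""
    a = attention_weights(x)
    n, d = len(x), len(x[0])
    return [
        [sum(a[i][j] * x[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def spread(x):
    """Largest pairwise Euclidean distance between token vectors."""
    return max(math.dist(p, q) for p in x for q in x)


random.seed(0)
tokens = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
history = [spread(tokens)]
for _ in range(50):
    tokens = attention_step(tokens)
    history.append(spread(tokens))

print(f"spread before: {history[0]:.4f}, after 50 steps: {history[-1]:.4f}")
```

Real Transformers add residual connections, learned projections and MLPs, which slow but do not by themselves remove this contraction; the sketch only isolates the effect of row-wise global normalization.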