DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention
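Proposes a differentiable, adaptive sparse hierarchical attention mechanism that significantly improves the efficiency and computational scalability of long-sequence modeling.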
arXiv:2605.18753v1 Announce Type: cross Abstract: Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-valu…
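The abstract says that hierarchical methods such as NSA and InfLLMv2 select the top-k relevant key-value entries, but the excerpt is cut off there. The sketch below illustrates that general idea under stated assumptions. Keys are grouped into fixed-size blocks, each block is summarized by the mean of its keys, blocks are ranked by their dot product with the query, and softmax attention runs only over the tokens in the k best blocks. It is a simplified, hypothetical sketch. It is not the exact NSA or InfLLMv2 procedure and not DashAttention itself, and the names `block_size`, `select_top_k_blocks`, and `sparse_block_attention` are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_summaries(keys: List[Vector], block_size: int) -> List[Vector]:
    # Assumption: each contiguous block of keys is summarized by its mean key.
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summaries.append([sum(k[d] for k in block) / len(block) for d in range(dim)])
    return summaries


def select_top_k_blocks(query: Vector, keys: List[Vector], block_size: int, k: int) -> List[int]:
    # Hard top-k: a discrete choice with a fixed budget k (not differentiable).
    scores = [dot(query, s) for s in block_summaries(keys, block_size)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # block indices in positional order


def sparse_block_attention(query: Vector, keys: List[Vector], values: List[Vector],
                           block_size: int, k: int) -> Tuple[List[int], Vector]:
    if not keys or k < 1:
        raise ValueError("need at least one key and k >= 1")
    blocks = select_top_k_blocks(query, keys, block_size, k)
    # Token indices covered by the selected blocks (last block may be short).
    idx = [i for b in blocks
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / z for d in range(dim_v)]
    return blocks, out


# Toy example: 8 tokens in 4 blocks of 2; keep the 2 best-matching blocks.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
        [0.0, -1.0], [0.0, -1.0], [0.5, 0.5], [0.5, 0.5]]
values = [[float(i)] for i in range(8)]
blocks, out = sparse_block_attention(query, keys, values, block_size=2, k=2)
print(blocks)  # prints [1, 3]
```

With a fixed k, the same number of blocks is kept for every query regardless of how concentrated the relevance scores are, and the hard selection passes no gradient back through the block scores. The title indicates that DashAttention targets a differentiable and adaptive alternative; its specific mechanism is not included in this excerpt.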