STS: Efficient Sparse Attention with Speculative Token Sparsity
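Proposes STS, a sparse attention mechanism that requires no retraining and uses speculative token sparsity to overcome the compute and memory bottlenecks of long-sequence inference in large models.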
arXiv:2605.15508v1 Announce Type: new Abstract: The quadratic complexity of attention imposes severe memory and computational bottlenecks on Large Lan…