Niu Ge's Picks · This Week


📝 Deep Tech · arXiv · Machine Learning · 2026-05-20

MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training

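MTraining, a new method, uses distributed dynamic sparse attention to substantially reduce the computational overhead of ultra-long-context training.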

arXiv:2510.18830v2 Announce Type: replace-cross Abstract: The adoption of long context windows has become a standard feature in Large Language Models …

distributed training · sparse attention · ultra-long context · large-model training · efficiency optimization

📝 Deep Tech · arXiv · Machine Learning · 2026-05-20

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

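Proposes a differentiable, adaptive sparse hierarchical attention mechanism that significantly improves the efficiency and computational scalability of long-sequence modeling.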

arXiv:2605.18753v1 Announce Type: cross Abstract: Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-valu…

attention mechanism · sparse attention · differentiable · adaptive · hierarchical attention · dashattent

📝 Deep Tech · arXiv · NLP · 2026-05-20

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

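Tackles the long-context inference bottleneck in large models by converting full attention into sparse attention within a hundred training steps, balancing efficiency and accuracy.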

arXiv:2605.16928v1 Announce Type: new Abstract: Long-context inference in large language models is bottlenecked by the quadratic cost of full attentio…

large models · long context · sparse attention · training efficiency · inference optimization

🤖 AI·Large Models · arXiv · Machine Learning · 2026-05-19

STS: Efficient Sparse Attention with Speculative Token Sparsity

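Proposes STS, a sparse attention mechanism that needs no retraining and uses speculative token sparsity to overcome the compute and memory bottlenecks of long-sequence inference in large models.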

arXiv:2605.15508v1 Announce Type: new Abstract: The quadratic complexity of attention imposes severe memory and computational bottlenecks on Large Lan…

sparse attention · large-model inference · speculative sparsity · long sequences · efficiency optimization
