Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents
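A new framework for efficient, programmable sparse attention serving for AI agents that substantially reduces compute cost while preserving flexibility.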
arXiv:2606.06453v1 Announce Type: new Abstract: Sparse attention is becoming increasingly important for serving large language models (LLMs) as genera…