Dynamic Linear Attention
A look at a new attention mechanism: Dynamic Linear Attention, which improves both efficiency and accuracy. Accepted at ICML 2026.
arXiv:2606.10650v1 Announce Type: cross Abstract: The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the…
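An efficient vector quantization architecture that substantially accelerates LLM decoding. Accepted at the top-tier conference ISCA 2026, it offers a new approach to speeding up inference.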
arXiv:2605.24144v1 Announce Type: cross Abstract: Large Language Models (LLMs) have achieved impressive performance across diverse domains but remain …
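A new dynamic routing method uses binary expert activation masks to cut redundant computation in MoE models, accelerating inference without retraining.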
arXiv:2605.14438v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) architectures enhance the efficiency of large language models by activating o…