TokenButler: Token Importance is Predictable
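Research finds that token importance in large language models is predictable, offering a new approach to KV cache optimization and significantly reducing memory and compute bottlenecks.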
arXiv:2503.07518v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) rely on the Key-Value (KV) Cache to store token history, enabli…