Meta-Soft: Leveraging Composable Meta-Tokens for Context-Preserving KV Cache Compression
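Compresses the KV cache with composable meta-tokens, efficiently preserving context information and further speeding up large-model inference.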
arXiv:2605.22337v1. Abstract: The KV cache used in large language models has linearly growing time complexity, so LLMs face memory b…