OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
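Proposes OSCAR, a method that uses offline spectral covariance-aware rotation to achieve 2-bit KV cache quantization, significantly reducing GPU memory usage during large-model inference.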
arXiv:2605.17757v1 Announce Type: new Abstract: INT2 KV-cache quantization is attractive for long-context LLM serving, but it remains difficult to mak…