Qift: Shift-Friendly No-Zero W2 Post-Training Quantization for Rotated W2A4/KV4 LLM Inference
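Qift, a new method, performs no-zero post-training quantization for rotated W2A4/KV4 LLMs, balancing shift-friendliness with inference efficiency.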
arXiv:2606.02823v1 Announce Type: new Abstract: Two-bit weight quantization is attractive for memory-efficient LLM inference, but the standard W2 leve…