GSQ: Highly-Accurate Low-Precision Scalar Quantization for LLMs via Gumbel-Softmax Sampling
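Proposes GSQ, a method that uses Gumbel-Softmax sampling to achieve high-accuracy, low-precision scalar quantization for LLMs, breaking through existing quantization bottlenecks.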
arXiv:2604.18556v2 Announce Type: replace-cross Abstract: Quantization has become a standard tool for efficient LLM deployment, especially for local i…
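As a rough illustration of the general idea named in the title, the sketch below applies a Gumbel-Softmax relaxation to the choice of a scalar quantization level for each weight. It is not the GSQ algorithm: the abstract above is truncated and does not specify it. The function names, the uniform 4-level grid, the distance-based logits, and the temperature `tau` are all assumptions made for illustration only.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Draw a relaxed one-hot sample per row of `logits` (generic Gumbel-Softmax)."""
    # Gumbel(0, 1) noise via inverse transform sampling; low > 0 avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)  # stabilise the softmax numerically
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def soft_quantize(weights, grid, tau, rng):
    """Relaxed assignment of each weight to a level in `grid` (hypothetical helper)."""
    # Assumption: logits are negative squared distances to each grid level,
    # so nearer levels are more likely to be sampled.
    logits = -(weights[:, None] - grid[None, :]) ** 2
    probs = gumbel_softmax(logits, tau, rng)
    # The soft-quantized value is the probability-weighted mix of grid levels.
    return probs @ grid, probs

rng = np.random.default_rng(0)
grid = np.linspace(-1.0, 1.0, 4)            # assumed 2-bit uniform grid
weights = np.array([-0.9, -0.2, 0.3, 0.8])  # toy weights, not real model data

soft, probs = soft_quantize(weights, grid, tau=0.5, rng=rng)
hard = grid[probs.argmax(axis=-1)]          # discrete levels picked by the sample
```

In a gradient-based setting, the soft values would typically be used during optimization while the temperature is lowered, and the hard levels would be kept at the end. Whether GSQ follows this pattern is not stated in the excerpt.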