NanoQuant: Efficient Sub-1-Bit Quantization of Large Language Models
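Proposes a sub-1-bit quantization method that substantially reduces the storage and compute costs of large language models while balancing efficiency and performance.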
arXiv:2602.06694v2 Announce Type: replace Abstract: Weight-only quantization has become a standard approach for efficiently serving large language mod…