GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
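Proposes GAMMA, a method that automatically allocates bits globally for mixed-precision quantization of large language models under arbitrary budgets, without requiring quantization-aware training.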
arXiv:2605.18475v1 Announce Type: new Abstract: Mixed-precision quantization improves the budget--accuracy trade-off for large language models (LLMs) …