TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
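A new advance in MXFP4 quantization: two-level orthogonal rotation significantly improves the low-bit inference accuracy of large models and is deployment-friendly.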
arXiv:2605.19561v1 (announce type: new)
Abstract: As Large Language Models (LLMs) advance toward practical deployment, the Microscaling FP4 (MXFP4) form…