Niu Ge's Picks · Three Months

1

🤖 AI·Large Models arXiv Machine Learning 2026-07-10

KronQ: LLM Quantization via Kronecker-Factored Hessian

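Uses a Kronecker factorization of the Hessian to efficiently improve LLM quantization accuracy, pushing past the post-training quantization bottleneck.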

arXiv:2607.07964v1 Announce Type: new Abstract: Post-training quantization (PTQ) is a widely adopted technique for compressing large language models (…

LLM quantization, Kronecker factorization, Hessian matrix, model compression, post-training quantization

2

📝 Deep Tech arXiv AI 2026-06-12

TWLA: Achieving Ternary Weights and Low-Bit Activations for LLMs via Post-Training Quantization

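Proposes TWLA, a post-training quantization method that achieves ternary weights and low-bit activations for large models while balancing performance and efficiency. Accepted at ICML 2026.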

arXiv:2606.13054v1 Announce Type: cross Abstract: Large language models (LLMs) exhibit exceptional general language processing capabilities, but their…

TWLA, post-training quantization, ternary weights, low-bit activations, large models

3

🤖 AI·Large Models arXiv NLP 2026-06-10

UniSVQ: 2-bit Unified Scalar-Vector Quantization

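Breaks the 2-bit quantization bottleneck by unifying scalar and vector quantization, enabling low-cost deployment and faster inference for large models.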

arXiv:2606.10520v1 Announce Type: new Abstract: Post-training quantization at the 2-bit level enables low-cost deployment and inference acceleration f…

2-bit quantization, scalar quantization, vector quantization, post-training quantization, large model deployment

4

📝 Deep Tech arXiv Machine Learning 2026-06-09

ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization

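Proposes an NVFP4 post-training quantization method based on block scale initialization that improves the low-bit accuracy of large language models.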

arXiv:2606.07618v1 Announce Type: new Abstract: NVFP4 is a recently introduced hardware-supported FP4 format that improves the fidelity of 4-bit quant…

LLM quantization, NVFP4, post-training quantization, block scale initialization, large model compression

5

📝 Deep Tech arXiv AI 2026-06-08

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

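Targets the "stability lag" problem in quantizing diffusion large language models with a frontier-aware reweighted calibration method that suppresses early decision flips caused by quantization error.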

arXiv:2606.06547v1 Announce Type: cross Abstract: Diffusion Large Language Models (dLLMs) refine tokens iteratively but commit them irreversibly, lead…

diffusion large models, quantization calibration, PTQ, stability lag, write frontier

6

📝 Deep Tech arXiv Machine Learning 2026-06-08

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

Proposes AAAC, which uses activation-aware adaptive codebooks to further reduce LLM weight quantization error while staying at 4-bit precision.

arXiv:2605.08692v2 Announce Type: replace Abstract: Post-training weight-only quantization to 4 bits is widely used to reduce the memory and compute c…

LLM quantization, 4-bit weight quantization, activation-aware, adaptive codebooks, post-training quantization

7

🤖 AI·Large Models arXiv Machine Learning 2026-06-05

STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models

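Proposes a state-time consistent post-training quantization scheme that eases the deployment bottleneck of diffusion large language models and lowers inference cost.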

arXiv:2606.04945v1 Announce Type: new Abstract: Diffusion large language models (DLLMs) have recently emerged as a promising alternative to autoregres…

diffusion large language models, quantization, post-training quantization, model compression, state-time consistency

8

📝 Deep Tech arXiv AI 2026-06-04

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

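To address heterogeneous modality distributions in 4-bit quantization of omni-modal large language models, MorphoQuant proposes a modality-aware quantization framework that compresses the model while preserving performance.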

arXiv:2606.04349v1 Announce Type: cross Abstract: Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language …

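modality-aware quantization, omni-modal large language models, post-training quantization, distribution heterogeneity, outlier patterns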

9

🤖 AI·Large Models arXiv Machine Learning 2026-06-02

ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression

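ProjQ, a new method, makes large-model compression smarter: post-training quantization and low-rank adaptation denoise cooperatively, mitigating the accuracy loss of deploying them sequentially.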

arXiv:2606.00494v1 Announce Type: new Abstract: Post-Training Quantization (PTQ) and Low-Rank Adaptation (LoRA) constitute the standard pipeline for e…

ProjQ, LLM compression, post-training quantization, LoRA adaptation, model deployment

10

📝 Deep Tech arXiv Machine Learning 2026-05-20

LoopQ: Quantization for Recursive Transformers

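Quantizing looped language models faces three major challenges; this first systematic study uncovers the root causes of their fragility.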

arXiv:2605.16343v1 Announce Type: new Abstract: Looped language models (LoopLMs) improve parameter efficiency by recursively reusing Transformer block…

quantization, recursive Transformers, looped language models, post-training quantization, recursive transfor

11

📝 Deep Tech arXiv Machine Learning 2026-05-19

Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models

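A fresh take on 1-bit quantization of large models: revisits output alignment strategies to support efficient inference on low-resource devices.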

arXiv:2512.21651v3 Announce Type: replace Abstract: Large Language Models (LLMs) deliver strong performance across a wide range of NLP tasks, but thei…

LLM, 1-bit quantization, post-training quantization, output alignment, model compression
