Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
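Joint structured pruning and mixed-precision quantization offers an efficient new approach to LLM compression that balances accuracy and speed.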
arXiv:2606.07819v1 Announce Type: cross Abstract: Recently, the efficiency of Large Language Model (LLM) deployment has become a critical concern in…