SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference
A practical one-shot pruning method that supports both sparse and dense GEMM operations, significantly reducing LLM inference cost.
arXiv:2606.10445v1 Announce Type: new Abstract: Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoreti…
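
For readers unfamiliar with the 2:4 pattern named in the abstract, the sketch below illustrates the constraint itself: in every group of four consecutive weights, at most two may be nonzero. It uses plain magnitude-based selection as a stand-in criterion. This is an assumption made for illustration and is not SpenseGPT's pruning method; the function name `prune_2_4` and the sample weights are hypothetical.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude entries in each group of four.

    Illustrative magnitude criterion only; not the paper's algorithm.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        # Keep those two weights, replace the other two with zero.
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical weights: two groups of four.
weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
sparse = prune_2_4(weights)
print(sparse)
```

Because exactly two positions per group survive, a kernel that supports this layout can store only the kept values plus small per-group position metadata, which is where the theoretical speedup mentioned in the abstract comes from.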