Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
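A fully static quantization scheme lets mobile NPUs run large language models efficiently, with accuracy comparable to state-of-the-art methods and significantly lower inference latency.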
arXiv:2605.20295v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed on mobile devices, where Neural Processing Unit…