Lever: Speculative LLM Inference on Smartphones
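Speculative decoding keeps a small model resident in DRAM, breaking the memory bottleneck of on-device LLM inference on smartphones and enabling efficient inference of large models stored in flash.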
arXiv:2605.16786v1 (Announce Type: new)
Abstract: Large language models (LLMs) are increasingly needed for interactive mobile applications, but high-qua…
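The summary points to the general speculative-decoding pattern: a small draft model that fits in DRAM proposes tokens cheaply, and the large model, whose weights must be read from flash, only verifies them. The Python sketch below illustrates that generic technique in its greedy form. It is not Lever's actual algorithm, which the truncated abstract does not describe. The function `speculative_decode` and the toy models `toy_target` and `toy_draft` are hypothetical, and the DRAM/flash cost split is modeled only by counting large-model verification passes.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: given a token prefix, return the greedy next token.
Model = Callable[[List[int]], int]


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model (assumed resident in DRAM) proposes k tokens.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. One verification pass of the large model (assumed streamed from flash).
        #    A real system scores all k positions in one batched forward pass; here
        #    we query each prefix separately but count the whole step as one pass.
        target_passes += 1
        ctx = list(out)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # bonus token when every draft is accepted
        out.extend(accepted)
    return out[: len(prompt) + max_new], target_passes


# Toy models (hypothetical): the target counts modulo 10; the draft agrees
# except that after token 7 it wrongly guesses 0.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 7 else (prefix[-1] + 1) % 10


tokens, passes = speculative_decode(toy_target, toy_draft, [0], max_new=10, k=4)
print(tokens, passes)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 3
```

With k = 4, the toy run produces 10 new tokens using 3 large-model passes instead of the 10 that plain autoregressive decoding would need. Because every mismatch is replaced by the target's own choice, the output is identical to greedy decoding with the target model alone. Real speedups depend on how often the draft agrees with the target.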