SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
LLM inference optimization for Superchips: proposes an SLO-aware rotary scheduling and memory management scheme. Accepted to MLSys '26.
arXiv:2601.20309v2 Announce Type: replace-cross Abstract: Large Language Model (LLM) serving faces a fundamental tension between stringent latency Ser…