CoLLM: Continuous Adaptation for SLO-Aware LLM Serving on Shared GPU Clusters
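Proposes a continuous adaptation method for shared GPU clusters that optimizes SLO attainment for LLM serving, reducing latency and cost.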
arXiv:2604.16400v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) are increasingly adopted in edge intelligence to power domai…