PALS: Power-Aware LLM Serving for Mixture-of-Experts Models
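A new power-optimization approach for MoE models that turns GPU power from a static constraint into a controllable resource, improving energy efficiency.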
arXiv:2605.21427v1 Announce Type: new Abstract: Large language model (LLM) inference has become a dominant workload in modern data centers, driving si…
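Evaluates 11 proprietary models and shows when smaller models are the better choice, balancing sustainability and cost-effectiveness.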
arXiv:2504.13217v3 Announce Type: replace Abstract: Large language models (LLMs) have become increasingly embedded in organizational workflows. This h…
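Measured comparison: running an LLM locally on an M5 MacBook Pro costs about $1.5 per million tokens, while comparable models on OpenRouter cost only one third as much and run twice as fast, revealing the real economics of local inference.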
Article URL: https://www.williamangel.net/blog/2026/05/17/offline-llm-energy-use.html Comments URL: https://news.ycombinator.com/item?id=48168198 Poin…
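No new hardware required: software optimization alone can substantially reduce AI energy consumption, offering a new approach to green computing.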
Article URL: https://thenewstack.io/streaming-ai-energy-efficiency/ Comments URL: https://news.ycombinator.com/item?id=48161187 Points: 1 # Comments: …