Niuge's Picks · This Month


📝 Deep Tech · arXiv Machine Learning · 2026-06-08

Scalable Joint Resource Allocation for SLO-Constrained LLM Inference in Heterogeneous GPU Clouds

A scalable joint resource allocation scheme for heterogeneous GPU clouds that efficiently upholds the SLO constraints of LLM inference.

arXiv:2604.07472v2 Announce Type: replace Abstract: Serving large language model (LLM) inference in cloud environments requires jointly optimizing mod…

LLM inference · resource allocation · heterogeneous GPU · SLO constraints · cloud computing

🤖 AI·Large Models · arXiv Machine Learning · 2026-06-02

ATLAS: Agentic Test-time Learning-to-Allocate Scaling

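Lets the large model decide for itself how to allocate test-time compute, moving beyond the limits of fixed budgets and fixed strategies.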

arXiv:2606.01667v1 Announce Type: new Abstract: Test-time scaling has become a major way to improve large language model reasoning, but its orchestrat…

agentic · test-time · LLM reasoning · compute resource allocation · learning to allocate
