Niu Ge's Picks · This Month


📝 Deep Tech · arXiv Machine Learning · 2026-05-20

RAP: Runtime Adaptive Pruning for LLM Inference

Proposes a runtime adaptive pruning method that adjusts the memory footprint of LLM inference dynamically to improve efficiency.

arXiv:2505.17138v5 Announce Type: replace Abstract: Large language models (LLMs) excel at language understanding and generation, but their enormous co…

Tags: LLM inference, adaptive pruning, runtime optimization, memory constraints, model compression

🤖 AI · Large Models · arXiv AI · 2026-05-19

Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints

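Proposes a fluid-guided online scheduling method that optimizes LLM inference under memory constraints, significantly reducing latency and operating costs.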

arXiv:2504.11320v3 Announce Type: replace-cross Abstract: Large language models now serve millions of users daily, with providers incurring costs exce…

Tags: LLM inference, KV cache, scheduling optimization, memory constraints, LLM inference optimization
