Niu Ge's Picks (牛哥精选) · This Month


📝 Deep Tech arXiv Machine Learning 2026-05-21

A Free Lunch in LLM Compression: Revisiting Retraining after Pruning

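Re-examines whether fine-tuning is necessary after pruning large models, challenges complex pruning criteria, and proposes a more efficient compression strategy.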

arXiv:2510.14444v3 Announce Type: replace Abstract: Post-training pruning can substantially reduce LLM inference costs, but it often degrades quality …

LLM compression, pruning, retraining, model optimization, inference cost

🤖 AI·Large Models arXiv Machine Learning 2026-05-20

ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference

Cross-model proxy pruning balances low latency with high accuracy, addressing the KV-cache memory wall in long-context LLM inference.

arXiv:2605.16360v1 Announce Type: new Abstract: Efficient long-context inference in Large Language Models (LLMs) is severely constrained by the Key-Va…

KV cache, pruning, long context, LLM inference, proxy model

📝 Deep Tech arXiv Machine Learning 2026-05-20

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

Explores pruning and distillation in the pre-training of large MoE models; SlimQwen optimizes for both efficiency and performance.

arXiv:2605.08738v2 Announce Type: replace Abstract: Structured pruning and knowledge distillation (KD) are typical techniques for compressing large la…

MoE, pruning, distillation, pre-training, large models

📝 Deep Tech arXiv Machine Learning 2026-05-20

LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models

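Proposes LEAP, a learnable end-to-end adaptive pruning method that compresses large language models efficiently while preserving their performance.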

arXiv:2605.17289v1 Announce Type: new Abstract: Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shiftin…

LEAP, large language models, adaptive pruning, end-to-end, learnable

📝 Deep Tech arXiv Machine Learning 2026-05-20

RAP: Runtime Adaptive Pruning for LLM Inference

Proposes a runtime adaptive pruning method that adjusts LLM inference memory dynamically to improve efficiency.

arXiv:2505.17138v5 Announce Type: replace Abstract: Large language models (LLMs) excel at language understanding and generation, but their enormous co…

LLM inference, adaptive pruning, runtime optimization, memory constraints, model compression

📝 Deep Tech arXiv AI 2026-05-20

TAPIOCA: Why Task-Aware Pruning Improves OOD model Capability

Investigates how task-aware pruning improves model performance on out-of-distribution data and reveals the underlying mechanism.

arXiv:2605.14738v1 Announce Type: cross Abstract: Recent work has promoted task-aware layer pruning as a way to improve model performance on particula…

task-aware pruning, OOD generalization, polynomial regression, large language models, model compression

📝 Deep Tech arXiv Machine Learning 2026-05-20

Prune, Update and Trim: Robust Structured Pruning for Large Language Models

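Proposes a new structured pruning method that compresses large models efficiently while remaining robust; aimed at model-optimization researchers.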

arXiv:2605.18331v1 Announce Type: new Abstract: Large Language Models (LLMs) have experienced significant growth and development in recent years. Howe…

large language models, structured pruning, model compression, robustness, pruning methods

📝 Deep Tech arXiv AI 2026-05-19

Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs

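Proposes Ghosted Layers, which recovers the performance of layer-pruned LLMs without training by using activation alignment to resolve hidden-state mismatch.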

arXiv:2605.15491v1 Announce Type: cross Abstract: Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a…

layer pruning, large lang, activation alignment, training-free, performance recovery

🤖 AI Tools arXiv Computer Vision 2026-05-19

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

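With no manual selection of visual tokens, LRCP uses low-rank compressibility to prune them automatically, improving LVLM inference efficiency, particularly for high-resolution images and long videos.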

arXiv:2605.15621v1 Announce Type: new Abstract: Large vision-language models (LVLMs) achieve strong multimodal understanding, but their inference cost…

visual token pruning, LVLMs, efficient inference, low-rank compressibility, attention optimization, GPU acceleration
