Strong Teacher Not Needed? On Distillation in LLM Pretraining
Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.
arXiv:2605.23857v1 Announce Type: new Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…
Addresses context overflow in long-horizon LLM agents with a parallel compression method that cuts tens of seconds of inference blocking.
arXiv:2605.23296v1 Announce Type: new Abstract: Long-horizon LLM agents accumulate growing conversation histories that eventually exceed the model's c…
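An open-source GitHub project that gives LLM applications long-term memory while cutting input tokens by 68% on average, substantially lowering API costs.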
Article URL: https://github.com/Tem-Degu/streetai-memory Comments URL: https://news.ycombinator.com/item?id=48249509 Points: 1 # Comments: 0
Compiles agent workflows into LLM weights, reaching near-frontier quality at very low cost and proposing a disruptive path for model optimization.
arXiv:2605.22502v1 Announce Type: new Abstract: Agent orchestration frameworks have proliferated, collectively exceeding 290,000 GitHub stars across L…
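Compresses the KV cache with composable meta-tokens, efficiently preserving context information and further speeding up LLM inference.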
arXiv:2605.22337v1 Announce Type: new Abstract: The KV cache used in large language models has linearly growing time complexity, so LLMs face memory b…
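Projection-guided cross-tokenizer knowledge distillation that needs no auxiliary components and effectively addresses vocabulary incompatibility.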
arXiv:2605.21699v1 Announce Type: cross Abstract: Cross-tokenizer knowledge distillation allows a student model to learn from teachers with incompatib…
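Revisits whether fine-tuning after pruning is necessary for LLMs, challenges complex pruning criteria, and proposes a more efficient compression strategy.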
arXiv:2510.14444v3 Announce Type: replace Abstract: Post-training pruning can substantially reduce LLM inference costs, but it often degrades quality …
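Proposes an efficient visual encoder that tackles the visual-token explosion Video LLMs face on long videos, breaking the frame-scaling bottleneck.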
arXiv:2605.17260v1 Announce Type: new Abstract: The fundamental challenge in scaling Video Large Language Models (Video LLMs) to long-form video lies …
Proposes VeriCache, a method that turns a lossy KV cache into lossless LLM inference, improving both efficiency and accuracy.
arXiv:2605.17613v1 Announce Type: cross Abstract: The large size of the KV cache has become a major bottleneck for serving LLMs with increasing contex…
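Proposes LEAP, a learnable end-to-end adaptive pruning method that compresses large language models efficiently while preserving their performance.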
arXiv:2605.17289v1 Announce Type: new Abstract: Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shiftin…
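A new method that mixes full fine-tuning with low-rank adaptation, optimized for post-training and offering both efficiency and performance.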
arXiv:2605.18822v1 Announce Type: new Abstract: Post-training has become essential for adapting large language models (LLMs) to complex downstream beh…
Explores how K-Quantization affects model output quality, with an in-depth look at this new quantization technique.
arXiv:2605.19645v1 Announce Type: new Abstract: Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP…
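Quantization lets machine learning models run efficiently in low-resource medical imaging settings, greatly lowering the compute barrier and speeding up AI adoption in primary care.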
arXiv:2605.19207v1 Announce Type: cross Abstract: Deep learning models have shown strong performance in medical image analysis, but deploying them in …
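An in-depth evaluation of memory compression strategies for coding agents in data-driven scientific discovery, offering new ideas for AI-assisted research.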
arXiv:2605.18854v1 Announce Type: new Abstract: Coding agents accumulate extensive context during long-running tasks, yet fixed context windows force …
A flatness-based, theoretically optimal quantization method that offers a new approach to deep learning model compression.
arXiv:2605.18800v1 Announce Type: new Abstract: Post-training quantization has emerged as a widely adopted technique for compressing and accelerating …
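Combines semantic disentanglement with LLM reasoning and diffusion-based generation to realize a new paradigm for general-purpose image coding.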
arXiv:2412.18158v2 Announce Type: replace Abstract: Learned image compression methods have shown impressive performance but are often highly specializ…
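Proposes a frequency-domain residual compression method that sharply reduces the token count of video MLLMs, efficiently and without loss of performance.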
arXiv:2605.16366v1 Announce Type: new Abstract: Video MLLMs face a persistent tension between spatial fidelity and temporal coverage: preserving fine-…
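Uses LLM hidden representations to achieve per-task quantization, greatly improving efficiency while preserving performance; a technical advance worth watching.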
arXiv:2511.06516v3 Announce Type: replace Abstract: Many LLM applications require only narrow capabilities, yet standard post-training quantization (P…
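Proposes an expert-guided post-merge quantization method that uses merged-weight anchoring to balance model compression and performance in low-resource deployments.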
arXiv:2605.16882v1 Announce Type: new Abstract: Low-resource deployment constraints have made model quantization essential for deploying neural networ…
Proposes a runtime adaptive pruning method that lets LLM inference memory adjust dynamically, greatly improving efficiency.
arXiv:2505.17138v5 Announce Type: replace Abstract: Large language models (LLMs) excel at language understanding and generation, but their enormous co…