Dr.LLM: Dynamic Layer Routing in LLMs
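Proposes a dynamic layer routing mechanism that lets LLMs skip irrelevant layers at inference time, significantly improving efficiency and accuracy.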
arXiv:2510.12773v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) process every token through all layers of a transformer stack, …
Proposes DISK, a differentiable sparse kernel complex for efficient spatially varying convolution; accepted at ICLR 2026.
arXiv:2512.04556v2 Announce Type: replace-cross Abstract: Image convolution with complex kernels is a fundamental operation in photography, scientific…
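Recent ICLR 2026 research showing, for the first time, that MoE architectures outperform dense large models under strictly equal resource conditions.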
arXiv:2506.12119v2 Announce Type: replace Abstract: Mixture-of-Experts (MoE) language models dramatically expand model capacity and achieve remarkable…
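ICLR 2026 paper: uses information theory to guide the removal of inductive biases in reward models, providing a more objective evaluation basis for reinforcement learning alignment.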
arXiv:2512.23461v2 Announce Type: replace Abstract: Reward models (RMs) are essential in reinforcement learning from human feedback (RLHF) to align la…
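ICLR 2026 paper proposing a hybrid training framework that unifies vision-language-action models and improves multimodal embodied intelligence performance.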
arXiv:2510.00600v2 Announce Type: replace-cross Abstract: Using Large Language Models to produce intermediate thoughts, a.k.a. Chain-of-thought (CoT),…
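Paper accepted at ICLR 2026 that uses proximal optimization to improve diffusion models, yielding more efficient neural samplers; aimed at machine learning researchers.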
arXiv:2510.03824v2 Announce Type: replace Abstract: The task of learning a diffusion-based neural sampler for drawing samples from an unnormalized tar…