牛哥精选 · All

1

🤖 AI·Large Models arXiv AI 2026-07-03

Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

Selects training data by neuron activation patterns, enabling efficient LLM self-distillation without human annotation.

arXiv:2607.02460v1 Announce Type: cross Abstract: Post-training large language models (LLMs) without real-world interaction feedback or human-labeled …

llm · self-distillation · data selection · neuron-aware · unsupervised learning

2

🤖 AI·Large Models arXiv AI 2026-06-05

SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

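No retain set required: SHRED combines self-distillation with logit demotion for efficient LLM unlearning while avoiding catastrophic performance degradation.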

arXiv:2605.07482v2 Announce Type: replace-cross Abstract: Machine unlearning for large language models (LLMs) aims to selectively remove memorized con…

machine unlearning · llm · self-distillation · logit demotion · retain set

3

📝 Deep Tech arXiv AI 2026-06-04

Making Expert Reasoning Learnable with Self-Distillation

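Self-distillation lets large models learn expert reasoning on hard problems, removing the dependence on a stronger model or on sampling correct solutions.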

arXiv:2602.02405v2 Announce Type: replace-cross Abstract: Improving the reasoning capabilities of large language models (LLMs) typically relies either…

self-distillation · reasoning ability · large language models · expert reasoning · training methods

4

🤖 AI·Large Models arXiv Machine Learning 2026-05-28

ROSD: Reflective On-Policy Self-Distillation for Language Model Reasoning across Domains

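A reflective on-policy self-distillation method that improves language models' cross-domain reasoning through self-reflection, without human annotation.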

arXiv:2605.28014v1 Announce Type: cross Abstract: On-policy self-distillation (OPSD) improves the reasoning performance of large language models (LLMs…

self-distillation · cross-domain reasoning · reflection mechanism · reinforcement learning · language models

5

📝 Deep Tech arXiv AI 2026-05-28

Restoring the Sweet Spot: Pass-Rate Weighted Self-Distillation for LLM Reasoning

Pass-rate weighted self-distillation restores the "sweet spot" for LLM reasoning and counters the learning bias introduced by GRPO normalization.

arXiv:2605.27765v1 Announce Type: cross Abstract: Self-Distillation Policy Optimization (SDPO) provides dense token-level credit assignment for reinfo…

self-distillation · llm reasoning · grpo · advantage normalization · learnability

6

📝 Deep Tech arXiv AI 2026-05-25

VISD: Enhancing Video Reasoning via Structured Self-Distillation

VISD, a structured self-distillation method, is reported to substantially improve video reasoning and model understanding.

arXiv:2605.06094v4 Announce Type: replace-cross Abstract: Training VideoLLMs for complex reasoning remains challenging due to sparse sequence level re…

visd · video reasoning · structured self-distillation · ai research paper

7

🤖 AI·Large Models arXiv AI 2026-05-23

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

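A direction-adaptive self-distillation method for LLM reasoning that adjusts the teaching direction to the model's current ability, improving both effectiveness and efficiency.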

arXiv:2605.22263v1 Announce Type: cross Abstract: On-policy self-distillation (OPSD) is an emerging LLM post-training paradigm in which the model serv…

llm reasoning · self-distillation · direction-adaptive · model training · reasoning optimization

8

🤖 AI·Large Models arXiv NLP 2026-05-22

Self-Policy Distillation via Capability-Selective Subspace Projection

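A self-distillation approach that needs no external signal: capability-selective subspace projection lets an LLM improve its own performance.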

arXiv:2605.22675v1 Announce Type: new Abstract: Self-distillation bootstraps large language models (LLMs) by training on their own generations. Howeve…

self-distillation · large language models · subspace projection · capability selection · model optimization

9

🤖 AI·Large Models arXiv NLP 2026-05-22

UniSD: Towards a Unified Self-Distillation Framework for Large Language Models

UniSD explores a unified self-distillation framework for efficiently adapting and improving large language models.

arXiv:2605.06597v2 Announce Type: replace Abstract: Self-distillation (SD) offers a promising path for adapting large language models (LLMs) without r…

large language models · self-distillation · knowledge distillation · model optimization · llm framework

10

🤖 AI·Large Models arXiv Machine Learning 2026-05-21

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

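Explains why self-distillation can degrade LLMs' mathematical reasoning and identifies the suppression of key exploratory steps as the cause.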

arXiv:2603.24472v3 Announce Type: replace-cross Abstract: Self-distillation has emerged as an effective post-training paradigm for LLMs, often improvi…

self-distillation · llm reasoning · mathematical reasoning · reasoning degradation · intermediate-step suppression

11

🤖 AI·Large Models arXiv Machine Learning 2026-05-21

It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs

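How can complementary self-distillation preserve contextual integrity in large models? This study proposes a two-model collaborative scheme, offering a new approach to LLM safety alignment.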

arXiv:2605.20258v1 Announce Type: new Abstract: Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing i…

complementary self-distillation · contextual integrity · llm · large models · self-distillation

12

🤖 AI·Large Models arXiv Machine Learning 2026-05-20

Post-Trained MoE Can Skip Half Experts via Self-Distillation

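Post-trained MoE models can skip half of their experts via self-distillation, with no pretraining from scratch, substantially reducing compute.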

arXiv:2605.18643v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its …

moe · mixture-of-experts models · self-distillation · sparse activation · large-model efficiency optimization

13

📝 Deep Tech arXiv NLP 2026-05-20

Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

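Diffusion LLMs need no external teacher: a "roll out and roll back" strategy lets them improve their own inference efficiency, suggesting a new direction for model acceleration.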

arXiv:2605.16941v1 Announce Type: new Abstract: Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still …

diffusion models · large language models · efficiency optimization · self-distillation · inference acceleration

14

🤖 AI·Large Models arXiv Machine Learning 2026-05-20

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

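A self-distillation strategy teaches multimodal large models to capture fine visual details, substantially improving fine-grained understanding.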

arXiv:2605.18740v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, wher…

multimodal large models · visual details · self-distillation · fine-grained understanding · arxiv paper

15

📝 Deep Tech arXiv Machine Learning 2026-05-20

HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents

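Addresses the learning challenges of long-horizon agents with targeted hindsight self-distillation, improving performance on complex tasks.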

arXiv:2605.17873v1 Announce Type: new Abstract: Training long-horizon LLM agents with reinforcement learning is challenging because sparse outcome rew…

hint-sd · self-distillation · long-horizon agents · reinforcement learning · targeted hindsight

16

📝 Deep Tech arXiv Machine Learning 2026-05-19

Few-Step Diffusion Language Models via Trajectory Self-Distillation

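Proposes trajectory self-distillation so that diffusion language models can generate text quickly and in parallel in only a few steps, easing the inference-speed bottleneck.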

arXiv:2602.12262v3 Announce Type: replace-cross Abstract: Diffusion large language models (DLLMs) have emerged as powerful generative models with the …

diffusion models · language models · self-distillation · fast text generation · parallel decoding

17

📝 Deep Tech arXiv AI 2026-05-19

GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

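Proposes granularity-adaptive advantage reweighting, which uses self-distillation for fine-grained credit assignment in LLM agents and improves policy-learning efficiency.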

arXiv:2605.11853v2 Announce Type: replace-cross Abstract: Reinforcement learning has become a widely used post-training approach for LLM agents, where…

large language models · reinforcement learning · self-distillation · granularity-adaptive · llm agent
