Niu Ge's Picks · This Week

1

🤖 AI · Large Models arXiv Machine Learning 2026-05-25

Strong Teacher Not Needed? On Distillation in LLM Pretraining

Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.

arXiv:2605.23857v1 Announce Type: new Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…

large language models · knowledge distillation · pretraining · model compression · weak-to-weak distillation

2

📝 In-Depth Tech arXiv NLP 2026-05-22

X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation

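Projection-guided cross-tokenizer knowledge distillation that needs no auxiliary components and effectively addresses vocabulary incompatibility.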

arXiv:2605.21699v1 Announce Type: cross Abstract: Cross-tokenizer knowledge distillation allows a student model to learn from teachers with incompatib…

knowledge distillation · cross-tokenizer · projection-guided · model compression · student model

3

📝 In-Depth Tech arXiv AI 2026-05-21

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents

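Must multi-turn dialogue agents be distilled with a one-size-fits-all approach? This paper offers a selective strategy for deciding when to distill and what to distill.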

arXiv:2605.19447v1 Announce Type: new Abstract: Reinforcement learning can train LLM agents from sparse task rewards, but long-horizon credit assignme…

knowledge distillation · multi-turn dialogue · agent training · selective distillation · hindsight learning

4

📝 In-Depth Tech arXiv AI 2026-05-20

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training

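Proposes a sparse-to-dense reward principle; its four-stage post-training pipeline makes more efficient use of scarce labeled data and offers a new paradigm for optimizing LLM reasoning.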

arXiv:2605.12483v2 Announce Type: replace-cross Abstract: When labeled verifiable training data is scarce, each checked example should be used where i…

LLM post-training · reinforcement learning · reward design · GRPO · knowledge distillation

5

📝 In-Depth Tech arXiv Machine Learning 2026-05-20

A Survey of On-Policy Distillation for Large Language Models

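The first survey of on-policy distillation for large models, systematically covering methods, challenges, and future directions; well suited to researchers who want to dig deeper.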

arXiv:2604.00626v3 Announce Type: replace Abstract: As Large Language Models (LLMs) continue to grow in both capability and cost, transferring frontie…

LLM · knowledge distillation · on-policy distillation · survey

6

📝 In-Depth Tech arXiv AI 2026-05-20

Cognitive-Uncertainty Guided Knowledge Distillation for Accurate Classification of Student Misconceptions

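Uses cognitive uncertainty to guide knowledge distillation, tackling data sparsity and fuzzy class boundaries in student misconception classification.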

arXiv:2605.14752v1 Announce Type: cross Abstract: Accurately identifying student misconceptions is crucial for personalized education but faces three …

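cognitive uncertainty · knowledge distillation · student misconception classification · personalized education · machine learning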

7

📝 In-Depth Tech arXiv AI 2026-05-20

BiFedKD: Bidirectional Federated Knowledge Distillation Framework for Non-IID and Long-Tailed ECG Monitoring

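A bidirectional federated knowledge distillation framework that tackles the privacy and efficiency challenges of non-IID, long-tailed ECG monitoring.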

arXiv:2605.14886v1 Announce Type: new Abstract: Electrocardiogram (ECG) monitoring in Internet of Medical Things (IoMT) networks is constrained by str…

federated learning · knowledge distillation · ECG monitoring · non-IID · long-tailed distribution

8

📝 In-Depth Tech arXiv Machine Learning 2026-05-20

Lossless Anti-Distillation Sampling

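Proposes a lossless anti-distillation sampling method, offering a new approach to protecting the intellectual property of large models.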

arXiv:2605.18829v1 Announce Type: new Abstract: Frontier commercial generative models face a growing threat from distillation, whereby a distiller har…

lossless anti-distillation sampling · large models · knowledge distillation

9

📝 In-Depth Tech arXiv Computer Vision 2026-05-19

How to Choose Your Teacher for Fine Grained Image Recognition

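How should you choose a teacher model for fine-grained image recognition? This study offers new ideas for knowledge distillation on resource-constrained devices.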

arXiv:2605.15689v1 Announce Type: new Abstract: Fine-grained image recognition classifies subcategories such as bird species or car models. While stat…

fine-grained image recognition · knowledge distillation · teacher model selection · resource-constrained devices

10

📝 In-Depth Tech arXiv AI 2026-05-19

Flow-OPD: On-Policy Distillation for Flow Matching Models

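Proposes Flow-OPD, a new method that uses on-policy distillation to address reward sparsity and gradient interference in the multi-task alignment of flow matching models.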

arXiv:2605.08063v3 Announce Type: replace-cross Abstract: Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under …

flow matching · knowledge distillation · text-to-image · multi-task alignment

11

📝 In-Depth Tech arXiv AI 2026-05-19

BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation

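A new advance in spiking language models: softmax-free attention and alignment distillation combine to build an ultra-low-energy large model.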

arXiv:2605.13859v1 Announce Type: cross Abstract: Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language model…

spiking neural networks · language models · attention mechanism · knowledge distillation · energy efficiency

12

🤖 AI · Large Models arXiv AI 2026-05-19

AMiD: Knowledge Distillation for LLMs with $\alpha$-mixture Assistant Distribution

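AMiD proposes a unified knowledge distillation framework that uses an α-mixture assistant distribution to systematically bridge the capacity gap between teacher and student. It addresses the training instability caused by near-zero probabilities in high-dimensional outputs, a key challenge in LLM distillation that has long been handled in a fragmented way.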

arXiv:2510.15982v3 Announce Type: replace-cross Abstract: Autoregressive large language models (LLMs) have achieved remarkable improvement across many…

knowledge distillation · large language models · α-mixture assistant distribution · AMiD · divergence
