Niu Ge's Picks · This Month

1

🤖 AI & Large Models · arXiv Machine Learning · 2026-05-25

Strong Teacher Not Needed? On Distillation in LLM Pretraining

Counterintuitive? Weak teacher models can also distill LLMs effectively; teacher strength is not the key factor during pretraining.

arXiv:2605.23857v1 Announce Type: new Abstract: Knowledge distillation generally assumes a strong-to-weak relationship where stronger teachers yield b…

large language models · knowledge distillation · pretraining · model compression · weak-to-weak distillation

2

📝 In-Depth Tech · arXiv AI · 2026-05-23

Memory-Efficient LLM Pretraining via Minimalist Optimizer Design

Proposes a minimalist optimizer design that substantially reduces memory usage in large-model pretraining; accepted at ICML 2026.

arXiv:2506.16659v3 Announce Type: replace-cross Abstract: Training large language models (LLMs) relies on adaptive optimizers such as Adam, which intr…

LLM pretraining · memory optimization · optimizer design · minimalist architecture · ICML 2026

3

📝 In-Depth Tech · arXiv NLP · 2026-05-22

Understanding Data Temporality Impact on Large Language Models Pre-training

New research shows that the temporal ordering of data strongly affects LLM pretraining; understanding temporal bias is key.

arXiv:2605.22769v1 Announce Type: new Abstract: Large language models (LLMs) are typically trained on shuffled corpora, yielding models whose knowledg…

data temporality · large language models · pretraining · temporal bias · impact study

4

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-21

LLM Pretraining Shapes a Generalizable Manifold: Insights into Cross-Modal Transfer to Time Series

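A hidden capability of LLM pretraining: the learned data manifold transfers across modalities to time-series tasks, revealing a general representation mechanism.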

arXiv:2605.20449v1 Announce Type: new Abstract: Can language-pretrained transformers become effective time-series forecasters, and why? In this paper,…

LLM · pretraining · cross-modal transfer · time series · manifold learning

5

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

Revisiting the Adam-SGD Gap in LLM Pre-Training: The Role of Large Effective Learning Rates

Traces why SGD underperforms Adam in LLM pretraining: the key role of large effective learning rates.

arXiv:2605.17787v1 Announce Type: new Abstract: It is widely believed that stochastic gradient descent (SGD) performs significantly worse than adaptiv…

LLM pretraining · Adam optimizer · SGD gap · effective learning rate · deep learning

6

📝 In-Depth Tech · OpenAI Official Blog · 2026-05-20

Text and code embeddings by contrastive pre-training

OpenAI's contrastive pre-training method for learning high-quality embeddings of text and code.

contrastive pre-training · embedding learning · OpenAI · text embeddings · code embeddings

7

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

SMART Fine-tuning Factor Augmented Neural Lasso

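Proposes the SMART framework, which incorporates pretrained models into high-dimensional nonparametric variable selection and provides a theoretical foundation for fine-tuning.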

arXiv:2604.12288v2 Announce Type: replace-cross Abstract: Fine-tuning is a widely used strategy for adapting pre-trained models to new tasks, yet its …

fine-tuning · nonparametric · high-dimensional statistics · variable selection · SMART framework

8

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training

A look at pruning and distillation techniques in large MoE model pretraining; SlimQwen optimizes both efficiency and performance.

arXiv:2605.08738v2 Announce Type: replace Abstract: Structured pruning and knowledge distillation (KD) are typical techniques for compressing large la…

MoE · pruning · distillation · pretraining · large models

9

📝 In-Depth Tech · arXiv AI · 2026-05-20

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

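Automatically synthesizes large-scale GUI interaction trajectories from video, addressing the scarcity of data for GUI agent pretraining and helping agents better understand real applications.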

arXiv:2605.14747v1 Announce Type: cross Abstract: Recent advances in multimodal large language models have driven growing interest in graphical user i…

GUI agent · video synthesis · interaction trajectories · pretraining · multimodal large models

10

📝 In-Depth Tech · arXiv NLP · 2026-05-20

Efficient Pre-Training with Token Superposition

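Proposes token superposition to break through the pretraining efficiency bottleneck and substantially cut compute requirements; a must-read on LLM training optimization.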

arXiv:2605.06546v2 Announce Type: replace Abstract: Pre-training of Large Language Models is often prohibitively expensive and inefficient at scale, r…

pretraining · token supe · efficient training · large language models · machine learning

11

📝 In-Depth Tech · arXiv NLP · 2026-05-20

From BERT to T5: A Study of Named Entity Recognition

From BERT to T5: a solid hands-on comparison of NER fine-tuning, rich in technical detail.

arXiv:2605.18462v1 Announce Type: new Abstract: Named entity recognition (NER) has been one of the essential preliminary steps in modern NLP applicati…

NER · BERT · T5 · pretrained models · fine-tuning

12

🤖 AI & Large Models · arXiv NLP · 2026-05-20

Language Acquisition Device in Large Language Models

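Explores how to draw on the language acquisition device, using pretraining on synthetic languages to improve the data efficiency of large models and suggesting a new direction for AI development.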

arXiv:2605.16758v1 Announce Type: new Abstract: Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PP…

large language models · language acquisition · data efficiency · synthetic languages · pretraining

13

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

Universal Pose Pretraining for Generalizable Vision-Language-Action Policies

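An advance in robot foundation models: universal pose pretraining markedly improves the generalization of vision-language-action policies; accepted at RSS 2026.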

arXiv:2602.19710v2 Announce Type: replace-cross Abstract: Existing Vision-Language-Action (VLA) models often suffer from feature collapse and low trai…

universal pose pretraining · vision-language-action policies · robot foundation models · RSS 2026 · generalization

14

🤖 AI & Large Models · arXiv Machine Learning · 2026-05-20

Generating Pretraining Tokens from Organic Data for Data-Bound Scaling

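LLM pretraining is shifting from compute-bound to data-bound; this paper explores how to generate pretraining tokens from organic data to get past the scaling bottleneck.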

arXiv:2605.17849v1 Announce Type: cross Abstract: LLM pretraining is shifting from a compute-bound to a data-bound regime, where available human (orga…

LLM pretraining · data bottleneck · organic data · token generation · scaling laws

15

🤖 AI & Large Models · arXiv Machine Learning · 2026-05-20

Protein Fold Classification at Scale: Benchmarking and Pretraining

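Builds TEDBench, a large-scale non-redundant benchmark for protein fold classification, overcoming the scale bottleneck and supporting functional analysis of biological macromolecules.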

arXiv:2605.18552v1 Announce Type: new Abstract: Classifying protein topology is essential for deciphering biological function, but progress is held ba…

protein fold classification · benchmarking · pretraining · scale · bioinformatics

16

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

How Do Electrocardiogram Models Scale?

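Examines scaling laws for electrocardiogram models: increasing model size does not always improve performance, which challenges the experience from natural language processing.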

arXiv:2605.17276v1 Announce Type: new Abstract: While scaling laws have established a fundamental framework for foundation models in natural language …

electrocardiogram · scaling laws · foundation models · deep learning · pretraining

17

📝 In-Depth Tech · arXiv NLP · 2026-05-20

QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

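New research quantifies uncertainty from the pretraining corpus to dynamically optimize the retrieval-augmented generation strategy and improve generation quality.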

arXiv:2512.19134v2 Announce Type: replace Abstract: Dynamic Retrieval-Augmented Generation adaptively determines when to retrieve during generation to…

RAG · uncertainty quantification · pretraining corpus · dynamic retrieval · ACL 2026

18

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

Improved Baselines with Representation Autoencoders

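Replaces the traditional VAE with pretrained vision encoders; a systematic study of design choices identifies three simplifying improvements.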

arXiv:2605.18324v1 Announce Type: cross Abstract: Representation Autoencoders (RAE) replace traditional VAE with pretrained vision encoders. In this p…

representa · pretrained vision encoders · VAE · improved baselines · representation learning

19

📝 In-Depth Tech · arXiv Machine Learning · 2026-05-20

Extending Pretrained 10-Second ECG Foundation Models to Longer Horizons

Extends 10-second ECG foundation models to longer time windows and studies how well temporal models generalize.

arXiv:2605.16975v1 Announce Type: new Abstract: Electrocardiogram (ECG) foundation models pretrained on typical diagnostic 10-second ECG segments, hav…

ECG · foundation models · pretraining · temporal models · ECG signals

20

🤖 AI & Large Models · arXiv NLP · 2026-05-20

Rethinking 1-bit Optimization Leveraging Pre-trained Large Language Models

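Breaks with convention: leverages pretrained large models to get past the 1-bit quantization bottleneck, saving storage while preserving accuracy.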

arXiv:2508.06974v2 Announce Type: replace Abstract: 1-bit LLM quantization offers significant advantages in reducing storage and computational costs. …

1-bit quantization · large language models · pretrained models · progressive optimization · training cost
