Niu Ge's Picks · This Week

1

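📝 Deep Tech arXiv Machine Learning 2026-05-20

Parallelizable memory recurrent units

Proposes parallelizable memory recurrent units that remove the sequential-computation bottleneck of traditional RNNs and substantially improve training efficiency.

arXiv:2601.09495v3 Announce Type: replace Abstract: With the emergence of massively parallel processing units, parallelization has become a desirable …

rnn, parallelization, memory units, sequence modeling, deep learning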

2

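📝 Deep Tech arXiv Machine Learning 2026-05-20

DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models

The first code generation benchmark to incorporate real developer feedback, targeting the gap between existing benchmarks and real-world coding scenarios.

arXiv:2601.11895v3 Announce Type: replace Abstract: DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on real…

code generation, benchmark, developer feedback, LLM evaluation, devbench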

3

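📝 Deep Tech arXiv Machine Learning 2026-05-20

Sparse Training of Neural Networks based on Multilevel Mirror Descent

A sparse training method based on multilevel mirror descent that cuts neural network training time by roughly 50%.

arXiv:2602.03535v2 Announce Type: replace Abstract: We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror d…

sparse training, neural networks, multilevel mirror descent, training time optimization, deep learning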

4

📝 Deep Tech arXiv Machine Learning 2026-05-20

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Reward Models

Uses reward models to get past the limits of test cases, enabling scalable reinforcement learning for code LLMs at both training and inference time.

arXiv:2602.17684v2 Announce Type: replace Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large lan…

codescaler, reward models, code LLMs, rlvr, training scaling

5

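📝 Deep Tech arXiv Machine Learning 2026-05-20

ARROW: Augmented Replay for RObust World models

Proposes ARROW, an augmented replay framework aimed at making world models more robust in continual reinforcement learning, where agents must learn new skills while retaining earlier ones.

arXiv:2603.11395v2 Announce Type: replace Abstract: Continual reinforcement learning challenges agents to acquire new skills while retaining previousl…

world models, augmented replay, robustness, reinforcement learning, model generalization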

6

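📝 Deep Tech arXiv Machine Learning 2026-05-20

Automatic Generation of High-Performance RL Environments

A new method for automatically producing high-performance implementations of reinforcement learning environments, giving RL researchers an efficient tool.

arXiv:2603.12145v2 Announce Type: replace Abstract: Translating complex reinforcement learning (RL) environments into high-performance implementations…

reinforcement learning, environment generation, automatic generation, high performance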

7

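📝 Deep Tech arXiv Machine Learning 2026-05-20

A Survey of On-Policy Distillation for Large Language Models

The first survey of on-policy distillation for large language models, systematically covering methods, challenges, and future directions; a good starting point for researchers who want to dig deeper.

arXiv:2604.00626v3 Announce Type: replace Abstract: As Large Language Models (LLMs) continue to grow in both capability and cost, transferring frontie…

llm, knowledge distillation, on-policy distillation, survey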

8

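📝 Deep Tech arXiv Machine Learning 2026-05-20

ClawArena: Benchmarking AI Agents in Evolving Information Environments

Defines the first benchmark for evolving information environments, testing how AI agents maintain beliefs and handle conflicting evidence as information changes.

arXiv:2604.04202v2 Announce Type: replace Abstract: AI agents deployed as persistent assistants must maintain correct beliefs as their information env…

ai agent, benchmark, information evolution, conflicting evidence, continual learning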

9

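📝 Deep Tech arXiv Machine Learning 2026-05-20

Discrete Tilt Matching

Introduces discrete tilt matching for masked diffusion large language models, addressing the intractable marginal likelihood in RL fine-tuning.

arXiv:2604.18739v2 Announce Type: replace Abstract: Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive gener…

diffusion models, large language models, reinforcement learning, fine-tuning, masking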

10

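📝 Deep Tech arXiv Machine Learning 2026-05-20

Kernelized Advantage Estimation: From Nonparametric Statistics to LLM Reasoning

Proposes Kernelized Advantage Estimation, which approaches advantage estimation for LLM reasoning from a nonparametric statistics perspective and offers a new angle for reinforcement learning.

arXiv:2604.28005v2 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning…

kernelized, nonparametric statistics, llm reasoning, reinforcement learning, advantage estimation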

11

📝 Deep Tech arXiv Machine Learning 2026-05-20

Enhancing AI-Based ECG Delineation with Deep Learning Denoising Techniques

Uses deep learning denoising to improve the accuracy of AI-based ECG delineation; the paper targets noisy canine ECGs.

arXiv:2605.03183v2 Announce Type: replace Abstract: Evaluating canine electrocardiograms (ECGs) is challenging due to noise that can obscure clinicall…

deep learning, ECG, denoising, ai, medical imaging

12

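📝 Deep Tech arXiv Machine Learning 2026-05-20

Matrix-Decoupled Concentration for Autoregressive Sequences: Dimension-Free Guarantees for Sparse Long-Context Rewards

Matrix-decoupled concentration inequalities for autoregressive sequences, giving dimension-free guarantees for sparse long-context rewards; a theoretical contribution.

arXiv:2605.06017v2 Announce Type: replace Abstract: Sequence-level evaluations in autoregressive Large Language Models (LLMs) rely on highly dependent…

autoregressive sequences, concentration inequalities, matrix decoupling, dimension-free guarantees, long-context rewards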

13

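📝 Deep Tech arXiv Machine Learning 2026-05-20

MAGIQ: A Post-Quantum Multi-Agentic AI Governance System with Provable Security

A multi-agent AI governance system for the post-quantum era: proposes the provably secure MAGIQ architecture to address security challenges under emerging computing paradigms.

arXiv:2605.06933v2 Announce Type: replace Abstract: Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment o…

post-quantum, multi-agent, ai governance, provable security, quantum security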

14

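📝 Deep Tech arXiv Machine Learning 2026-05-20

Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol

Proposes an audit-constrained protocol that tests how fragile LLM reasoning is under prompt changes while avoiding misattribution.

arXiv:2605.11599v2 Announce Type: replace Abstract: Fixed reasoning benchmarks evaluate canonical prompts, but semantically valid changes in presentat…

llm reasoning, prompt variants, audit constraints, targeted tests, audit protocol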

15

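📝 Deep Tech arXiv Machine Learning 2026-05-20

NOFE - Neural Operator Function Embedding

Proposes a continuous-domain approach to dimensionality reduction that uses neural operators to embed functions, rather than treating data as discrete point clouds as most existing methods do.

arXiv:2605.11970v2 Announce Type: replace Abstract: Most dimensionality reduction methods treat data as discrete point clouds, ignoring the continuous…

dimensionality reduction, continuous learning, neural operators, embedding, domain-aware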

16

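📝 Deep Tech arXiv Machine Learning 2026-05-20

Stochastic Minimum-Cost Reach-Avoid Reinforcement Learning

Accepted at ICML 2026: uses reinforcement learning to solve minimum-cost reach-avoid problems in stochastic environments, combining theoretical results with algorithm design.

arXiv:2605.11975v2 Announce Type: replace Abstract: We study stochastic minimum-cost reach-avoid reinforcement learning, where an agent must satisfy a…

stochastic, minimum cost, reach-avoid, reinforcement learning, icml, machine learning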

17

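🤖 AI · Large Models arXiv Machine Learning 2026-05-20

Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy

Finds that multi-agent systems caving under peer disagreement is not specific to RLHF; base models show the same vulnerability, which challenges the conventional view of alignment.

arXiv:2605.12991v2 Announce Type: replace Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagr…

multi-agent, llm alignment, sycophancy, rlhf, base models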

18

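📝 Deep Tech arXiv Machine Learning 2026-05-20

NodeSynth: Socially Aligned Synthetic Data for AI Evaluation

Uses socially aligned synthetic data to bring AI evaluation closer to real social contexts, improving sensitivity and trustworthiness.

arXiv:2605.14381v2 Announce Type: replace Abstract: Recent advancements in generative AI facilitate large-scale synthetic data generation for model ev…

synthetic data, ai evaluation, social alignment, generative ai, model evaluation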

19

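📝 Deep Tech arXiv Machine Learning 2026-05-20

Matérn Gaussian Processes on Graphs

Matérn Gaussian processes on graphs: combines theoretical derivation with graph structure, offering a new perspective on modeling graph data.

arXiv:2010.15538v4 Announce Type: replace-cross Abstract: Gaussian processes are a versatile framework for learning unknown functions in a manner that…

matérn gaussian processes, graph theory, gaussian processes, machine learning, graph data modeling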

20

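📝 Deep Tech arXiv Machine Learning 2026-05-20

Long Context Modeling with Ranked Memory-Augmented Retrieval

Uses ranked memory-augmented retrieval to tackle long-context modeling and get past the context-window limits of large models.

arXiv:2503.14800v3 Announce Type: replace-cross Abstract: Effective long-term memory management is crucial for language models handling extended conte…

long context, memory-augmented retrieval, large models, ranked memory, context modeling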
