Niu Ge's Picks · Three Months

1

📝 Deep Tech arXiv Machine Learning 2026-05-20

Forecasting Downstream Performance of LLMs With Proxy Metrics

Uses proxy metrics to forecast LLM downstream performance ahead of time, giving model-selection decisions a more reliable basis.

arXiv:2605.18607v1 Announce Type: cross Abstract: Progress in language model development is often driven by comparative decisions: which architecture …

llm, performance forecasting, proxy metrics, downstream tasks, model selection

2

📝 Deep Tech arXiv Machine Learning 2026-05-20

An Approximation Algorithm for Graph Label Selection

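An approximation algorithm for graph label selection, accepted at ICML 2026; 9 pages and 7 figures, with theoretical analysis and proofs.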

arXiv:2605.18623v1 Announce Type: cross Abstract: In the graph label selection problem, one is given an $n$-vertex graph and a budget $k$, and seeks t…

graph label selection, approximation algorithms, icml, machine learning, theoretical computer science

3

📝 Deep Tech arXiv Machine Learning 2026-05-20

Can machine learning for quantum-gas experiments be explainable?

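Asks whether machine learning models used in quantum-gas experiments can be made explainable, and highlights key challenges at this interdisciplinary frontier.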

arXiv:2605.18689v1 Announce Type: cross Abstract: Virtually all aspects of many-body atomic physics are challenging: experiments are technically deman…

explainable AI, quantum gases, machine learning, quantum physics, experiments

4

📝 Deep Tech arXiv Machine Learning 2026-05-20

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

Combines reinforcement learning with environment synthesis to offer a robust way of scaling tool-use agents.

arXiv:2605.18703v1 Announce Type: cross Abstract: Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlen…

tool-use agents, executable environment synthesis, robust reinforcement learning, agent scaling

5

🤖 AI · Large Models arXiv Machine Learning 2026-05-20

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Uses on-policy self-distillation to teach multimodal LLMs to capture fine visual details, improving fine-grained visual understanding.

arXiv:2605.18740v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, wher…

multimodal LLMs, visual details, self-distillation, fine-grained understanding, arxiv paper

6

📝 Deep Tech arXiv Machine Learning 2026-05-20

LLM-TabLogic: Preserving Inter-Column Logical Relationships in Synthetic Tabular Data via Prompt-Guided Latent Diffusion

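Uses an LLM to guide a latent diffusion model so that synthetic tabular data preserves inter-column logical relationships, improving realism and usability.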

arXiv:2503.02161v3 Announce Type: replace Abstract: Synthetic tabular data are increasingly being used to replace real data, serving as an effective s…

llm, tabular data, latent diffusion, logical relationships, synthetic data

7

📝 Deep Tech arXiv Machine Learning 2026-05-20

Improving Random Forests by Smoothing

Improves random forest performance through a smoothing strategy.

arXiv:2505.06852v2 Announce Type: replace Abstract: Random forest regression is a powerful non-parametric method that adapts to local data characteris…

random forests, smoothing, machine learning, ensemble learning, paper

8

📝 Deep Tech arXiv Machine Learning 2026-05-20

RAP: Runtime Adaptive Pruning for LLM Inference

Proposes runtime adaptive pruning, which adjusts the memory footprint of LLM inference dynamically to improve efficiency.

arXiv:2505.17138v5 Announce Type: replace Abstract: Large language models (LLMs) excel at language understanding and generation, but their enormous co…

llm inference, adaptive pruning, runtime optimization, memory constraints, model compression

9

📝 Deep Tech arXiv Machine Learning 2026-05-20

Fine-grained List-wise Alignment for Generative Medication Recommendation

A NeurIPS 2025 Spotlight paper that proposes fine-grained list-wise alignment to improve generative medication recommendation.

arXiv:2505.20218v2 Announce Type: replace Abstract: Accurate and safe medication recommendations are critical for effective clinical decision-making, …

generative medication recommendation, fine-grained list-wise alignment, neurips 20, medical AI

10

📝 Deep Tech arXiv Machine Learning 2026-05-20

Beyond RLHF: A Unified Theoretical Framework of Alignment

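A unified theoretical framework of alignment that goes beyond RLHF, formalizing several alignment algorithms abstractly and exposing the connections among them, with implications for AI safety.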

arXiv:2506.01523v2 Announce Type: replace Abstract: Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm f…

rlhf, alignment theory, unified framework, AI safety, algorithm formalization

11

📝 Deep Tech arXiv Machine Learning 2026-05-20

Algebraic Priors for Approximately Equivariant Networks

Builds approximately equivariant networks from algebraic priors, balancing built-in symmetry against architectural flexibility.

arXiv:2506.08244v2 Announce Type: replace Abstract: Equivariant neural networks incorporate symmetries through group actions, embedding them as an ind…

equivariant networks, group actions, algebraic priors, symmetry, neural networks

12

📝 Deep Tech arXiv Machine Learning 2026-05-20

Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions

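A training-free method that corrects the posterior in diffusion guidance, targeting the difficulty of sampling beyond high-density regions.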

arXiv:2507.05482v3 Announce Type: replace Abstract: Training-free diffusion guidance offers a flexible framework for leveraging off-the-shelf classifi…

diffusion models, posterior correction, sampling, training-free, Stein method

13

📝 Deep Tech arXiv Machine Learning 2026-05-20

Graph Embedding in the Graph Fractional Fourier Transform Domain

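Computes graph embeddings in the graph fractional Fourier transform domain, aiming to overcome the limited expressiveness of conventional spectral embedding and capture richer structural features.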

arXiv:2508.02383v2 Announce Type: replace Abstract: Spectral graph embedding plays a critical role in graph representation learning by generating low-…

graph embedding, fractional Fourier transform, spectral graph theory, graph representation learning, Laplacian matrix

14

📝 Deep Tech arXiv Machine Learning 2026-05-20

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Goes beyond final-answer correctness by harmonizing process and outcome rewards during RL training.

arXiv:2509.03403v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) improves final-answer accuracy on reasoning …

reinforcement learning, process rewards, outcome rewards, RL training, generalization

15

📝 Deep Tech arXiv Machine Learning 2026-05-20

FediLoRA: Practical Federated Fine-Tuning of Foundation Models Under Missing-Modality Constraints

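Proposes FediLoRA, which handles missing modalities in federated fine-tuning while balancing communication efficiency and model performance.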

arXiv:2509.06984v3 Announce Type: replace Abstract: Federated Learning with LoRA fine-tuning offers an efficient and privacy-aware solution for instit…

federated learning, lora, multimodal fine-tuning, foundation models, missing modalities

16

📝 Deep Tech arXiv Machine Learning 2026-05-20

Activation Steering with a Feedback Controller

Uses a feedback controller for more precise activation steering, improving the controllability of large language models; an ICLR 2026 paper.

arXiv:2510.04309v3 Announce Type: replace Abstract: Controlling the behaviors of large language models (LLM) is fundamental to their safety alignment …

activation steering, feedback control, large models, iclr2026, controllability

17

📝 Deep Tech arXiv Machine Learning 2026-05-20

EvilGenie: A Reward Hacking Benchmark

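A benchmark dedicated to reward hacking in programming settings, assessing how readily large models game their rewards and the alignment risks that follow.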

arXiv:2511.21654v2 Announce Type: replace Abstract: We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems…

reward hacking, benchmark, AI safety, alignment, large models

18

📝 Deep Tech arXiv Machine Learning 2026-05-20

Goal inference with Rao-Blackwellized Particle Filters

Applies Rao-Blackwellized particle filters to goal inference to improve state-estimation accuracy; accepted at IFAC 2026.

arXiv:2512.09269v2 Announce Type: replace Abstract: Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a funda…

rao-blackw, goal inference, state estimation, Bayesian filtering, machine learning

19

📝 Deep Tech arXiv Machine Learning 2026-05-20

Constrained Policy Optimization via Sampling-Based Weight-Space Projection

Proposes a sampling-based weight-space projection method for constrained policy optimization; accepted at IFAC 2026.

arXiv:2512.13788v2 Announce Type: replace Abstract: Safety-critical learning requires policies that improve performance without leaving the safe opera…

constrained policy optimization, sampling-based projection, weight space, ifac 2026, reinforcement learning

20

📝 Deep Tech arXiv Machine Learning 2026-05-20

Geometric Scaling of Bayesian Inference in LLMs

Examines the geometric structure of Bayesian inference inside LLMs and how it scales from small models to production-grade LLMs.

arXiv:2512.23752v5 Announce Type: replace Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can im…

large language models, Bayesian inference, geometric scaling, machine learning theory, transforme
