Forecasting Downstream Performance of LLMs With Proxy Metrics
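Uses proxy metrics to forecast the downstream performance of LLMs ahead of time, giving model-selection decisions a reliable basis.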
arXiv:2605.18607v1 Announce Type: cross Abstract: Progress in language model development is often driven by comparative decisions: which architecture …
Approximation algorithms for graph label selection, accepted at ICML 2026; 9 pages and 7 figures, with theoretical analysis and proofs.
arXiv:2605.18623v1 Announce Type: cross Abstract: In the graph label selection problem, one is given an $n$-vertex graph and a budget $k$, and seeks t…
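Examines whether machine learning models used in quantum gas experiments are interpretable, highlighting a key challenge in this cross-disciplinary area.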
arXiv:2605.18689v1 Announce Type: cross Abstract: Virtually all aspects of many-body atomic physics are challenging: experiments are technically deman…
Combines reinforcement learning with environment synthesis to offer a robust new approach to scaling tool-use agents.
arXiv:2605.18703v1 Announce Type: cross Abstract: Equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL) is bottlen…
Uses a self-distillation strategy to teach multimodal LLMs to capture visual details, markedly improving fine-grained understanding.
arXiv:2605.18740v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, wher…
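Uses an LLM to guide a latent diffusion model so that synthetic tabular data preserves the logical relationships between columns, improving realism and usability.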
arXiv:2503.02161v3 Announce Type: replace Abstract: Synthetic tabular data are increasingly being used to replace real data, serving as an effective s…
Improves random forest performance through a smoothing strategy; a novel method worth a look.
arXiv:2505.06852v2 Announce Type: replace Abstract: Random forest regression is a powerful non-parametric method that adapts to local data characteris…
Proposes a runtime-adaptive pruning method that lets LLM inference memory adjust dynamically, improving efficiency.
arXiv:2505.17138v5 Announce Type: replace Abstract: Large language models (LLMs) excel at language understanding and generation, but their enormous co…
NeurIPS 2025 Spotlight paper proposing a fine-grained list-wise alignment method to improve generative medication recommendation.
arXiv:2505.20218v2 Announce Type: replace Abstract: Accurate and safe medication recommendations are critical for effective clinical decision-making, …
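A unified theoretical framework for alignment beyond RLHF that abstractly formalizes multiple alignment algorithms and reveals the connections among them, offering a new perspective on AI safety.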
arXiv:2506.01523v2 Announce Type: replace Abstract: Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm f…
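Builds approximately equivariant networks from algebraic priors, balancing symmetry embedding against architectural flexibility, with a novel theoretical angle.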
arXiv:2506.08244v2 Announce Type: replace Abstract: Equivariant neural networks incorporate symmetries through group actions, embedding them as an ind…
A training-free method for correcting diffusion model posteriors, addressing the problem of sampling outside high-density regions.
arXiv:2507.05482v3 Announce Type: replace Abstract: Training-free diffusion guidance offers a flexible framework for leveraging off-the-shelf classifi…
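Embeds graphs in the graph fractional Fourier transform domain, overcoming the expressiveness bottleneck of conventional spectral embedding and capturing more comprehensive structural features.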
arXiv:2508.02383v2 Announce Type: replace Abstract: Spectral graph embedding plays a critical role in graph representation learning by generating low-…
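Beyond correctness: reconciles process and outcome rewards through reinforcement learning, offering a new perspective on model training.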
arXiv:2509.03403v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) improves final-answer accuracy on reasoning …
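Proposes FediLoRA, which addresses missing modalities in federated fine-tuning while balancing communication efficiency and model performance.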
arXiv:2509.06984v3 Announce Type: replace Abstract: Federated Learning with LoRA fine-tuning offers an efficient and privacy-aware solution for instit…
Uses a feedback controller for precise activation steering, offering a new approach to improving LLM controllability; ICLR 2026 paper.
arXiv:2510.04309v3 Announce Type: replace Abstract: Controlling the behaviors of large language models (LLM) is fundamental to their safety alignment …
The first benchmark dedicated to reward hacking, evaluating LLMs' capacity for gaming rewards and the associated alignment risks.
arXiv:2511.21654v2 Announce Type: replace Abstract: We introduce EvilGenie, a benchmark for reward hacking in programming settings. We source problems…
Uses a Rao-Blackwellized particle filter for goal inference to improve state estimation accuracy; accepted at IFAC 2026.
arXiv:2512.09269v2 Announce Type: replace Abstract: Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a funda…
Proposes a sampling-based weight-space projection method for efficiently solving constrained policy optimization; accepted at IFAC 2026.
arXiv:2512.13788v2 Announce Type: replace Abstract: Safety-critical learning requires policies that improve performance without leaving the safe opera…
Uncovers the geometry of Bayesian inference inside LLMs and how it scales from small models to production-grade LLMs.
arXiv:2512.23752v5 Announce Type: replace Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can im…