Parallelizable memory recurrent units
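Proposes parallelizable memory recurrent units that break the sequential-computation bottleneck of traditional RNNs and significantly improve training efficiency.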
arXiv:2601.09495v3 Announce Type: replace Abstract: With the emergence of massively parallel processing units, parallelization has become a desirable …
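The first benchmark for code-generation models to incorporate real developer feedback, addressing the problem that existing benchmarks are detached from real-world coding scenarios.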
arXiv:2601.11895v3 Announce Type: replace Abstract: DevBench is a telemetry-driven benchmark designed to evaluate Large Language Models (LLMs) on real…
A sparse training method based on multilevel mirror descent that cuts neural network training time by roughly 50%.
arXiv:2602.03535v2 Announce Type: replace Abstract: We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror d…
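Uses reward models to move beyond the limits of test cases, enabling scalable reinforcement learning for code LLMs at both training and inference time.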
arXiv:2602.17684v2 Announce Type: replace Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large lan…
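Proposes ARROW, an augmented replay framework that significantly improves the robustness of world models in out-of-distribution scenarios.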
arXiv:2603.11395v2 Announce Type: replace Abstract: Continual reinforcement learning challenges agents to acquire new skills while retaining previousl…
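A new method for automatically generating reinforcement learning environments that improves environment performance and diversity, giving RL research an efficient tool.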
arXiv:2603.12145v2 Announce Type: replace Abstract: Translating complex reinforcement learning (RL) environments into high-performance implementations…
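The first survey of on-policy distillation for large language models, systematically covering methods, challenges, and future directions; suited to researchers who want to dig deeper.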
arXiv:2604.00626v3 Announce Type: replace Abstract: As Large Language Models (LLMs) continue to grow in both capability and cost, transferring frontie…
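How AI agents maintain beliefs and handle contradictory evidence in dynamic information environments; the paper defines the first benchmark for evolving information.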
arXiv:2604.04202v2 Announce Type: replace Abstract: AI agents deployed as persistent assistants must maintain correct beliefs as their information env…
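Introduces a discrete tilt matching method for masked diffusion large language models, addressing the intractable marginal likelihood in RL fine-tuning.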
arXiv:2604.18739v2 Announce Type: replace Abstract: Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive gener…
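Proposes Kernelized Advantage Estimation, which optimizes LLM reasoning from a nonparametric statistics perspective and offers a new approach for reinforcement learning.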
arXiv:2604.28005v2 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have increasingly relied on reinforcement learning…
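Uses deep learning denoising to improve the accuracy of AI-based electrocardiogram analysis; a walkthrough of the paper's method.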
arXiv:2605.03183v2 Announce Type: replace Abstract: Evaluating canine electrocardiograms (ECGs) is challenging due to noise that can obscure clinicall…
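Matrix decoupling concentration inequalities for autoregressive sequences, providing dimension-free guarantees for sparse long-context rewards; a theoretical advance.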
arXiv:2605.06017v2 Announce Type: replace Abstract: Sequence-level evaluations in autoregressive Large Language Models (LLMs) rely on highly dependent…
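A multi-agent AI governance system for the post-quantum era: proposes the provably secure MAGIQ architecture to address security challenges under emerging computing paradigms.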
arXiv:2605.06933v2 Announce Type: replace Abstract: Our computing ecosystem is being transformed by two emerging paradigms: the increased deployment o…
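Proposes an audit-constrained protocol that precisely tests how fragile LLM reasoning is to prompt variations while avoiding misattribution.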
arXiv:2605.11599v2 Announce Type: replace Abstract: Fixed reasoning benchmarks evaluate canonical prompts, but semantically valid changes in presentat…
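Proposes a new paradigm for continuous-domain dimensionality reduction that embeds discrete point clouds with neural operators, overcoming bottlenecks of traditional methods.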
arXiv:2605.11970v2 Announce Type: replace Abstract: Most dimensionality reduction methods treat data as discrete point clouds, ignoring the continuous…
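Accepted at ICML 2026: uses reinforcement learning to solve minimum-cost reach-avoid problems in stochastic environments, combining theoretical results with algorithm design.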
arXiv:2605.11975v2 Announce Type: replace Abstract: We study stochastic minimum-cost reach-avoid reinforcement learning, where an agent must satisfy a…
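Finds that "yielding" under peer disagreement in multi-agent systems is not specific to RLHF; base models show the same vulnerability, which challenges conventional assumptions about alignment.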
arXiv:2605.12991v2 Announce Type: replace Abstract: LLM-based multi-agent pipelines flip from correct to incorrect answers under simulated peer disagr…
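Uses socially aligned synthetic data to bring AI evaluation closer to real social settings, improving model sensitivity and trustworthiness.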
arXiv:2605.14381v2 Announce Type: replace Abstract: Recent advancements in generative AI facilitate large-scale synthetic data generation for model ev…
Matérn Gaussian processes on graphs: combines theoretical derivation with graph structure, offering a new perspective on modeling graph data.
arXiv:2010.15538v4 Announce Type: replace-cross Abstract: Gaussian processes are a versatile framework for learning unknown functions in a manner that…
Uses ranked memory-augmented retrieval for long-context modeling, overcoming the context-window limits of large models.
arXiv:2503.14800v3 Announce Type: replace-cross Abstract: Effective long-term memory management is crucial for language models handling extended conte…