Goal-Conditioned Supervised Learning for LLM Fine-Tuning
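Proposes a new goal-conditioned supervised learning method that balances the cost and effectiveness of LLM fine-tuning without requiring an external reward model.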
arXiv:2605.16345v1 Announce Type: new Abstract: Large language models often require fine-tuning to better align their behavior with user intent at dep…
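Explores how to measure an LLM evaluator's worth in human terms, quantifying the number of human reviewers required and balancing the cost and quality of AI system evaluation.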
arXiv:2605.16354v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in…
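For forecasting tasks, the dynamic benchmark LEAF fills a gap in multi-dimensional event evaluation, bringing tests of LLM forecasting ability closer to real-world conditions.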
arXiv:2605.16358v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly applied to forecasting. To evaluate this capability whil…
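Analyzes the mixing time of Glauber dynamics in masked language models from a statistical-physics perspective, providing a theoretical basis for understanding MLM sampling behavior.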
arXiv:2605.16378v1 Announce Type: new Abstract: Masked language models (MLMs) define local conditional distributions over tokens but do not, in genera…
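From loss to profit: identifies the cost-benefit optimum for training LLMs at scale, offering a theoretical framework for the return on AI investment.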
arXiv:2605.16430v1 Announce Type: new Abstract: Scaling LLMs requires tremendous computational resources, and recent advances in AI have gone hand in …
Accepted at ICML 2026. Proposes a new nested spatiotemporal forecasting method for accurate modeling of multi-level spatiotemporal data.
arXiv:2605.16447v1 Announce Type: new Abstract: Spatiotemporal forecasting is critical for real-world applications like traffic management, yet captur…
A flow-matching wavelet method for multi-scale physical simulation that efficiently generates high-fidelity physical fields.
arXiv:2605.16573v1 Announce Type: new Abstract: Accurate emulation of multi-scale physical systems governed by PDEs demands models that remain stable …
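A structure-aware masking method that improves protein representation learning, offering new ideas for AI-driven drug discovery and protein design.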
arXiv:2605.16581v1 Announce Type: new Abstract: Masked language modeling (MLM) is the standard objective for training protein language models, typical…
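Shows how small language models can learn to judge when to "ask for help", avoiding blind reliance on expensive large models and improving the efficiency of agent systems.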
arXiv:2605.16604v1 Announce Type: new Abstract: Efficient agentic systems should incur expensive frontier-model costs only on decisions where a cheape…
Reveals the structural similarity between SaaS products and insurance, a new perspective on business modeling.
arXiv:2605.16699v1 Announce Type: new Abstract: Capped-usage SaaS products -- LLM subscriptions such as Claude Code and ChatGPT, cloud platforms such …
Proposes a convex dataset valuation method that addresses the cost-performance trade-off of dataset selection in LLM post-training.
arXiv:2605.16704v1 Announce Type: new Abstract: Improving LLM performance on downstream tasks sometimes requires leveraging auxiliary datasets during …
Decodes affective descriptions from brain fMRI signals, a new result at the intersection of AI and neuroscience.
arXiv:2605.16739v1 Announce Type: new Abstract: Decoding visual experience from brain activity has advanced substantially, but current brain-to-text…
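Explores the theoretical mechanism of propagation of chaos in in-context flow maps, offering a new perspective for modeling complex systems.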
arXiv:2605.16747v1 Announce Type: new Abstract: We develop a quantitative statistical theory of transformers in the large-context regime by adopting t…
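Reveals the counterintuitive phenomenon that LLMs fail to learn from hard samples during RLVR training, challenging existing assumptions.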
arXiv:2605.16787v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving Large Language …
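An LLM distillation paper that unifies the SFT, DAgger, offline RL, and OPD perspectives, decoupling KL from trajectories and providing a new theoretical framework for model optimization.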
arXiv:2605.16826v1 Announce Type: new Abstract: Knowledge distillation is central to LLM post-training, yet its design space remains poorly understood…
BoLT, a black-box optimization benchmark for expensive LLM tasks, lowers the barrier to research and broadens access to the field.
arXiv:2605.17000v1 Announce Type: new Abstract: Optimization of LLM training and inference configurations, such as hyperparameters, data mixtures, and…
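Proposes the Learning-Zone Energy method, which selects data online to improve the efficiency of RL post-training and avoids the compute wasted by uniform allocation.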
arXiv:2605.17003v1 Announce Type: new Abstract: Reinforcement Learning (RL) post-training has emerged as the dominant paradigm for eliciting mathemati…
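To address data leakage in RAG systems, proposes a Privacy Policy Enforcement (PPE) framework that uses dual density estimators and embedding fusion to detect clusters of non-regulated attributes.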
arXiv:2605.17034v1 Announce Type: new Abstract: Standard PII filters often miss contextual data leakage in RAG systems, such as non-regulated attribut…
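Proposes a dual difficulty-aware self-evolution method that addresses the scarcity of reinforcement learning training data and dynamic difficulty shift.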
arXiv:2605.17037v1 Announce Type: new Abstract: Reinforcement learning (RL) has demonstrated potential for enhancing reasoning in large language model…
A new state-coordination scheme for multi-agent LLMs: automatic read-set reconstruction with no SDK changes.
arXiv:2605.17076v1 Announce Type: new Abstract: Concurrent LLM agents sharing mutable natural-language state produce Structural Race Conditions (SRCs)…