Unified Data Selection for LLM Reasoning
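Proposes a unified data selection framework that efficiently selects high-quality training data for LLM reasoning tasks, significantly improving reasoning performance.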
arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecke…
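A unified theoretical framework for alignment that goes beyond RLHF. It abstractly formalizes a range of alignment algorithms, exposes the connections between them, and offers a new perspective on AI safety.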
arXiv:2506.01523v2 Announce Type: replace Abstract: Alignment via reinforcement learning from human feedback (RLHF) has become the dominant paradigm f…
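The abstract starts from alignment via RLHF. For reference, the widely used KL-regularized RLHF objective is J(pi) = E_{y~pi}[r(y)] - beta * KL(pi || pi_ref). Over a finite set of responses, its maximizer has the closed form pi*(y) proportional to pi_ref(y) * exp(r(y) / beta). The minimal sketch below assumes a single prompt and a finite response set. The function names are hypothetical, and the code illustrates only this standard objective, not the paper's unified framework.

```python
import math

def kl_regularized_optimum(ref_probs, rewards, beta):
    """Closed-form maximizer pi*(y) ∝ pi_ref(y) * exp(r(y) / beta) over a finite response set."""
    unnormalized = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    z = sum(unnormalized)  # partition function
    return [u / z for u in unnormalized]

def objective(policy, ref_probs, rewards, beta):
    """Expected reward minus beta * KL(policy || reference); assumes ref_probs > 0."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, ref_probs) if p > 0)
    return expected_reward - beta * kl

# Hypothetical reference policy and rewards over three candidate responses.
ref = [0.5, 0.3, 0.2]
rewards = [1.0, 2.0, 0.0]
beta = 0.5
opt = kl_regularized_optimum(ref, rewards, beta)
print([round(p, 4) for p in opt])
```

A smaller beta moves the optimum toward the highest-reward response. A larger beta keeps it closer to the reference policy.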
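An LLM distillation paper that unifies the SFT, DAgger, offline RL, and OPD perspectives. It decouples the choice of KL divergence from the choice of trajectories, yielding a new theoretical framework for model optimization.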
arXiv:2605.16826v1 Announce Type: new Abstract: Knowledge distillation is central to LLM post-training, yet its design space remains poorly understood…
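To make "decoupling KL from trajectories" concrete, here is a minimal sketch. It assumes next-token distributions are given as probability lists, and its helper names are hypothetical. The sketch treats the KL direction (forward, teacher to student, or reverse, student to teacher) as one choice. Which trajectory the per-position distributions come from is a separate choice: teacher-sampled is SFT-like, student-sampled is on-policy. This illustrates the two axes only and is not the paper's method.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; q must be positive wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def trajectory_kl(p_seq, q_seq):
    """Sum of per-position KL(p_t || q_t) along one trajectory of next-token distributions."""
    return sum(kl(p, q) for p, q in zip(p_seq, q_seq))

# Hypothetical next-token distributions (vocabulary of 3) at two positions of one
# trajectory. Where that trajectory was sampled from is independent of the KL direction.
teacher = [[0.7, 0.2, 0.1], [0.5, 0.25, 0.25]]
student = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]

forward_kl = trajectory_kl(teacher, student)  # KL(teacher || student)
reverse_kl = trajectory_kl(student, teacher)  # KL(student || teacher)
print(f"forward={forward_kl:.4f} reverse={reverse_kl:.4f}")
```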
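Proposes a new reinforcement learning framework that treats parameterized action distributions as actions. This unifies discrete, continuous, and hybrid action spaces and simplifies agent design.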
arXiv:2506.16608v3 Announce Type: replace-cross Abstract: We introduce a novel reinforcement learning (RL) framework that treats parameterized action …
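Under the idea of treating a parameterized action distribution as the action itself, the agent emits distribution parameters and something downstream samples the concrete action. The minimal sketch below uses a hypothetical wrapper class and toy reward functions. It assumes softmax logits for discrete spaces, a (mean, log-std) Gaussian for continuous spaces, and a simple pairing of the two for hybrid spaces. It shows the interface idea only, not the paper's framework.

```python
import math
import random

def sample_discrete(logits, rng):
    """Sample an index from softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def sample_continuous(mean, log_std, rng):
    """Sample from a Gaussian parameterized by mean and log standard deviation."""
    return rng.gauss(mean, math.exp(log_std))

class DistributionAsActionWrapper:
    """Hypothetical wrapper: the agent's action is a set of distribution parameters,
    and the wrapper samples the concrete action the underlying environment receives."""

    def __init__(self, env_step, kind, seed=0):
        self.env_step = env_step  # callable: concrete action -> reward
        self.kind = kind          # "discrete", "continuous", or "hybrid"
        self.rng = random.Random(seed)

    def step(self, params):
        if self.kind == "discrete":
            concrete = sample_discrete(params, self.rng)
        elif self.kind == "continuous":
            mean, log_std = params
            concrete = sample_continuous(mean, log_std, self.rng)
        elif self.kind == "hybrid":
            logits, (mean, log_std) = params
            concrete = (sample_discrete(logits, self.rng),
                        sample_continuous(mean, log_std, self.rng))
        else:
            raise ValueError(f"unknown action kind: {self.kind}")
        return self.env_step(concrete), concrete

# Usage with toy (hypothetical) reward functions.
disc = DistributionAsActionWrapper(lambda a: 1.0 if a == 2 else 0.0, "discrete")
reward, action = disc.step([0.0, 0.0, 50.0])
cont = DistributionAsActionWrapper(lambda a: -abs(a - 1.5), "continuous")
c_reward, c_action = cont.step((1.5, -20.0))
```

Because the policy's output is always a parameter vector, the same agent code can serve all three action-space types. Only the sampling step changes.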
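Uses a conditional information bottleneck to recast budget forcing in reasoning as a compression problem, revealing an intrinsic connection between reasoning and information theory.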
arXiv:2603.08462v2 Announce Type: replace Abstract: Chain-of-thought (CoT) prompting improves LLM accuracy on complex tasks but often increases token usage and infe…
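For orientation, budget forcing is commonly described as decoding-time control of thinking length. Thinking is cut off at a maximum token budget. If the model tries to stop before a minimum budget, the end-of-thinking delimiter is suppressed and a word such as "Wait" is appended. The minimal sketch below assumes a hypothetical token-level generator callback and a hypothetical `</think>` delimiter. It covers only this decoding rule, not the conditional information bottleneck formulation.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-thinking delimiter

def budget_forced_decode(next_token: Callable[[List[str]], str],
                         min_tokens: int, max_tokens: int,
                         extend_word: str = "Wait") -> List[str]:
    """Decode thinking tokens so their count stays within [min_tokens, max_tokens]."""
    tokens: List[str] = []
    while len(tokens) < max_tokens:
        tok = next_token(tokens)
        if tok == END_THINK:
            if len(tokens) >= min_tokens:
                break
            # Stopped too early: drop the delimiter and nudge the model to keep reasoning.
            tokens.append(extend_word)
            continue
        tokens.append(tok)
    tokens.append(END_THINK)  # close thinking on a natural stop or at the max budget
    return tokens

# Toy generators: one always tries to stop, one never stops.
always_stop = lambda toks: END_THINK
never_stop = lambda toks: "x"
print(budget_forced_decode(always_stop, min_tokens=3, max_tokens=10))
```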
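Unifies score-based models and drifting models from a theoretical perspective, revealing the key mechanism behind the kernel-induced mean-shift discrepancy.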
arXiv:2603.07514v3 Announce Type: replace-cross Abstract: Drifting models train one-step generators by optimizing a kernel-induced mean-shift discrepa…
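The kernel-induced mean shift that this item refers to builds on the classic mean-shift vector. With a Gaussian kernel, that vector is the kernel-weighted mean of a sample set minus the query point. The minimal sketch below computes it in plain Python. The `drift` function is a hypothetical illustration, not the paper's definition: it takes the shift toward data samples minus the shift toward generated samples, so it vanishes when the two sets coincide. The truncated abstract does not give the actual discrepancy.

```python
import math

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two points given as equal-length lists."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mean_shift(x, samples, bandwidth=1.0):
    """Classic mean-shift vector: kernel-weighted mean of `samples` minus `x`.
    Assumes the bandwidth is large enough that the weights do not all underflow to 0."""
    weights = [gaussian_kernel(x, y, bandwidth) for y in samples]
    total = sum(weights)
    weighted_mean = [sum(w * y[d] for w, y in zip(weights, samples)) / total
                     for d in range(len(x))]
    return [m - xi for m, xi in zip(weighted_mean, x)]

def drift(x, data, generated, bandwidth=1.0):
    """Hypothetical discrepancy field (illustration only): shift toward data minus
    shift toward generated samples. It is zero when the two sample sets are equal."""
    toward_data = mean_shift(x, data, bandwidth)
    toward_gen = mean_shift(x, generated, bandwidth)
    return [a - b for a, b in zip(toward_data, toward_gen)]

print(mean_shift([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]]))
```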