Unified Data Selection for LLM Reasoning
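Proposes a unified data selection framework that efficiently selects high-quality training data for LLM reasoning tasks, significantly improving reasoning ability.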
arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecke…
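Uses chess games to dissect the boundary between memorization and reasoning in large models, revealing when a model is reciting and when it is genuinely reasoning.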
arXiv:2601.16823v2 Announce Type: replace Abstract: Large Language Models (LLMs) exhibit remarkable capabilities, yet it remains unclear to what exten…
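Proposes TSR, a trajectory search rollout method that improves the reinforcement learning performance of LLM agents in multi-turn interactions.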
arXiv:2602.11767v3 Announce Type: replace-cross Abstract: Advances in large language models (LLMs) are driving a shift toward using reinforcement lear…
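Examines whether reinforcement learning can teach large models long-horizon reasoning, argues that the key lies in expressivity, and offers a new perspective on extending LLM capabilities.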
arXiv:2605.06638v3 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning…
Explores LLMs' ability to solve and generate Linguistics Olympiad puzzles, extends existing benchmarks, and tests the latest models.
arXiv:2509.21820v2 Announce Type: replace Abstract: In this paper, we introduce a combination of novel and exciting tasks: the solution and generation…
Explores how reinforcement learning with verifiable rewards improves LLM reasoning in knowledge-intensive domains, filling a research gap.
arXiv:2605.18261v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance …
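Reveals a counterintuitive phenomenon in RLVR training, in which LLMs fail to learn from hard samples, challenging existing assumptions.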
arXiv:2605.16787v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving Large Language …
OpenAI releases GPT-5.3-Codex, its most capable coding model, combining frontier coding and reasoning capabilities with striking efficiency.
GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and prof…
The paper proposes an on-policy self-distillation method that reduces the "safety tax" of LLM safety alignment without sacrificing reasoning ability.
arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a trad…
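OpenAI shares how it trains large language models to reason, revealing the key methods behind chain of thought.
The paper proposes an agent-evolution-based method that significantly improves LLM reasoning performance in competitive programming.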
arXiv:2605.15301v1 Announce Type: new Abstract: Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive pr…