牛哥精选 (Niu Ge's Picks) · This Month

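📝 Deep Tech · arXiv NLP · 2026-05-22

Unified Data Selection for LLM Reasoning

Proposes a unified data selection framework that efficiently filters high-quality training data for LLM reasoning tasks, markedly improving reasoning ability.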

arXiv:2605.22389v1 Announce Type: new Abstract: Effectively training Large Language Models (LLMs) for complex, long-CoT reasoning is often bottlenecke…

LLM reasoning · data selection · unified framework · artificial intelligence · large language models

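📝 Deep Tech · arXiv NLP · 2026-05-20

Disentangling generalization and memorization in large language models using chess

Uses chess games to separate memorization from reasoning in large models, showing when a model is reciting and when it is actually working through the position.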

arXiv:2601.16823v2 Announce Type: replace Abstract: Large Language Models (LLMs) exhibit remarkable capabilities, yet it remains unclear to what exten…

large language models · generalization vs. memorization · chess · reasoning ability · controlled testing

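🤖 AI·Large Models · arXiv Machine Learning · 2026-05-20

TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents

Proposes TSR (Trajectory-Search Rollouts), a method aimed at improving the reinforcement learning performance of LLM agents in multi-turn interaction.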

arXiv:2602.11767v3 Announce Type: replace-cross Abstract: Advances in large language models (LLMs) are driving a shift toward using reinforcement lear…

LLM agents · trajectory search · multi-turn reinforcement learning · planning · search rollouts

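📝 Deep Tech · arXiv NLP · 2026-05-20

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Examines whether reinforcement learning can teach large models long-horizon reasoning and argues that expressiveness is the key, offering a new perspective on extending LLM capabilities.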

arXiv:2605.06638v3 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning…

reinforcement learning · large language models · reasoning ability · long-horizon reasoning · expressiveness

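🤖 AI·Large Models · arXiv NLP · 2026-05-20

Can LLMs Generate and Solve Linguistic Olympiad Puzzles?

Explores how well LLMs solve and generate Linguistic Olympiad puzzles, extending existing benchmarks and testing the latest models.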

arXiv:2509.21820v2 Announce Type: replace Abstract: In this paper, we introduce a combination of novel and exciting tasks: the solution and generation…

LLM · Linguistic Olympiad · puzzle generation · puzzle solving · reasoning ability

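📝 Deep Tech · arXiv NLP · 2026-05-20

Knowledge-to-Verification: Exploring RLVR for LLMs in Knowledge-Intensive Domains

Explores how reinforcement learning with verifiable rewards (RLVR) can improve LLM reasoning in knowledge-intensive domains, filling a research gap.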

arXiv:2605.18261v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated promising potential to enhance …

RLVR · LLM · knowledge-intensive · reinforcement learning · verifiable rewards

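📝 Deep Tech · arXiv Machine Learning · 2026-05-20

The Unlearnability Phenomenon in RLVR for Language Models

Reveals a counterintuitive phenomenon in RLVR training, in which LLMs fail to learn from hard samples, challenging existing understanding.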

arXiv:2605.16787v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Reward (RLVR) has proven effective in improving Large Language …

RLVR · unlearnability · language models · reinforcement learning · reasoning ability

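🤖 AI·Large Models · OpenAI Official Blog · 2026-05-19

GPT-5.3-Codex System Card

OpenAI releases GPT-5.3-Codex, its most capable coding model, combining frontier coding and reasoning capabilities with striking efficiency.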

GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and prof…

gpt-5.3-co · coding model · intelligent agents · reasoning ability · OpenAI

📝 Deep Tech · arXiv Machine Learning · 2026-05-19

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation

The paper proposes on-policy self-distillation, which reduces the "safety tax" of LLM safety alignment without sacrificing reasoning ability.

arXiv:2605.15239v1 Announce Type: new Abstract: Safety alignment often improves robustness to harmful queries at the cost of reasoning ability, a trad…

LLM safety alignment · on-policy self- · safety tax · distribution mismatch · reasoning ability

🤖 AI·Large Models · OpenAI Official Blog · 2026-05-19

Learning to reason with LLMs

OpenAI shares how it trains large language models to reason, revealing the key methods behind chain of thought.

LLM reasoning · OpenAI · large models · reasoning ability

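📝 Deep Tech · arXiv AI · 2026-05-19

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

The paper proposes an approach based on agentic evolution that significantly improves the reasoning performance of large language models in competitive programming.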

arXiv:2605.15301v1 Announce Type: new Abstract: Large language models (LLMs) still struggle with the rigorous reasoning demands of hard competitive pr…

LLM · competitive programming · agentic evolution · reasoning ability · multi-agent framework
