牛哥精选 (Niu Ge's Picks) · Past Six Months


🤖 AI · Large Models arXiv Machine Learning 2026-07-10

TTHE: Test-Time Harness Evolution

A new test-time compute paradigm: TTHE evolves an agent's harness at inference time, with no retraining required.

arXiv:2607.08124v1 Announce Type: cross Abstract: The behavior of an LLM agent is determined not only by the underlying model, but also by its harness…

test-time compute · model evolution · inference optimization · AI frontier

📝 Deep Tech arXiv AI 2026-07-01

What Drives Interactive Improvement from Feedback?

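Examines whether natural-language feedback is really more effective than simply retrying, and what actually drives performance gains in multi-turn language agents.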

arXiv:2606.30774v1 Announce Type: new Abstract: We study when natural-language feedback produces improvement beyond the gains obtainable from repeated…

natural-language feedback · multi-turn interaction · language agents · test-time compute · improvement mechanisms

🤖 AI · Large Models arXiv AI 2026-06-17

How Inference Compute Shapes Frontier LLM Evaluation

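Shows how inference compute has become a key factor in measuring LLM performance, and how test-time compute allocation is reshaping evaluation standards.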

arXiv:2606.17930v1 Announce Type: new Abstract: AI evaluations are shifting toward harder tasks that benefit from longer trajectories involving tool u…

inference compute · LLM evaluation · test-time compute · tool use · iterative solving

🤖 AI · Large Models arXiv AI 2026-06-16

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

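A new take on LLM reasoning efficiency: minimal test-time intervention is enough to improve performance significantly, so less is more.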

arXiv:2510.13940v4 Announce Type: replace-cross Abstract: Recent progress in large language models (LLMs) has focused on test-time scaling to improve …

LLM reasoning · test-time intervention · uncertainty · reasoning efficiency · less is mo

📝 Deep Tech arXiv Machine Learning 2026-06-01

Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models

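Revisits the key factors behind test-time compute in vision-language models and shows that diversity has an important effect on reasoning ability.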

arXiv:2605.30713v1 Announce Type: new Abstract: Test-time compute (TTC) strategies have emerged as a lightweight approach to boost reasoning in large …

vision-language models · test-time compute · diversity · machine learning · reasoning ability

🤖 AI · Large Models arXiv AI 2026-05-19

OpenDeepThink: Parallel Reasoning via Bradley–Terry Aggregation

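Proposes OpenDeepThink, which uses a Bradley-Terry model for parallel reasoning and selects the best candidate without a ground-truth verifier, offering a new path for scaling LLM reasoning.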

arXiv:2605.15177v1 Announce Type: new Abstract: Test-time compute scaling is a primary axis for improving LLM reasoning. Existing methods primarily sc…

LLM reasoning · parallel reasoning · bradley-te · test-time compute · model aggregation
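
For readers unfamiliar with the Bradley-Terry model mentioned in the entry above, here is a minimal sketch of how pairwise comparisons between candidate answers can be turned into a single ranking. It is a generic textbook fit (the classic minorization-maximization update), not the OpenDeepThink procedure itself, which the truncated abstract does not describe. The function name `bradley_terry`, the `prior` pseudo-count, and the example win counts are all hypothetical; in practice the pairwise outcomes might come from an LLM judge, which is also an assumption here.

```python
def bradley_terry(wins, prior=0.1, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from a pairwise win matrix.

    wins[i][j] is the number of times candidate i was preferred over j.
    `prior` is a hypothetical pseudo-count added to every pair so that
    no strength collapses to zero; it is an assumption of this sketch.
    Returns strengths that sum to 1 (higher means stronger).
    """
    n = len(wins)
    # Smoothed win counts; the diagonal stays at zero.
    w = [[(wins[i][j] + prior) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(w[i][j] for j in range(n) if j != i)
            # Each pair contributes (comparisons between i and j) / (p_i + p_j).
            denom = sum((w[i][j] + w[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom)
        scale = sum(new_p)
        new_p = [x / scale for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p


if __name__ == "__main__":
    # Hypothetical outcomes for three candidate answers.
    example_wins = [
        [0, 3, 4],  # candidate 0 beat candidate 1 three times, candidate 2 four times
        [1, 0, 3],
        [0, 1, 0],
    ]
    strengths = bradley_terry(example_wins)
    best = max(range(len(strengths)), key=lambda i: strengths[i])
    print("strengths:", [round(s, 3) for s in strengths])
    print("selected candidate:", best)
```

The candidate with the highest fitted strength is the one selected, which is how a ranking can be produced without any ground-truth verifier: only relative preferences between candidates are needed.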
