Niu Ge's Picks · Past Year

1

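📝 Deep Tech arXiv AI 2026-05-20

The Evaluation Trap: Benchmark Design as Theoretical Commitment

AI benchmarks embed theoretical assumptions that narrow the definition of progress; beware the evaluation trap, which reshapes how capability itself is conceived.

arXiv:2605.14167v1 Announce Type: new Abstract: Every AI benchmark operationalizes theoretical assumptions about the capability it claims to assess. W…

AI benchmarks, theoretical assumptions, evaluation trap, paradigm entrenchment, narrowed progress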

2

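💰 Business Tech arXiv AI 2026-05-20

Intelligence Impact Quotient (IIQ): A Framework for Measuring Organizational AI Impact

IIQ is described as the first composite metric for the depth and impact of AI integration in an organization. It goes beyond simple access counts and offers a quantitative way to assess AI adoption.

arXiv:2605.14455v1 Announce Type: new Abstract: The Intelligence Impact Quotient (IIQ) is a composite metric intended to quantify the depth to which A…

AI impact, organizational AI, metric framework, quantitative evaluation, business tech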

3

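📝 Deep Tech arXiv AI 2026-05-20

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving

LeanSearch v2 proposes global premise retrieval: finding all the lemmas a Lean 4 theorem needs in one pass, moving beyond existing single-step or semantic-matching approaches.

arXiv:2605.13137v2 Announce Type: replace-cross Abstract: Proving theorems in Lean 4 often requires identifying a scattered set of library lemmas whos…

Lean 4, theorem proving, premise retrieval, lemma selection, global search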

4

🤖 AI · Large Models Tailwind CSS Blog 2026-05-20

Tailwind UI is now Tailwind Plus

Tailwind UI has been rebranded as Tailwind Plus. It keeps the one-time lifetime purchase model, and exclusive features such as Tailwind Play accounts are planned.

We just shipped a huge rebrand project, turning what was previously known as Tailwind UI into Tailwind Plus. Tailwind Plus is all the same high-qualit…

tailwind p, rebrand, UI component library, front-end development, lifetime access

5

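📝 Deep Tech arXiv Machine Learning 2026-05-20

A More Word-like Image Tokenization for MLLMs

Proposes a new image tokenization method that brings image tokens closer to text semantics, aiming to improve fusion in multimodal large language models.

arXiv:2605.17954v1 Announce Type: cross Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a …

multimodal large language models, image tokenization, tokenizati, visual-semantic alignment, computer vision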

6

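🤖 AI · Large Models arXiv Machine Learning 2026-05-20

Improving MLLM Training Efficiency via Stage-Aware Sparsity

A new training paradigm for multimodal large models: stage-aware sparsity dynamically removes redundancy, substantially improving efficiency while preserving performance.

arXiv:2509.18150v2 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) have demonstrated outstanding performance across a variet…

MLLM, training efficiency, sparsity, multimodal, stage-aware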

7

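📝 Deep Tech arXiv Machine Learning 2026-05-20

Prior Knowledge Makes It Possible: From Sublinear Graph Algorithms to LLM Test-Time Methods

Brings the theory of prior knowledge from sublinear graph algorithms to LLM test-time methods, opening a new route to efficiency gains.

arXiv:2510.16609v3 Announce Type: replace Abstract: Test-time augmentation, such as Retrieval-Augmented Generation (RAG) or tool use, critically depen…

sublinear graph algorithms, prior knowledge, LLM test-time methods, cross-disciplinary theory, algorithm optimization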

8

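📝 Deep Tech arXiv Machine Learning 2026-05-20

Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training

Uses iterative reward-guided post-training so that tabular language models can improve themselves and keep raising their performance.

arXiv:2604.18966v2 Announce Type: replace Abstract: Tabular language models can generate synthetic tables by modeling rows as token sequences, but the…

tabular language models, self-improvement, reward-guided post-training, iterative optimization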

9

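🤖 AI · Large Models arXiv NLP 2026-05-20

FOL2NS: Generating Natural Sentences from First-Order Logic

A neuro-symbolic framework that automatically converts first-order logic into natural-language sentences, with the aim of advancing semantic parsing and theorem verification.

arXiv:2605.18155v1 Announce Type: new Abstract: Translating formal language into natural language is a foundational challenge in NLP, driving various …

neuro-symbolic framework, first-order logic, natural language generation, semantic parsing, theorem verification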

10

📝 Deep Tech arXiv NLP 2026-05-20

Code as Agent Harness

A paper exploring code as the harness that drives agents, offering new ideas and a theoretical basis for AI agent development.

arXiv:2605.18747v1 Announce Type: new Abstract: Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generat…

agents, code-driven, AI frameworks, paper, arXiv

11

📝 Deep Tech arXiv NLP 2026-05-20

Trustworthiness in Retrieval-Augmented Generation Systems: A Survey

A survey of trustworthiness challenges in RAG systems, covering key dimensions such as factuality, robustness, and fairness.

arXiv:2409.10102v2 Announce Type: replace-cross Abstract: Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the develo…

RAG, trustworthiness, survey, large models, factuality

12

📝 Deep Tech arXiv Computer Vision 2026-05-20

GSMap: 2D Gaussians for Online HD Mapping

GSMap is a new method that uses 2D Gaussians for online HD mapping, balancing speed and accuracy.

arXiv:2605.09619v2 Announce Type: replace Abstract: Accurate High-Definition (HD) map construction is critical for autonomous driving, yet existing me…

GSMap, 2D Gaussians, online HD mapping, autonomous driving, map construction

13

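🤖 AI · Large Models OpenAI Official Blog 2026-05-20

New GPT-3 capabilities: Edit & insert

GPT-3 and Codex gain edit and insert capabilities, so they are no longer limited to completion and text interaction becomes more flexible.

We’ve released new versions of GPT-3 and Codex which can edit or insert content into existing text, rather than just completing existing text.

GPT-3, Codex, text editing, text insertion, OpenAI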

14

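🚀 Product Watch OpenAI Official Blog 2026-05-20

AI and efficiency

OpenAI's analysis finds that the compute needed to train a neural network to the same ImageNet performance has halved every 16 months since 2012, far outpacing Moore's Law, for a reduction of about 44x overall.

We’re releasing an analysis showing that since 2012 the amount of compute needed to train a neural net to the same performance on ImageNet classificat…

AI efficiency, algorithmic progress, compute cost, neural networks, OpenAI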
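
The 16-month halving time and the 44x figure in this item can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes a seven-year window (2012 to 2019), which the excerpt above does not state, and the function names are hypothetical.

```python
import math


def reduction_factor(months: float, halving_months: float) -> float:
    """Factor by which required compute shrinks if it halves every `halving_months`."""
    return 2.0 ** (months / halving_months)


def implied_halving_months(months: float, factor: float) -> float:
    """Halving time implied by an overall `factor` reduction over `months`."""
    return months / math.log2(factor)


# Assumption: a 7-year window (2012 to 2019); not stated in the excerpt.
SPAN_MONTHS = 7 * 12

if __name__ == "__main__":
    # Reduction implied by halving every 16 months.
    print(f"16-month halving: {reduction_factor(SPAN_MONTHS, 16):.1f}x")
    # Reduction implied by a Moore's-Law-style 24-month doubling.
    print(f"24-month halving: {reduction_factor(SPAN_MONTHS, 24):.1f}x")
    # Halving time that would give exactly 44x over the window.
    print(f"44x implies halving every {implied_halving_months(SPAN_MONTHS, 44):.1f} months")
```

Under that assumption, a 16-month halving time gives roughly 38x and a 44x reduction implies a halving time a little over 15 months, so the two headline numbers are consistent once rounding is allowed for. A 24-month cadence over the same window gives only about 11x.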

15

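🤖 AI · Large Models OpenAI Official Blog 2026-05-20

Improving verifiability in AI development

A report by 58 co-authors at 30 organizations proposes 10 mechanisms for strengthening the verifiability of AI systems with respect to safety, fairness, and privacy.

We’ve contributed to a multi-stakeholder report by 58 co-authors at 30 organizations, including the Centre for the Future of Intelligence, Mila, Schwa…

AI safety, verifiability, multi-stakeholder, mechanisms, OpenAI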

16

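📝 Deep Tech OpenAI Official Blog 2026-05-20

Fine-tuning GPT-2 from human preferences

OpenAI describes fine-tuning GPT-2 (774M parameters) with human feedback and finds that the model learned to copy from the source text to satisfy labelers' preferences, a counterintuitive outcome of preference alignment.

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external…

GPT-2, human feedback, fine-tuning, preference learning, summarization, OpenAI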

17

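🤖 AI · Large Models OpenAI Official Blog 2026-05-20

Why responsible AI development needs cooperation on safety

OpenAI releases a policy research paper proposing four strategies to foster cooperation on AI safety and to address the collective action problem created by competitive pressure.

We’ve written a policy research paper identifying four strategies that can be used today to improve the likelihood of long-term industry cooperation o…

AI safety, industry cooperation, policy research, OpenAI, safety norms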

18

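📝 Deep Tech OpenAI Official Blog 2026-05-20

AI safety needs social scientists

OpenAI argues that long-term AI safety research needs social scientists to address uncertainties about human psychology, bias, and rationality, and to foster collaboration between ML and the social sciences.

We’ve written a paper arguing that long-term AI safety research needs social scientists to ensure AI alignment algorithms succeed when actual humans a…

AI safety, social scientists, alignment algorithms, human values, interdisciplinary collaboration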

19

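📝 Deep Tech OpenAI Official Blog 2026-05-20

AI safety via debate

OpenAI proposes debate as an AI safety training technique: agents debate each other and a human judges who wins. A fresh take on the problem.

We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.

AI safety, debate training, human judging, OpenAI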

20

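📝 Deep Tech OpenAI Official Blog 2026-05-20

Gotta Learn Fast: A new benchmark for generalization in RL

OpenAI releases a new benchmark for generalization in reinforcement learning, intended to spur faster adaptation by AI in complex environments.

reinforcement learning, generalization, benchmark, OpenAI, AI research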
