Niu Ge's Picks · This Month

1

🔐 Security/Authentication · ITHome (IT之家) · 2026-06-13

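Xiaomi's Lei Jun sums up the YU7 eight-stage test livestream: the process went very smoothly, and the full test team now has more than 800 people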

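ITHome, June 13: Xiaomi chairman and CEO Lei Jun completed a YU7 test livestream at the Yancheng proving ground today, then published a long post summarizing it. Lei Jun said today's testing covered 8 stages, 26 items, and 11 challenges. It was originally planned to take 7 hours, but the process went very smoothly and everything was finished in just five and a half hours. The test vehicle was …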

Xiaomi, Lei Jun, summary, eight stages, test livestream, process went very smoothly

2

🤖 AI & Large Models · arXiv AI · 2026-06-12

PaLMR: Towards Faithful Visual Reasoning via Multimodal Process Alignment

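PaLMR uses multimodal process alignment to achieve faithful visual reasoning, improving large models' image understanding and logical consistency.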

arXiv:2603.06652v2 Announce Type: replace-cross Abstract: Reinforcement learning has recently improved the reasoning ability of Large Language Models …

visual reasoning, multimodal alignment, trustworthy AI, large language models, process supervision

3

🚀 Product Watch · QbitAI (量子位) · 2026-06-05

WPS Notes (WPS笔记) officially launches: AI runs through the whole process of capturing, organizing, and reusing notes

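WPS launches an AI-native multimodal note-taking app that supports input by voice, image, text, and other means. Intelligent search makes information “findable”, and the efficiency gain is described as significant.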

AI notes are not a chat box but an entry point for information.

notes, official launch, capture, organize and reuse, whole process, WPS Notes

4

🎨 Design & Creativity · Hacker News Show · 2026-06-04

Show HN: Realistic procedural fire effect via naive algorithm

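A demo of a realistic fire effect built with only a naive algorithm. The visuals are striking, and the code is open source to learn from.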

Source code: https://github.com/Leftium/fx/blob/main/src/routes/fire-plas... I made this naive fire effect as realistic as possible; arguably more rea…

procedural generation, fire effect, computer graphics, algorithm demo, real-time rendering

5

🤖 AI & Large Models · arXiv Machine Learning · 2026-06-02

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

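With no extra training, off-the-shelf LLMs can score mathematical reasoning processes, with performance rivaling dedicated process reward models.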

arXiv:2606.01682v1 Announce Type: cross Abstract: Selecting the best response from multiple small-model samples using a stronger scorer is a simple in…

LLM, process scoring, mathematical reasoning, PRM, training-free

6

📝 Deep Tech · arXiv AI · 2026-05-29

DeepTool: Scaling Interleaved Deliberation in Tool-Integrated Reasoning via Process-Supervised Reinforcement Learning

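DeepTool uses process-supervised reinforcement learning to enable interleaved deliberation in tool-integrated reasoning, improving LLM performance on strategic planning and self-correction.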

arXiv:2605.29568v1 Announce Type: new Abstract: Tool-Integrated Reasoning (TIR) extends LLM capabilities by leveraging external environments. However,…

DeepTool, tool-integrated reasoning, process supervision, reinforcement learning, LLM, self-correction

7

📝 Deep Tech · arXiv AI · 2026-05-28

Verifiable Process Rewards for Agentic Reasoning

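Proposes a verifiable process reward mechanism that makes agentic reasoning more trustworthy and interpretable, offering a new direction for reinforcement learning.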

arXiv:2605.10325v2 Announce Type: replace Abstract: Reinforcement learning from verifiable rewards (RLVR) has improved the reasoning abilities of larg…

verifiable process rewards, agentic reasoning, reinforcement learning, reasoning reliability, reward models

8

📝 Deep Tech · arXiv AI · 2026-05-26

Understanding and Mitigating Premature Confidence for Better LLM Reasoning

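New research shows how “premature confidence” in long LLM chains of thought leads to logical gaps, and proposes a mitigation strategy based on process reward models to improve reasoning quality.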

arXiv:2605.24396v1 Announce Type: new Abstract: Long chains of thought (CoT) from current language models frequently contain logical gaps and unjustif…

LLM reasoning, premature confidence, chain of thought, process reward models, logical gaps

9

📝 Deep Tech · arXiv Machine Learning · 2026-05-26

A Contractive Feedback Semantics for Reinforcement Learning

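Steps outside closed Markov decision processes and takes a compositional view to build a new contractive feedback semantics for reinforcement learning.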

arXiv:2605.24759v1 Announce Type: new Abstract: Discounted reinforcement learning is usually presented through Bellman equations on closed Markov deci…

reinforcement learning, contractive feedback, semantics, Markov decision processes, compositional view

10

📝 Deep Tech · arXiv NLP · 2026-05-25

HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

Combines Hawkes processes with LLMs to make semantic uncertainty propagation in agentic text simulation more precise and controllable.

arXiv:2605.23043v1 Announce Type: new Abstract: Agentic text-simulation systems write in sequence, with each item becoming possible context for later …

large models, uncertainty propagation, semantic simulation, agents, Hawkes processes

11

📝 Deep Tech · arXiv Machine Learning · 2026-05-23

VRPRM: Process Reward Modeling via Visual Reasoning

Improves the accuracy of process reward modeling through visual reasoning, offering a new approach to training on complex tasks.

arXiv:2508.03556v3 Announce Type: replace Abstract: Process Reward Model (PRM) is widely used in the post-training of Large Language Model (LLM) becau…

process reward models, visual reasoning, multi-step reasoning, reward signals, model training

12

🤖 AI & Large Models · arXiv Machine Learning · 2026-05-21

rePIRL: Learn PRM with Inverse RL for LLM Reasoning

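Uses inverse reinforcement learning to learn a process reward model automatically from reasoning trajectories, improving the complex reasoning ability of large language models.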

arXiv:2602.07832v2 Announce Type: replace Abstract: Process rewards have been widely used in deep reinforcement learning to improve training efficienc…

inverse reinforcement learning, process reward models, large language models, reasoning, PRM

13

🚀 Product Watch · GitHub Blog · 2026-05-21

Investigating unauthorized access to GitHub-owned repositories

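GitHub's chief security officer gives a first-hand account of the full investigation into unauthorized access to internal repositories.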

If any impact is discovered, customers will be notified via established incident response and notification channels. The post Investigating unauthoriz…

GitHub, security incident, unauthorized access, internal repositories, incident investigation

14

🚀 Product Watch · Dev.to · 2026-05-21

Does AI Know How Many Tokens It Is Burning

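Per-token AI billing looks precise, but the inference process behind it is complex and hidden. The article exposes how opaque the cost calculation is.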

The strange thing about the modern AI bill is that it looks precise while the work behind it feels mysterious. A user types a short request, a model t…

AI cost, token billing, model inference, caching, tool calls

15

📝 Deep Tech · arXiv Machine Learning · 2026-05-20

Matérn Gaussian Processes on Graphs

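Matérn Gaussian processes on graphs: an approach that combines theoretical derivation with graph structure, offering a new perspective on modeling graph data.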

arXiv:2010.15538v4 Announce Type: replace-cross Abstract: Gaussian processes are a versatile framework for learning unknown functions in a manner that…

Matérn Gaussian processes, graph theory, Gaussian processes, machine learning, graph data modeling

16

📝 Deep Tech · arXiv Machine Learning · 2026-05-20

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

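Beyond correctness: harmonizes process and outcome rewards through reinforcement learning, offering a new perspective on model training.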

arXiv:2509.03403v2 Announce Type: replace Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) improves final-answer accuracy on reasoning …

reinforcement learning, process rewards, outcome rewards, RL training, generalization

17

🎨 Design & Creativity · Codrops · 2026-05-20

Designing Against the Gallery: A Two-Year Journey to a Layered Portfolio Experience

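Two years spent refining a personal portfolio: how a layered experience breaks with the traditional gallery mindset, told as a record of inspiration for designers and creatives.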

A two-year journey to create a layered, engaging portfolio beyond the traditional gallery.

portfolio design, layered experience, creative process, design experiments, personal projects

18

📝 Deep Tech · arXiv Machine Learning · 2026-05-20

Variational Optimality of Föllmer Processes in Generative Diffusions

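Shows the optimality of Föllmer processes in generative diffusions from a variational perspective, at the intersection of theoretical mathematics and generative AI.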

arXiv:2602.10989v2 Announce Type: replace-cross Abstract: We construct and analyze generative diffusions that transport a point mass to a prescribed t…

Föllmer processes, generative diffusion, variational optimality, mathematical theory

19

📝 Deep Tech · arXiv AI · 2026-05-20

Addressing Terminal Constraints in Data-Driven Demand Response Scheduling

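A data-driven method addresses the terminal-constraint problem in demand response scheduling for chemical processes, supporting flexible operation and grid balancing.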

arXiv:2605.14741v1 Announce Type: cross Abstract: Electrified chemical processes are incentivized by exposure to time-varying electricity markets to o…

demand response, terminal constraints, data-driven, chemical processes, electricity markets

20

📝 Deep Tech · arXiv Machine Learning · 2026-05-19

Best-of-Both-Worlds for Heavy-Tailed Markov Decision Processes

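Proposes the first Best-of-Both-Worlds (BoBW) algorithm to achieve optimal regret on heavy-tailed MDPs in both stochastic and adversarial environments, overcoming the limits of conservative approaches.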

arXiv:2602.01295v3 Announce Type: replace Abstract: We investigate episodic Markov Decision Processes with heavy-tailed losses (HTMDPs). Existing appr…

reinforcement learning, Markov decision processes, heavy-tailed distributions, best-of-bo, regret bounds
