Niu Ge's Picks (牛哥精选) · Three Months

1

🤖 AI·Large Models arXiv Machine Learning 2026-07-15

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

A major ICML 2026 paper: evolution strategies replace reinforcement learning, opening a new paradigm for fine-tuning large models.

arXiv:2509.24372v3 Announce Type: replace Abstract: Fine-tuning large language models (LLMs) for downstream tasks is an essential stage of modern AI d…

evolution strategies, LLM fine-tuning, reinforcement learning, ICML 2026, large-scale models

2

🤖 AI Tools ITHome (IT之家) 2026-07-15

Kingsoft Office launches Lingxi (灵犀) Professional Edition, an AI office agent that supports professional Office operations

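Kingsoft's WPS AI Lingxi Professional Edition manages context around projects, learns the user's writing style on its own, and supports professional Office operations, making office work smarter and more efficient.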

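ITHome, July 15: At the Kingsoft Office WPS AI productivity launch event held today, Kingsoft Office released a new-generation office agent product, Lingxi Professional Edition. The product is positioned as a personal professional office assistant: it understands the user's personal work context, coordinates calls to the WPS document kernel and various external tools, and ultimately turns instructions into tangible office output that is editable, traceable, and open to multi-user collaboration…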

Kingsoft Office, launch, office agent, Lingxi Professional Edition, support, professional operations

3

📝 Deep Tech arXiv Computer Vision 2026-07-15

Exact and Calibrated Diffusion Reconstruction for Digital Breast Tomosynthesis

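Diffusion models aid digital breast tomosynthesis, tackling the limited-angle reconstruction problem and overcoming the constraint that 98% of the space is unmeasured.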

arXiv:2607.12937v1 Announce Type: cross Abstract: Limited-angle digital breast tomosynthesis (DBT) reconstructs a volume from a few low-dose projectio…

diffusion models, medical image reconstruction, digital breast tomosynthesis, limited-angle CT, deep learning

4

🤖 AI·Large Models arXiv NLP 2026-07-15

Catalyst-Agent: Autonomous heterogeneous catalyst screening with an LLM Agent

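An LLM agent autonomously screens electrocatalysts, aided by MLIPs and graph neural networks, to accelerate the discovery of new materials.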

arXiv:2603.01311v3 Announce Type: replace Abstract: The discovery of catalysts for electrochemical applications such as the oxygen reduction reaction …

catalyst screening, LLM agent, autonomous experimentation, electrocatalysis, machine-learning potentials

5

🤖 AI·Large Models arXiv Machine Learning 2026-07-15

SlimPer: Make Personalization Model Slim and Smart

SlimPer proposes a new method for making personalization models slim and smart, balancing performance and efficiency. Worth watching.

arXiv:2607.12281v1 Announce Type: cross Abstract: Transformer-style architectures are increasingly adopted for industrial recommendation systems, yet …

SlimPer, personalization models, model compression, efficient inference, deep learning

6

🤖 AI·Large Models Dev.to 2026-07-15

Designing an AI Learning Community with the FROST Family Governance Model

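Designing an "AI Learning Community" with the FROST Family Governance Model. Author: 神通说. Date: 2026-07-15. Topic: dual-project linkage | Wednesday rotation. Reading time: 10 minutes. Origin: What is the biggest fear when you run training on your own? Recently I have been working on a bootcamp project, "破局·动态能力生长实战营" (roughly, "Breakthrough: Dynamic Capability Growth Bootcamp"). Running training alone sounds great, but in practice, just every…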

family governance model, designing, learning community

7

📝 Deep Tech arXiv Machine Learning 2026-07-14

FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale

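A new method that scales synthetic instruction data up to pre-training scale, getting past the bottleneck of limited supervised training data.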

arXiv:2601.22146v2 Announce Type: replace-cross Abstract: Due to limited supervised training data, large language models (LLMs) are typically pre-trai…

synthetic data, instruction tuning, pre-training, large language models, self-supervised learning

8

📝 Deep Tech arXiv AI 2026-07-14

Beyond Naïve Prompting: Strategies for Improved Context-aided Forecasting with LLMs

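Goes beyond naive prompting and proposes new strategies for improving context-aided forecasting, making LLM forecasts more accurate.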

arXiv:2508.09904v3 Announce Type: replace-cross Abstract: Real-world forecasting requires models to integrate not only historical data but also releva…

LLM, forecasting, prompt engineering, in-context learning, strategy optimization

9

🤖 AI·Large Models ByteByteGo 2026-07-14

How LLMs Learn to Be Helpful (RLHF vs DPO)

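A single-article comparison of RLHF and DPO, two mainstream training methods for large models: their core differences and where each one applies.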

In this article, we will look at how that learning actually happens, starting with why instruction-following alone falls short, then walking through t…

RLHF, DPO, large-model training, human feedback, reinforcement learning

10

📝 Deep Tech arXiv NLP 2026-07-14

Production and Perception in LLMs: A Token Probability Approach

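Starting from token probabilities, the paper examines the deeper mechanisms of production and perception in large models. Of very high theoretical value.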

arXiv:2607.11703v1 Announce Type: new Abstract: The asymmetry between language production and perception has been well-documented in psycholinguistics…

LLM, token probability, production and perception, deep learning

11

🤖 AI·Large Models arXiv Machine Learning 2026-07-14

LLM-PDESR: Robust PDE Discovery via Subdomain Weighted Residuals and LLM-Guided Symbolic Hypothesis Generation

LLM-guided symbolic hypothesis generation and subdomain weighted residuals make PDE discovery from noisy data more robust and accurate.

arXiv:2607.10546v1 Announce Type: new Abstract: Discovering governing partial differential equations (PDEs) from noisy observational data is a fundame…

LLM, PDE discovery, symbolic regression, subdomain weighted residuals, scientific machine learning

12

📝 Deep Tech arXiv AI 2026-07-14

ARMOR: Stabilizing On-Policy LLM RL with Off-Policy Anchor Samples

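Proposes ARMOR, which uses off-policy anchor samples to stabilize on-policy RL training of large language models and address training oscillation.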

arXiv:2607.10481v1 Announce Type: cross Abstract: Reinforcement learning (RL) has significantly enhanced the reasoning capabilities of large language …

ARMOR, reinforcement learning, LLM, off-policy anchors, on-policy learning

13

🤖 AI·Large Models arXiv AI 2026-07-14

Reinforcement Learning with Verifiable Physics: Post-training LLMs with Continuous Rewards

Uses physical laws to give LLMs continuous reward signals, making RL post-training more interpretable and more efficient.

arXiv:2607.10474v1 Announce Type: cross Abstract: Partial differential equations (PDEs) are foundational to modeling in science and engineering, but c…

reinforcement learning, verifiable physics, continuous rewards, LLM post-training, physics-driven AI

14

🤖 AI·Large Models arXiv AI 2026-07-14

Agentic Context Learning with Self-Discovered Specification

Lets AI discover task specifications automatically and learn dynamically in context, for stronger autonomous reasoning.

arXiv:2607.09794v1 Announce Type: new Abstract: Context learning is an emerging inference-time task where LLMs must learn and apply novel, task-specif…

agentic AI, context learning, self-discovered specification, large-model reasoning, autonomous intelligence

15

🤖 AI·Large Models arXiv AI 2026-07-14

Inside the Unfair Judge: A Mechanistic Interpretability Account of LLM-as-Judge Bias

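Uses mechanistic interpretability to dissect the internal biases of LLMs acting as judges, exposing the mechanisms at the root of their unfairness.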

arXiv:2607.11871v1 Announce Type: cross Abstract: Existing studies of LLM-as-judge scoring bias work predominantly at the input-output level: they per…

LLM-as-judge, mechanistic interpretability, bias analysis, deep learning interpretability, model fairness

16

📝 Deep Tech arXiv AI 2026-07-14

Enhancing LLMs through human feedback: a journey towards self-improvement

From human feedback to self-improving large models: how recent research uses feedback to drive gains in LLM performance.

arXiv:2607.11267v1 Announce Type: cross Abstract: In the rapidly evolving landscape of information retrieval systems, the ability to adapt and improve…

LLM, human feedback, self-improvement, reinforcement learning, AI alignment

17

🔓 Open Source Hacker News LLM 2026-07-14

Mnemo AI – Local agentic assistant for any LLM that learns from its failures

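A local AI agent assistant that learns from its failures, supports multiple large models, and searches content quickly, so the LLM gets smarter the more it is used.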

Article URL: https://github.com/brunopistone/mnemoai Comments URL: https://news.ycombinator.com/item?id=48906032 Points: 2 # Comments: 0

local AI, agent assistant, learning from failure, open-source project, LLM

18

🤖 AI Tools Hacker News Show 2026-07-14

Show HN: We Built a Chat of Stanford's CS229 Course Notes

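Turns the complete Stanford CS229 course notes into a searchable knowledge base with citation tracing, making technical documents easier to explore.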

CS229 is one of the most widely referenced machine learning courses. I turned the complete course notes into a searchable knowledge base with citation…

Stanford CS229, machine learning, course notes, searchable knowledge base

19

📝 Deep Tech arXiv Computer Vision 2026-07-14

EvoGuard: An Extensible Agentic RL-based Framework for Practical and Evolving AI-Generated Image Detection

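An extensible framework based on agentic reinforcement learning that lets AI-generated image detection keep evolving as new attack methods appear.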

arXiv:2603.17343v2 Announce Type: replace Abstract: The rapid proliferation of AI-Generated Images (AIGIs) poses severe misinformation risks, making A…

AI image detection, reinforcement learning, agentic framework, extensibility, arXiv paper

20

📝 Deep Tech arXiv Machine Learning 2026-07-14

Diversified Multinomial Logit Contextual Bandits

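Proposes a contextual bandit algorithm that combines the multinomial logit choice model with diversity, balancing exploration and diversity in recommender systems.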

arXiv:2607.11684v1 Announce Type: cross Abstract: Existing contextual multinomial logit (MNL) bandits model relevance-driven choice but ignore the pot…

multi-armed bandits, contextual bandits, multinomial logit, diversification, recommender systems
