Niu Ge's Picks · This Month

1

🔓 Open Source Hacker News AI 2026-06-13

Equiv, check that an AI refactor did not change what your code does

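The open-source tool Equiv provides deterministic, byte-level verification that an AI refactor has not changed code behavior, rather than relying on a model's subjective judgment.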

Article URL: https://github.com/Neelagiri65/equiv Comments URL: https://news.ycombinator.com/item?id=48515830 Points: 1 # Comments: 0

AI refactoring, code verification, deterministic checks, open-source tools, byte-level comparison

2

⚡ Productivity Tools Hacker News AI 2026-06-10

Show HN: Guildly, a Slack like interface to run a company of AI employees

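Manage AI employees the way you would manage a Slack team: each AI strictly follows a deterministic "investigate, plan, ticket, branch, PR, review" workflow, which rules out ad hoc actions.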

If you're a solo founder running 5-6 claude code terminals and manually orchestrating work between them, this is for you. Comments URL: https://news.y…

AI employee management, Slack interface, deterministic workflow, automated collaboration, project standardization

3

📝 Deep Tech arXiv AI 2026-06-10

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

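A new method that combines local and global entropy to improve the accuracy of uncertainty quantification in large models; worth watching.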

arXiv:2606.09875v1 Announce Type: cross Abstract: Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for …

LLM, uncertainty quantification, entropy, paper, machine learning

4

🤖 AI · Large Models arXiv AI 2026-06-10

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

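New ICML 2026 research uses uncertainty-aware subspace rectification to make multimodal large model (MLLM) decoding more trustworthy and to mitigate manifold departure.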

arXiv:2606.09859v1 Announce Type: cross Abstract: MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attrib…

multimodal large models, uncertainty quantification, manifold learning, trustworthy decoding, subspace rectification

5

🤖 AI · Large Models arXiv NLP 2026-06-09

SafeRun: Enabling Determinism in LLM Planning for Running

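An ICML 2026 workshop paper on removing randomness from LLM planning for running, so that outputs are reproducible and deterministic, improving safety and reliability.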

arXiv:2606.09027v1 Announce Type: new Abstract: Large Language Models enable flexible natural-language planning but remain unreliable in determinism-c…

LLM planning, determinism, safe running, ICML 2026, paper

6

🔓 Open Source Hacker News AI 2026-06-09

OxyJen v0.5: a deterministic graph runtime for AI workflows

OxyJen v0.5: a deterministic graph runtime for AI workflows that emphasizes reliable execution rather than competing with LangChain4j.

Article URL: https://github.com/11divyansh/OxyJen Comments URL: https://news.ycombinator.com/item?id=48456722 Points: 1 # Comments: 0

OxyJen, AI workflows, graph runtime, deterministic execution, open source

7

📝 Deep Tech arXiv Machine Learning 2026-06-09

Nonparametric LLM Evaluation from Preference Data

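A nonparametric method for evaluating LLM performance that avoids parametric assumptions and provides reliable uncertainty quantification.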

arXiv:2601.21816v2 Announce Type: replace Abstract: Evaluating the performance of large language models (LLMs) from human preference data is crucial f…

nonparametric statistics, LLM evaluation, preference data, uncertainty quantification, machine learning

8

📝 Deep Tech arXiv Machine Learning 2026-06-09

Code Is More Than Text: Uncertainty Estimation for Code Generation

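Code is more than text, so how should the uncertainty of generated code be estimated? This paper proposes a new method that gives more reliable confidence estimates for code generation tasks.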

arXiv:2606.09577v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as code generators, where silently wrong prog…

code generation, uncertainty estimation, large models, machine learning, paper

9

🤖 AI · Large Models arXiv Machine Learning 2026-06-08

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

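Uses a large model to guide reinforcement learning policies in sparse-reward environments, with uncertainty estimation to make decisions more reliable; code is available for reproduction.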

arXiv:2606.06673v1 Announce Type: new Abstract: Sparse rewards and heterogeneous task sequences remain persistent challenges in Reinforcement Learning…

uncertainty-aware, large language models, policy shaping, sparse rewards, reinforcement learning

10

🤖 AI · Large Models arXiv NLP 2026-06-05

Ask or Assume? Uncertainty-Aware Clarification-Seeking in Coding Agents

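When should a coding agent ask, and when should it guess? This paper proposes an uncertainty-aware strategy for proactively seeking clarification, improving the reliability of code generation.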

arXiv:2603.26233v2 Announce Type: replace Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software…

coding agents, uncertainty-aware, clarification seeking, AI agents, code generation

11

📝 Deep Tech arXiv AI 2026-06-03

Uncertainty-Aware Clarification in LLM Agents with Information Gain

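Teaches LLM agents to ask clarifying questions: information gain is used to quantify uncertainty, improving task success rate and interaction efficiency.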

arXiv:2606.03135v1 Announce Type: new Abstract: Large Language Model (LLM) agents often operate under underspecified user instructions, where latent u…

LLM agent, uncertainty quantification, information gain, clarification mechanism, interaction optimization

12

📝 Deep Tech arXiv AI 2026-06-03

Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

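Uses reward uncertainty to drive an agent's self-directed exploration, so that reinforcement learning produces genuinely diverse behaviour.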

arXiv:2606.03962v1 Announce Type: cross Abstract: Classical reinforcement learning (RL) typically seeks a deterministic policy that maximizes the expe…

reinforcement learning, reward uncertainty, diverse behaviour, exploration strategy, artificial intelligence

13

📝 Deep Tech arXiv AI 2026-06-03

DMF: A Deterministic Memory Framework for Conversational AI Agents

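A deterministic memory framework that keeps conversational AI from "forgetting"; the 21-page paper details how DMF improves conversational consistency and controllability.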

arXiv:2606.03463v1 Announce Type: new Abstract: Conversational AI agents require memory systems that are both scalable and semantically coherent acros…

deterministic memory, conversational AI, memory framework, large models, hallucination mitigation

14

🤖 AI · Large Models arXiv Machine Learning 2026-06-02

Uncertainty-Calibrated Diffusion for Reliable 3D Molecular Graph Generation

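A new method that uses uncertainty calibration to make 3D molecular graph generation more reliable, with potential benefits for drug discovery and materials design.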

arXiv:2606.01595v1 Announce Type: new Abstract: Bayesian inference provides a principled framework for modeling epistemic uncertainty in neural networ…

uncertainty calibration, diffusion models, 3D molecular graph generation, drug discovery, trustworthy AI

15

📝 Deep Tech arXiv AI 2026-06-02

Does Compression Preserve Uncertainty? A Unified Benchmark for Quantized and Sparse LLMs via Conformal Prediction

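When compressing LLMs, accuracy is not the only metric: a new benchmark uses conformal prediction to evaluate how well uncertainty is preserved.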

arXiv:2606.01850v1 Announce Type: new Abstract: Model compression techniques such as quantization and pruning are widely used to reduce the deployment…

LLM compression, quantization, pruning, uncertainty quantification, conformal prediction

16

🤖 AI · Large Models arXiv AI 2026-06-02

CA-BED: Conversation-Aware Bayesian Experimental Design

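How can large language models proactively ask questions to reduce uncertainty in interactive settings? This paper proposes a conversation-aware Bayesian experimental design method.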

arXiv:2606.01182v1 Announce Type: cross Abstract: Large Language Models (LLMs) excel at static reasoning tasks, yet their performance often degrades i…

large language models, Bayesian experimental design, interactive reasoning, uncertainty reduction, question selection

17

📝 Deep Tech arXiv Machine Learning 2026-06-02

Principle-Evolvable Scientific Discovery via Uncertainty Minimization

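A new paradigm for LLM scientific agents: the hypothesis space evolves dynamically through uncertainty minimization, moving beyond static priors to improve discovery efficiency and novelty.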

arXiv:2602.06448v2 Announce Type: replace Abstract: Large Language Model (LLM)-based scientific agents have accelerated scientific discovery, yet they…

large language models, scientific discovery, uncertainty minimization, hypothesis space evolution, LLM agents

18

📝 Deep Tech arXiv Machine Learning 2026-06-02

The Role of Ambiguity in Error Prediction via Uncertainty Quantification

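Examines how ambiguity affects error prediction via uncertainty quantification, offering a new perspective for research on machine learning reliability.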

arXiv:2606.02093v1 Announce Type: cross Abstract: The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackl…

uncertainty quantification, error prediction, ambiguity, machine learning, model uncertainty

19

🤖 AI · Large Models arXiv NLP 2026-06-01

Why Don't You Know? Evaluating the Impact of Uncertainty Sources on Uncertainty Quantification in LLMs

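Traces the sources of "uncertainty" in large language models: a rigorous technical evaluation paper that helps explain why an LLM "doesn't know".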

arXiv:2604.10495v2 Announce Type: replace Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world applications, reliable unc…

LLM, uncertainty quantification, uncertainty sources, evaluation, paper, large language models

20

📝 Deep Tech arXiv NLP 2026-06-01

Counterfactual Graph for Multi-Agent LLM Calibration

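Uses counterfactual graphs to calibrate uncertainty in multi-agent large model systems, improving the reliability of group decisions.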

arXiv:2605.30653v1 Announce Type: new Abstract: Multi-agent LLM systems often treat agreement as evidence: when many agents in a panel give the same a…

counterfactual graph, multi-agent, LLM calibration, large models, uncertainty
