Equiv, check that an AI refactor did not change what your code does
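Equiv is an open-source tool that provides deterministic, byte-level verification that an AI refactor did not change code behavior, instead of relying on a model's subjective judgment.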
Article URL: https://github.com/Neelagiri65/equiv Comments URL: https://news.ycombinator.com/item?id=48515830 Points: 1 # Comments: 0
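To make "deterministic byte-level verification" concrete, here is a minimal sketch of one way such a check could work. It is not Equiv's implementation, which this listing does not describe. The helper names (`output_digest`, `same_behavior`) and the choice of JSON serialization plus SHA-256 are assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Callable, Iterable


def output_digest(func: Callable[[Any], Any], inputs: Iterable[Any]) -> str:
    """Hash the serialized outputs of func over a fixed sequence of inputs.

    Hypothetical helper: the serialization format (JSON) is an assumption.
    """
    hasher = hashlib.sha256()
    for value in inputs:
        try:
            result = {"ok": func(value)}
        except Exception as exc:
            # A raised exception counts as observable behavior too.
            result = {"error": type(exc).__name__}
        hasher.update(json.dumps(result, sort_keys=True).encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()


def same_behavior(before, after, inputs) -> bool:
    """True if both versions produce byte-identical serialized outputs."""
    inputs = list(inputs)  # reuse the exact same inputs for both runs
    return output_digest(before, inputs) == output_digest(after, inputs)


def original(n):
    # Sum of squares of 0..n-1, written as a loop.
    return sum(i * i for i in range(n))


def refactored(n):
    # Closed form for the same sum.
    return (n - 1) * n * (2 * n - 1) // 6


def broken(n):
    # Off-by-one "refactor": sums squares of 1..n instead.
    return n * (n + 1) * (2 * n + 1) // 6


if __name__ == "__main__":
    print(same_behavior(original, refactored, range(10)))
    print(same_behavior(original, broken, range(10)))
```

The comparison involves no model judgment: the two versions either produce identical bytes on the chosen inputs or they do not. Its coverage is only as good as the input set.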
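Manage AI workers the way you manage a Slack team: each AI strictly follows a deterministic "investigate, plan, ticket, branch, PR, review" workflow, which rules out ad hoc actions.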
If you're a solo founder running 5-6 claude code terminals and manually orchestrating work between them, this is for you. Comments URL: https://news.y…
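A new method that combines local and global entropy to improve the accuracy of uncertainty quantification for large language models; worth a look.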
arXiv:2606.09875v1 Announce Type: cross Abstract: Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for …
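New ICML 2026 research uses uncertainty-aware subspace correction to make multimodal LLM decoding more trustworthy and to mitigate manifold deviation.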
arXiv:2606.09859v1 Announce Type: cross Abstract: MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attrib…
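An ICML 2026 workshop paper on getting LLMs to produce reproducible, deterministic output for running plans instead of random output, improving safety and reliability.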
arXiv:2606.09027v1 Announce Type: new Abstract: Large Language Models enable flexible natural-language planning but remain unreliable in determinism-c…
OxyJen v0.5: a deterministic graph runtime for AI workflows that emphasizes reliable execution rather than competing with LangChain4j.
Article URL: https://github.com/11divyansh/OxyJen Comments URL: https://news.ycombinator.com/item?id=48456722 Points: 1 # Comments: 0
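A nonparametric method for evaluating LLM performance that removes the need for parametric assumptions and provides reliable uncertainty quantification.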
arXiv:2601.21816v2 Announce Type: replace Abstract: Evaluating the performance of large language models (LLMs) from human preference data is crucial f…
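Code is more than text, so how should the uncertainty of generated code be assessed? This paper proposes a new method that gives more reliable confidence estimates for code generation tasks.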
arXiv:2606.09577v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed as code generators, where silently wrong prog…
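Uses LLMs to guide reinforcement learning policies in sparse-reward environments, with uncertainty estimation to improve decision reliability; code is available for reproduction.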
arXiv:2606.06673v1 Announce Type: new Abstract: Sparse rewards and heterogeneous task sequences remain persistent challenges in Reinforcement Learning…
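When should a coding agent ask, and when should it guess? This paper proposes an uncertainty-aware proactive clarification strategy to improve the reliability of code generation.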
arXiv:2603.26233v2 Announce Type: replace Abstract: As Large Language Model (LLM) agents are increasingly deployed in open-ended domains like software…
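Teaching LLM agents to ask clarifying questions proactively: uncertainty is quantified with information gain to improve task success rate and interaction efficiency.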
arXiv:2606.03135v1 Announce Type: new Abstract: Large Language Model (LLM) agents often operate under underspecified user instructions, where latent u…
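As a rough illustration of scoring a clarifying question by information gain, the sketch below computes the expected reduction in entropy over a set of candidate user intents. The paper's actual formulation is not visible in the truncated abstract, so the setup is an assumption: a discrete prior over intents, questions whose answer is fully determined by the intent, and the hypothetical function name `expected_information_gain`.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(
    prior: Dict[Hashable, float],
    answer_of: Dict[Hashable, Hashable],
) -> float:
    """Expected entropy reduction from asking one question.

    prior: probability of each candidate intent (assumed to sum to 1).
    answer_of: the answer the user would give under each intent
               (assumed deterministic, which is a simplification).
    """
    groups = defaultdict(list)
    for intent, p in prior.items():
        groups[answer_of[intent]].append(p)

    expected_posterior = 0.0
    for ps in groups.values():
        mass = sum(ps)  # probability of observing this answer
        if mass > 0:
            expected_posterior += mass * entropy(p / mass for p in ps)
    return entropy(prior.values()) - expected_posterior


if __name__ == "__main__":
    prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
    even_split = {"a": "yes", "b": "yes", "c": "no", "d": "no"}
    uneven_split = {"a": "yes", "b": "no", "c": "no", "d": "no"}
    useless = {"a": "yes", "b": "yes", "c": "yes", "d": "yes"}
    for name, q in [("even", even_split), ("uneven", uneven_split), ("useless", useless)]:
        print(name, round(expected_information_gain(prior, q), 4))
```

Under these assumptions an agent would ask the question with the highest score, and guess rather than ask when every candidate question scores close to zero.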
Uses reward uncertainty to guide agent self-exploration, so that reinforcement learning yields genuinely diverse emergent behaviors.
arXiv:2606.03962v1 Announce Type: cross Abstract: Classical reinforcement learning (RL) typically seeks a deterministic policy that maximizes the expe…
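A deterministic memory framework keeps conversational AI from "forgetting": a 21-page paper details how DMF improves dialogue consistency and controllability.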
arXiv:2606.03463v1 Announce Type: new Abstract: Conversational AI agents require memory systems that are both scalable and semantically coherent acros…
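A new method improves the reliability of 3D molecular graph generation through uncertainty calibration, with potential benefits for drug discovery and materials design.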
arXiv:2606.01595v1 Announce Type: new Abstract: Bayesian inference provides a principled framework for modeling epistemic uncertainty in neural networ…
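When compressing LLMs, accuracy is not the only metric: a new benchmark uses conformal methods to evaluate how well uncertainty is preserved.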
arXiv:2606.01850v1 Announce Type: new Abstract: Model compression techniques such as quantization and pruning are widely used to reduce the deployment…
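How can LLMs proactively ask questions to reduce uncertainty in interactive settings? This paper proposes a dialogue-aware Bayesian experimental design method.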
arXiv:2606.01182v1 Announce Type: cross Abstract: Large Language Models (LLMs) excel at static reasoning tasks, yet their performance often degrades i…
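A new paradigm for LLM scientific agents: dynamically evolving the hypothesis space through uncertainty minimization, moving beyond static priors to improve discovery efficiency and novelty.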
arXiv:2602.06448v2 Announce Type: replace Abstract: Large Language Model (LLM)-based scientific agents have accelerated scientific discovery, yet they…
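Examines the key role that ambiguity plays when uncertainty quantification is used for error prediction, offering a new perspective for machine learning reliability research.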
arXiv:2606.02093v1 Announce Type: cross Abstract: The task of Error Prediction, namely predicting whether a model output is correct, is commonly tackl…
Where does LLM "uncertainty" come from? A rigorous technical evaluation paper that helps explain why LLMs "don't know".
arXiv:2604.10495v2 Announce Type: replace Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world applications, reliable unc…
Calibrates the uncertainty of multi-agent LLM systems with counterfactual graphs, improving the reliability of group decisions.
arXiv:2605.30653v1 Announce Type: new Abstract: Multi-agent LLM systems often treat agreement as evidence: when many agents in a panel give the same a…