Niu Ge's Picks · This Month

1

🤖 AI · Large Models arXiv AI 2026-06-10

SAFE: An LLM-as-Verifier Framework for Evidence-Grounded Multi-Hop Reasoning

The LLM acts as a verifier, using evidence chains to improve the accuracy and trustworthiness of multi-hop reasoning.

arXiv:2604.01993v2 Announce Type: replace-cross Abstract: Multi-hop QA benchmarks often reward Large Language Models (LLMs) for spurious correctness, …

multi-hop reasoning · LLM verifier · evidence-grounded · reasoning framework · arXiv paper

2

🤖 AI · Large Models arXiv AI 2026-06-10

Dep-LLM: Training-Free Depression Diagnosis via Evidence-Guided Structured Multi-factor with Reliable LLM Reasoning

A training-free LLM method for depression diagnosis; evidence-guided multi-factor reasoning substantially improves reliability.

arXiv:2606.10796v1 Announce Type: cross Abstract: Automatic Depression Detection (ADD) from clinical interviews is a pivotal task in computational men…

depression diagnosis · LLM · training-free · evidence-guided · multi-factor reasoning

3

🤖 AI · Large Models arXiv AI 2026-06-10

Learning Evidence Highlighting for Frozen LLMs

Can frozen large models learn to highlight key evidence? A new method improves LLM interpretability.

arXiv:2604.22565v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is bu…

LLM · evidence highlighting · frozen models · inference optimization · model adaptation

4

🤖 AI · Large Models arXiv Computer Vision 2026-06-09

Can You Trust What You See? Human and AI Detection of Synthetic Legal Evidence

The paper examines how reliably humans and AI identify synthetic legal evidence, and reveals the challenges and limits of AI detection.

arXiv:2606.07613v1 Announce Type: new Abstract: Visual evidence has long been treated as a reliable form of legal proof, but advances in artificial in…

synthetic evidence detection · AI safety · legal AI · human detection · trust

5

🤖 AI · Large Models arXiv Machine Learning 2026-06-09

Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks

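LLMs make anti-money-laundering screening more transparent: evidence retrieval and counterfactual checks improve explainability, addressing a key pain point in AI-based financial risk control.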

arXiv:2604.19755v2 Announce Type: replace-cross Abstract: Anti-money laundering (AML) transaction monitoring generates large volumes of alerts that mu…

LLM · AML · anti-money laundering · explainable AI · evidence retrieval

6

🔗 Links & Tools IT之家 2026-06-09

First code evidence surfaces in iOS 27: Apple has begun adapting the system for its first foldable iPhone

First look at foldable-screen code in iOS 27: software-level confirmation of Apple's foldable iPhone.

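IT之家, June 9: source @samhenrigold dug through the iOS 27 Beta 1 update code today (June 9) and found two strings, foldState and angleDegrees. On that basis, 9to5Mac and several other overseas outlets concluded that Apple's first foldable iPhone has entered the system-adaptation stage. IT之家…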

7

🤖 AI · Large Models arXiv NLP 2026-06-05

EDIT: Evidence-Diagnosed Intervention Training for Rule-Faithful LLM Grading

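An evidence-diagnosed intervention training method that makes LLMs follow the given rubric strictly when grading, improving reliability and interpretability.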

arXiv:2606.06350v1 Announce Type: new Abstract: Reliable rubric grading requires more than accurate score prediction. Each judgement must be grounded …

LLM grading · rule faithfulness · intervention training · evidence diagnosis · educational assessment

8

📝 Deep Tech arXiv AI 2026-06-05

Answer Presence Drives RAG Rewriting Gains

Challenging a common assumption: the gains from RAG rewriting come not from better evidence quality but mainly from answer presence.

arXiv:2606.05633v1 Announce Type: new Abstract: Retrieval-augmented QA pipelines often route retrieved passages through an LLM rewriter before …

RAG · rewriter · answer presence · evidence quality · causal inference

9

🤖 AI · Large Models arXiv AI 2026-06-05

When Evidence is Sparse: Weakly Supervised Early Failure Alerting in Dialogs and LLM-Agent Trajectories

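Accurate alerting under sparse evidence: a weakly supervised method offers a new approach to early failure detection in dialog and LLM-agent systems.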

arXiv:2606.05414v1 Announce Type: cross Abstract: Early failure alerting requires deciding, while a dialog or agent trajectory is still unfolding, whe…

weakly supervised learning · early failure alerting · dialog systems · LLM-agent · system reliability

10

🤖 AI · Large Models arXiv AI 2026-06-03

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

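Collaborative multi-agent auditing that uses evidence calibration to improve the precision of repository-level vulnerability detection; a new paradigm for open-source research.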

arXiv:2603.13384v2 Announce Type: replace-cross Abstract: Software vulnerabilities often depend on cross-file data flow, build options, framework conv…

vulnerability detection · multi-agent · code auditing · evidence calibration · repository-level

11

💰 Business & Tech IT之家 2026-06-03

After suing Apple over Grok's ranking, Musk is ordered to hand over Tesla- and SpaceX-related emails as evidence

Musk sued Apple over Grok's ranking, only to be ordered by the court to turn over internal Tesla and SpaceX emails as evidence: a dramatic reversal.

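IT之家, June 3: US District Judge Mark Pittman denied xAI's request, in its lawsuit against Apple and OpenAI, to exclude Elon Musk's Tesla- and SpaceX-related emails from discovery. Last month, the legal teams of Apple, OpenAI, X, and xAI attended a hearing presided over by US Magistrate Judge Hal R. Ray Jr., …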

12

🤖 AI · Large Models arXiv NLP 2026-06-02

A Registry-Bound LLM Pipeline for Evidence-Grounded Trait Extraction across Tropical Plants, Aquatic Species, and Exotic Pets

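A registry-bound LLM pipeline that provides traceable evidence for trait extraction across tropical plants, aquatic animals, and exotic pets; methodologically rigorous and applicable across species.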

arXiv:2606.00994v1 Announce Type: new Abstract: We describe a registry-bound large-language-model extraction pipeline producing evidence-grounded stru…

large language models · trait extraction · evidence-driven · pipeline architecture · biodiversity informatics

13

🔐 Security / Authentication IT之家 2026-05-30

Scotland's Children's Commissioner: no evidence that a social media ban would make children safer online

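IT之家, May 30: according to a BBC report today, Scotland's Children's Commissioner Nicola Killean said there is currently not enough evidence that banning under-16s from social media would make children safer online. Killean warned that a ban could instead push children toward less regulated, riskier corners of the internet, and that policy should focus not on restricting children but on holding social me…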

14

📝 Deep Tech arXiv AI 2026-05-28

DeepSciVerify: Verifying Scientific Claim--Citation Alignment via LLM-Driven Evidence Escalation

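LLM-driven evidence escalation automatically verifies that scientific claims match their citations, strengthening research integrity.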

arXiv:2605.27710v1 Announce Type: new Abstract: Misalignment between claims and their cited evidence is a common failure mode in reports generated by …

LLM · scientific verification · citation alignment · evidence escalation · research integrity

15

📝 Deep Tech arXiv AI 2026-05-28

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

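LLMs combined with evidence and graph constraints make automatic tagging of learning resources more transparent and more precise.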

arXiv:2605.28483v1 Announce Type: new Abstract: Linking learning resources to a structured competency framework is key to enabling competency-based se…

LLM · competency tagging · learning resources · evidence graph · explainable AI

16

📄 Docs & Manuals IT之家 2026-05-26

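New turn in the US government's antitrust case against Apple: Apple accuses the DOJ of "procedural foot-dragging" and asks the court to step in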

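Apple accuses the US Department of Justice of procedural delays as the discovery standoff in the antitrust case escalates; details of the two sides' sparring emerge.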

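IT之家, May 26: according to an AppleInsider report today, the antitrust lawsuit between the US Department of Justice (DOJ) and Apple has reached a discovery impasse. According to a joint filing, Apple accuses the US government and the 14 subpoenaed agencies of delaying the case. The DOJ filed its antitrust suit against Apple in June 2024, and the case moved forward after a year of appeals. But…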

17

📝 Deep Tech arXiv AI 2026-05-23

Ex-GraphRAG: Interpretable Evidence Routing for Graph-Augmented LLMs

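Ex-GraphRAG, a new graph-augmented LLM framework, uses interpretable evidence routing to address the entanglement of node contributions in GNN encoders.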

arXiv:2605.21994v1 Announce Type: cross Abstract: GraphRAG conditions language models on subgraphs retrieved from knowledge graphs, encoded via messag…

GraphRAG · interpretability · graph-augmented LLMs · evidence routing · GNN

18

📝 Deep Tech arXiv Computer Vision 2026-05-20

GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

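GCE-MIL proposes a new paradigm in which the evidence behind whole-slide image classification is both faithful and recoverable, addressing a pain point in interpretability.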

arXiv:2605.17456v1 Announce Type: new Abstract: Multiple instance learning (MIL) is the standard approach for whole-slide image (WSI) classification a…

GCE-MIL · multiple instance learning · whole-slide imaging · interpretability · pathology AI

19

🤖 AI · Large Models arXiv NLP 2026-05-20

Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

Introduces Med-V1, a small language model for zero-shot biomedical evidence attribution that balances model size with scalability.

arXiv:2603.05308v2 Announce Type: replace Abstract: Assessing whether an article supports an assertion is essential for hallucination detection and cl…

small language models · zero-shot · biomedical · evidence attribution · scalability

20

🤖 AI · Large Models arXiv NLP 2026-05-20

Time to REFLECT: Can We Trust LLM Judges for Evidence-based Research Agents?

Questioning the reliability of LLMs as evaluators of research, this study reveals key limitations.

arXiv:2605.19196v1 Announce Type: new Abstract: Deep research agents increasingly automate complex information-seeking tasks, producing evidence-groun…

LLM evaluation · trustworthiness research · research agents · evidence reasoning · arXiv paper
