Niu Ge's Picks · This Month

1

📝 Deep Tech arXiv Machine Learning 2026-05-20

Beyond Explained Variance: A Cautionary Tale of PCA

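Explained variance is not a catch-all metric for PCA. The paper uses concrete examples to flag its pitfalls, which makes it worth a read for anyone doing data analysis.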

arXiv:2605.13520v2 Announce Type: replace-cross Abstract: We address shortcomings of principal component analysis (PCA) for visualizing high-dimension…

PCA · principal component analysis · explained variance · data analysis · statistical learning

2

📝 Deep Tech arXiv NLP 2026-05-20

PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures

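Proposes the PQR framework, which automatically generates diverse, realistic user queries to pinpoint where QA agents fail. It fills a blind spot in adversarial testing.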

arXiv:2605.16551v1 Announce Type: new Abstract: Evaluating LLM-based agents remains challenging because identifying meaningful failure cases often req…

PQR framework · user query generation · QA agent testing · diversity · realism

3

📝 Deep Tech arXiv NLP 2026-05-20

SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs

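A stateful method for evaluating multi-turn dialogue, built on incremental semantic knowledge graphs. It aims to make dialogue-system evaluation more coherent and in-depth.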

arXiv:2605.16650v1 Announce Type: new Abstract: Evaluating multi-turn dialogue systems remains challenging because response quality depends not only o…

SKG-Eval · multi-turn dialogue evaluation · semantic knowledge graphs · dialogue systems · stateful evaluation

4

🤖 AI · LLMs arXiv NLP 2026-05-20

Language Acquisition Device in Large Language Models

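Explores how the idea of a language acquisition device can guide pretraining on synthetic languages to improve LLM data efficiency. It offers a fresh direction for AI research.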

arXiv:2605.16758v1 Announce Type: new Abstract: Large Language Models (LLMs) remain substantially less data-efficient than humans. Pre-pretraining (PP…

large language models · language acquisition · data efficiency · synthetic-language pretraining

5

🤖 AI · LLMs arXiv NLP 2026-05-20

Exploring Lightweight Large Language Models for Court View Generation

Lightweight LLMs show promise in legal AI. This paper systematically studies how models under 2B parameters perform on court view generation.

arXiv:2605.16770v1 Announce Type: new Abstract: Criminal Court View Generation (CVG) is a critical task in Legal Artificial Intelligence (Legal AI), i…

lightweight LLMs · legal AI · court view generation · case facts · model capability

6

📝 Deep Tech arXiv NLP 2026-05-20

E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring

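Proposes expert-guided post-merge quantization, which anchors on the merged weights to balance model compression against performance in low-resource deployment.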

arXiv:2605.16882v1 Announce Type: new Abstract: Low-resource deployment constraints have made model quantization essential for deploying neural networ…

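model quantization · neural network compression · post-merge quantization · merged-weight anchoring · low-resource deployment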

7

📝 Deep Tech arXiv NLP 2026-05-20

Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps

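Tackles the long-context inference bottleneck in LLMs by converting full attention into sparse attention within a hundred training steps. The goal is to balance efficiency and accuracy.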

arXiv:2605.16928v1 Announce Type: new Abstract: Long-context inference in large language models is bottlenecked by the quadratic cost of full attentio…

LLMs · long context · sparse attention · training efficiency · inference optimization

8

📝 Deep Tech arXiv NLP 2026-05-20

Effort as Ceiling, Not Dial: Reasoning Budget Does Not Modulate Cognitive Cost Alignment Between Humans and Large Reasoning Models

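Shows why the reasoning budget cannot tune how closely the cognitive cost of large reasoning models aligns with that of humans. Effort acts as a ceiling, not a dial.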

arXiv:2605.16938v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) generate chain-of-thought traces whose length tracks human reaction time…

reasoning budget · cognitive alignment · large reasoning models · AI effort · model behavior

9

📝 Deep Tech arXiv NLP 2026-05-20

Roll Out and Roll Back: Diffusion LLMs are Their Own Efficiency Teachers

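Diffusion LLMs need no external teacher. A "roll out and roll back" strategy lets them improve their own inference efficiency, which opens a new route to model acceleration.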

arXiv:2605.16941v1 Announce Type: new Abstract: Diffusion Large Language Models (DLLMs) promise fast parallel generation, yet open-source DLLMs still …

diffusion models · large language models · efficiency optimization · self-distillation · inference acceleration

10

📝 Deep Tech arXiv NLP 2026-05-20

HalluScore: Large Language Model Hallucination Question Answering Benchmark

Tracking LLM hallucination? HalluScore fills the gap in Arabic benchmarks by specifically testing for hallucination in LLM question answering.

arXiv:2605.17007v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable progress in natural language generation, but rem…

LLM · hallucination detection · Arabic · benchmarking · large language models

11

🤖 AI · LLMs arXiv NLP 2026-05-20

Multilingual and Multimodal LLMs in the Wild: Building for Low-Resource Languages

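Discusses the practical challenges of building multimodal LLMs for low-resource languages, along with approaches for addressing them.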

arXiv:2605.17152v1 Announce Type: new Abstract: Multimodal LLMs are evolving from vision-language to tri-modality that see, hear, and read, yet pipeli…

low-resource languages · multimodal LLMs · large models · multilingual · AI research

12

📝 Deep Tech arXiv NLP 2026-05-20

BELIEF: Structured Evidence Modeling and Uncertainty-Aware Fusion for Biomedical Question Answering

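A new approach to biomedical question answering combines structured evidence modeling with uncertainty-aware fusion to improve answer accuracy and reliability.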

arXiv:2605.17435v1 Announce Type: new Abstract: Biomedical question answering often requires decisions from retrieved literature whose relevance, qual…

biomedical QA · evidence modeling · uncertainty-aware fusion · AI paper

13

📝 Deep Tech arXiv NLP 2026-05-20

VerifyMAS: Hypothesis Verification for Failure Attribution in LLM Multi-Agent Systems

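Proposes a hypothesis-verification method that traces failures in LLM multi-agent systems back to their causes, with the aim of improving system reliability.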

arXiv:2605.17467v1 Announce Type: new Abstract: Large language model-driven multi-agent systems (LLM-MAS) excel at complex tasks, yet unreliable agent…

LLM multi-agent systems · failure attribution · hypothesis verification · system reliability

14

🔓 Open Source arXiv NLP 2026-05-20

Temporal Decay of Co-Citation Predictability: A 20-Year Statute Retrieval Benchmark from 396M Ukrainian Court Citations

Drawing on 396M Ukrainian court citations, the paper shows how co-citation predictability decays over a 20-year span.

arXiv:2605.17639v1 Announce Type: new Abstract: Co-citation structure is widely assumed to provide stable retrieval signal in legal information system…

legal citations · co-citation analysis · temporal decay · benchmarking · Ukrainian courts

15

📝 Deep Tech arXiv NLP 2026-05-20

Systematic Evaluation of the Quality of Synthetic Clinical Notes Rephrased by LLMs at Million-Note Scale

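A systematic, million-note-scale evaluation of clinical notes rephrased by LLMs. It exposes shortcomings of LLM-generated text across multiple quality dimensions.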

arXiv:2605.17775v1 Announce Type: new Abstract: Large language models (LLMs) can generate or synthesize clinical text for a wide range of applications…

LLMs · clinical notes · large-scale evaluation · text rephrasing · quality assessment

16

📝 Deep Tech arXiv NLP 2026-05-20

Prompt Compression in Diffusion Large Language Models: Evaluating LLMLingua-2 on LLaDA

Evaluates LLMLingua-2 prompt compression on LLaDA, a diffusion LLM, as a possible path to more efficient inference.

arXiv:2605.17932v1 Announce Type: new Abstract: Prompt compression reduces inference cost and context length in large language models, but prior evalu…

prompt compression · diffusion · LLMLingua-2 · LLaDA · model evaluation

17

📝 Deep Tech arXiv NLP 2026-05-20

AutoVecCoder: Teaching LLMs to Generate Explicitly Vectorized Code

Teaches LLMs to generate explicitly vectorized (SIMD) code, with the goal of substantially improving program performance.

arXiv:2605.17978v1 Announce Type: new Abstract: Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-perf…

LLM · code generation · vectorization · automatic programming · performance optimization

18

🤖 AI · LLMs arXiv NLP 2026-05-20

PROTEA: Offline Evaluation and Iterative Refinement for Multi-Agent LLM Workflows

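A new framework for offline evaluation and iterative refinement of multi-agent LLM workflows, aimed at tuning complex collaborative setups. It is to appear at ACL 2026.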

arXiv:2605.18032v1 Announce Type: new Abstract: Multi-agent LLM workflows -- systems composed of multiple role-specific LLM calls -- often outperform …

multi-agent LLMs · offline evaluation · iterative refinement · workflows · ACL 2026

19

🤖 AI · LLMs arXiv NLP 2026-05-20

PPAI: Enabling Personalized LLM Agent Interoperability for Collaborative Edge Intelligence

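Deploys personalized LLM agents on edge devices and lets them collaborate peer to peer (P2P), overcoming the limits of what each device can do locally.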

arXiv:2605.18067v1 Announce Type: new Abstract: Deploying large language model (LLM) on edge device enables personalized LLM agents for various users.…

personalized LLMs · edge intelligence · collaborative interoperability · P2P collaboration

20

📝 Deep Tech arXiv NLP 2026-05-20

iPOE: Interpretable Prompt Optimization via Explanations

A new explanation-based approach to prompt optimization that makes LLM prompts more transparent and interpretable.

arXiv:2605.18113v1 Announce Type: new Abstract: Prompt optimization has often been framed as a discrete search problem to find high-performing and rob…

iPOE · prompt optimization · interpretability · large language models
