Niu Ge's Picks · Past 6 months

1

📝 Deep Tech · arXiv AI · 2026-07-14

Valid $\ne$ Necessary: Diagnosing Latent Inefficiency in Chain-of-Thought

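Shows that a valid CoT reasoning step is not necessarily a necessary one, and diagnoses the redundant cost of overthinking in large models.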

arXiv:2607.11266v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of Large Langua…

chain-of-t · reasoning efficiency · redundant steps · compute cost · necessity analysis

2

📝 Deep Tech · arXiv NLP · 2026-07-13

Hierarchical Chain-of-Thought: Enhancing LLM Reasoning Performance and Efficiency

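A new hierarchical chain-of-thought method that substantially improves LLM reasoning performance and efficiency; frontier research worth following.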

arXiv:2604.00130v2 Announce Type: replace Abstract: Chain-of-Thought (CoT) prompting has significantly improved the reasoning capabilities of large la…

hierarchical chain-of-thought · llm reasoning · reasoning efficiency · chain-of-thought · arxiv paper

3

🤖 AI · Large Models · arXiv Machine Learning · 2026-07-09

FPTQuant: Function-Preserving Transforms for LLM Quantization

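FPTQuant, a new LLM quantization method, tackles the outlier problem with function-preserving transforms, improving efficiency and energy use without degrading performance.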

arXiv:2506.04985v2 Announce Type: replace Abstract: Large language models (LLMs) require substantial compute, and thus energy, at inference time. Whil…

large language models · quantization · function-preserving · outliers · inference efficiency

4

📝 Deep Tech · arXiv AI · 2026-06-23

The Language-Energy Divide: Measuring Energy Costs of Multilingual LLM Inference

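The first systematic measurement of how LLM inference energy use differs across languages, revealing how this language divide affects AI sustainability.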

arXiv:2606.21869v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed in multilingual settings, yet the energy cost…

energy consumption · multilingual · large models · inference efficiency · energy divide · llm

5

🤖 AI · Large Models · arXiv AI · 2026-06-16

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

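A new angle on LLM reasoning efficiency: a minimal test-time intervention is enough to improve performance significantly. Less is more.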

arXiv:2510.13940v4 Announce Type: replace-cross Abstract: Recent progress in large language models (LLMs) has focused on test-time scaling to improve …

llm reasoning · test-time intervention · uncertainty · reasoning efficiency · less is mo

6

📝 Deep Tech · arXiv AI · 2026-06-08

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

LLM self-consistency from just two reasoning samples? A CoT + PoT ensembling scheme that substantially improves reasoning efficiency.

arXiv:2604.17433v2 Announce Type: replace-cross Abstract: Self-consistency (SC) is a popular technique for improving the reasoning accuracy of large l…

llm reasoning · self-consistency · cot · pot · ensemble learning
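Self-consistency, as named in the abstract above, samples several reasoning paths and keeps the most common final answer. The following is a minimal sketch of that standard majority-vote baseline, assuming each path's final answer has already been extracted as a string. `self_consistency_vote` is a hypothetical helper, and this sketch is not the CoT-PoT ensembling method the paper proposes.

```python
from collections import Counter
from typing import Optional, Sequence


def self_consistency_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent final answer among sampled reasoning paths.

    Hypothetical helper: `answers` holds the final answer parsed from each
    sampled chain, or None when no answer could be extracted.
    """
    # Drop samples whose final answer could not be parsed.
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # Counter.most_common orders equal counts by first occurrence,
    # so ties go to the answer that appeared earliest.
    return Counter(valid).most_common(1)[0][0]


print(self_consistency_vote(["42", "41", "42", None]))  # prints 42
```

With only two samples, a plain majority vote can do no more than report whether the two answers agree. That limited signal is why methods that work from very few samples need something beyond simple voting.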

7

🤖 AI · Large Models · Hacker News LLM · 2026-06-06

Free LLM inference handbook: 100 engineers cloned it in week 1

A hands-on handbook for closing the 100x cost gap in LLM inference, cloned 100 times in its first week.

Article URL: https://github.com/harshuljain13/llm-inference-at-scale Comments URL: https://news.ycombinator.com/item?id=48424467 Points: 2 # Comments:…

llm inference · memory bandwidth · cost optimization · technical handbook · open-source resource

8

🤖 AI · Large Models · arXiv NLP · 2026-06-03

HybridThinker: Efficient Chain-of-Thought Reasoning via Compressed Memory and Transient Thought Steps

HybridThinker uses compressed memory and transient thought steps to make LLM chain-of-thought reasoning more efficient and less costly.

arXiv:2606.03768v1 Announce Type: new Abstract: Extended chain-of-thought (CoT) traces improve LLM reasoning but incur substantial computational and m…

hybridthin · cot reasoning models · compressed-memory optimization · reasoning efficiency

9

🤖 AI · Large Models · arXiv NLP · 2026-06-03

ARBOR: Online Process Rewards via a Reusable Rubric Buffer for Search Agents

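Proposes the ARBOR framework, which uses a reusable rubric buffer to give search agents online process rewards, significantly improving reasoning and search efficiency.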

arXiv:2606.03239v1 Announce Type: new Abstract: LLM-based search agents are trained predominantly with outcome-only reward, leaving the search process…

arbor · process rewards · search agents · online learning · reusable rubric buffer

10

📝 Deep Tech · arXiv AI · 2026-06-03

Adaptive Latent Agentic Reasoning

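Adaptive latent-space reasoning that frees LLM agents from long, inefficient chains of thought and greatly improves reasoning efficiency.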

arXiv:2606.02871v1 Announce Type: cross Abstract: Large reasoning models improve performance by generating extended chain-of-thought (CoT) reasoning, …

adaptive l · llm agents · chain-of-t · reasoning efficiency · adaptive

11

📝 Deep Tech · arXiv Machine Learning · 2026-06-02

SmartThinker: Progressive Chain-of-Thought Length Calibration for Efficient Large Language Model Reasoning

Proposes SmartThinker, which progressively calibrates chain-of-thought length to greatly improve LLM reasoning efficiency; accepted to ICML 2026.

arXiv:2603.08000v2 Announce Type: replace-cross Abstract: Large reasoning models (LRMs) like OpenAI o1 and DeepSeek-R1 achieve high accuracy on comple…

smartthink · chain-of-t · reasoning efficiency · large language models · length calibration

12

📝 Deep Tech · arXiv Machine Learning · 2026-06-02

LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents

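Proposes a KV cache sharing mechanism across multi-LoRA LLM agents that effectively reduces inference compute overhead, offering a new direction for serving optimization.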

arXiv:2602.01053v2 Announce Type: replace Abstract: Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents shar…

kv cache · lora · multi-agent · inference efficiency · llm

13

📝 Deep Tech · arXiv AI · 2026-06-02

DyLLM: Efficient Diffusion LLM Inference via Saliency-based Token Selection and Partial Attention

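Combines saliency-based token selection with partial attention to accelerate diffusion LLM inference, with significant efficiency gains; accepted to ICML 2026.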

arXiv:2603.08026v2 Announce Type: replace-cross Abstract: Masked diffusion language models enable parallel token decoding, providing a promising alter…

diffusion · inference efficiency · saliency-based selection · partial attention · icml 2026

14

🤖 AI · Large Models · arXiv NLP · 2026-06-02

Geometric Latent Reasoning Induces Shorter Generations in LLMs

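Uses geometric latent reasoning in place of traditional language-based reasoning chains, making LLM generations shorter and cheaper without sacrificing accuracy.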

arXiv:2606.02248v1 Announce Type: new Abstract: Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens…

llm · latent reasoning · geometric reasoning · generation efficiency · reasoning-chain compression

15

🤖 AI · Large Models · arXiv Machine Learning · 2026-06-02

Off-the-Shelf LLMs as Process Scorers: Training-Free Alternative to PRMs for Mathematical Reasoning

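Off-the-shelf LLMs can score mathematical reasoning processes without any extra training, performing on par with dedicated process reward models.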

arXiv:2606.01682v1 Announce Type: cross Abstract: Selecting the best response from multiple small-model samples using a stronger scorer is a simple in…

llm · process scoring · mathematical reasoning · prm · training-free

16

📝 Deep Tech · arXiv Machine Learning · 2026-06-02

DOT-MoE: Differentiable Optimal Transport for MoEfication

Uses differentiable optimal transport to optimize MoE architectures, improving LLM inference efficiency with more stable training.

arXiv:2606.01666v1 Announce Type: new Abstract: The scaling of Large Language Models (LLMs) has driven significant performance gains but created subst…

large language models · mixture of experts · optimal transport · moeficatio · inference efficiency

17

📝 Deep Tech · arXiv AI · 2026-05-28

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

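Establishes when and how compressed reasoning data works in LLM post-training, providing a theoretical basis for efficiency optimization.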

arXiv:2605.28008v1 Announce Type: new Abstract: Large language models (LLMs) can now solve complex problems through long chain-of-thought (CoT) reason…

compressed reasoning data · llm post-training · reasoning efficiency · data compression · model optimization

18

🤖 AI · Large Models · arXiv AI · 2026-05-27

Self-signals Driven Multi-LLM Debate for Efficient and Accurate Reasoning

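A new multi-LLM debate paradigm driven by self-signals that improves both efficiency and accuracy in reasoning, saving compute while delivering better results.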

arXiv:2510.06843v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have exhibited impressive capabilities across diverse applicati…

multi-llm debate · self-signals · reasoning efficiency · accurate reasoning · large-model collaboration

19

🤖 AI · Large Models · arXiv NLP · 2026-05-26

Selective Latent Thinking: Adaptive Compression of LLM Reasoning Chains

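Adaptively compresses LLM reasoning chains, using "selective latent thinking" to cut redundancy and improve efficiency while preserving accuracy.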

arXiv:2605.25745v1 Announce Type: new Abstract: Explicit chain-of-thought (CoT) reasoning substantially improves the reasoning ability of large langua…

selective latent thinking · llm reasoning chains · adaptive compression · reasoning efficiency

20

🤖 AI · Large Models · arXiv Machine Learning · 2026-05-26

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

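Teaching LLMs to "know when to quit": a principled framework that helps models decide dynamically when to abandon a line of reasoning, improving reliability and efficiency.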

arXiv:2604.18419v4 Announce Type: replace Abstract: LLMs utilizing chain-of-thought reasoning often waste substantial compute by producing long, incor…

llm reasoning · dynamic abstention · uncertainty · reliability · introspection mechanism
