Niu Ge's Picks · All

1

🤖 AI & LLMs · arXiv AI · 2026-07-14

Extending LLM Context via Associative Recurrent Memory

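Proposes an associative recurrent memory method that lets LLM context windows break past length limits and process very long sequences efficiently.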

arXiv:2607.11614v1 Announce Type: cross Abstract: Extending the context length of large language models (LLMs) is critical for many real-world applica…

llm, context extension, associative memory, recurrent network, long sequences

2

📝 In-Depth Tech · Hacker News AI · 2026-07-13

AI Model Co-Design: Hardware-Friendly LLM Design

NVIDIA's official deep dive into hardware-aware LLM design, with Pareto-frontier strategies for balancing throughput and latency.

Article URL: https://developer.nvidia.com/blog/ai-model-co-design-hardware-friendly-llm-design/ Comments URL: https://news.ycombinator.com/item?id=488…

llm, model co-design, hardware-aware, transformer, throughput

3

🤖 AI & LLMs · ByteByteGo · 2026-07-07

ChatGPT vs Gemini vs Claude: How They Differ

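The core differences among the three major AI models, ChatGPT, Gemini, and Claude, in a single chart, from the self-attention mechanism to each model's architectural traits.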

In this article, we will look at the various architectural forks the teams building these models encountered and the decisions they took.

chatgpt, gemini, claude, transformer, self-attention

4

🤖 AI & LLMs · arXiv AI · 2026-06-29

Symmetry-Aware Transformer Training for Automated Planning

Transformers meet automated planning: symmetry-aware training makes AI planning more efficient and accurate.

arXiv:2508.07743v2 Announce Type: replace Abstract: While transformers excel in many settings, their application in the field of automated planning is…

symmetry-aware, transformer, automated planning, training methods, ai planning

5

⚡ Productivity Tools · QbitAI · 2026-06-26

NVIDIA's new open-source MoE release: one import line, 3.7x faster fine-tuning

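A single import line speeds up MoE fine-tuning by 3.7x: NVIDIA open-sources NeMo AutoModel, which cuts memory usage by more than 30% and makes it easy to improve training efficiency.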

Built on Transformers v5, adding expert parallelism, DeepEP, and TransformerEngine

nvidia, newly open-sourced, one line, fine-tuning speedup, moe

6

📝 In-Depth Tech · arXiv Machine Learning · 2026-06-24

An LLM-based Two-Stage Transformer Framework for Cross-Domain Bearing Fault Diagnosis with Limited Data

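An LLM-based two-stage Transformer framework that tackles the combined challenges of data heterogeneity, varying operating conditions, and label scarcity in industrial bearing fault diagnosis.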

arXiv:2606.24459v1 Announce Type: new Abstract: Bearing fault diagnosis faces critical challenges when dataset heterogeneity, operating condition vari…

llm, transformer, cross-domain fault diagnosis, bearings, limited data

7

🤖 AI & LLMs · Hacker News Show · 2026-06-23

Show HN: Transformer Primitives – A visual explainer you can send to anyone

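A visual, intuitive explanation of the core principles of Transformers. Readers without a technical background can easily grasp how GPT works, which makes it a good link to share with curious friends.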

I have had a few conversations in the past year with non-technical folks (traditional finance types, consultants) who asked for a simple explainer on …

transformer, gpt, visual explainer, non-technical audience, ai science communication

8

💰 Business & Tech · ITHome · 2026-06-20

Google Gemini co-lead Shazeer moves to OpenAI; Altman personally posts a welcome

A Transformer paper author and Gemini co-lead leaves Google for OpenAI and is personally welcomed by Altman: another heavyweight move in the AI talent war.

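ITHome, June 20: Noam Shazeer, former Google VP of Engineering and Gemini technical co-lead, has announced that he is leaving to join OpenAI. ITHome has learned that on the 18th local time, Shazeer announced on X that leaving Google was a difficult decision and that he is proud of the Google team and of what the team achieved together: "I'm happy to share that I will be joining OpenAI, and I also look forward to…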

google, co-lead, shazeer, move to openai, altman, personal welcome post

9

📄 Docs & Manuals · QbitAI · 2026-06-19

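The world's first general-purpose "cerebellum" for humanoid robots is here! The world's largest dataset of human motion, at 20,000 hours, enables zero-shot generalization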

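The world's first general-purpose cerebellum for humanoid robots, trained on 20,000 hours of human motion data, achieves zero-shot generalization and is being compared to a "GPT moment" for robotics.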

Humanoid robots officially enter the "GPT era"

world's first, humanoid robot, general-purpose cerebellum, world's largest scale, human motion

10

🤖 AI & LLMs · arXiv Machine Learning · 2026-06-16

Privacy from Symmetry: Orthogonally Equivariant Transformers for LLM Inference

A new paradigm for privacy protection based on symmetry: orthogonally equivariant Transformers make LLM inference more secure.

arXiv:2606.16461v1 Announce Type: new Abstract: Running large language models locally is often impractical, pushing inference on sensitive text to thi…

orthogonally equivariant transformers, privacy protection, llm inference, symmetry, machine learning theory

11

🤖 AI & LLMs · arXiv AI · 2026-06-12

Transformer-Guided Graph Attention for Direct Cardiac Mesh Reconstruction: A Structural Digital Twin Framework

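Uses a Transformer-guided graph attention network to reconstruct cardiac meshes directly, replacing cumbersome traditional workflows and making digital twins more efficient.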

arXiv:2606.13188v1 Announce Type: cross Abstract: Building patient-specific cardiac models sits at the heart of precision cardiology, yet getting thos…

cardiac mesh reconstruction, transformer, graph attention network, digital twin, ai in medicine

12

🤖 AI & LLMs · arXiv Machine Learning · 2026-06-11

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

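Uses dimensionality reduction to reveal the hidden geometric structure inside LLMs, giving an intuitive view of how language models represent knowledge.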

arXiv:2511.21594v3 Announce Type: replace Abstract: Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, …

llm, dimensionality reduction, latent space, visualization, interpretability

13

📝 In-Depth Tech · arXiv AI · 2026-06-11

Unifying Learning Dynamics and Generalization in Transformers Scaling Law

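Unpacks the learning dynamics and generalization mechanisms behind Transformer scaling laws: an 87-page paper presenting a unified theoretical framework.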

arXiv:2512.22088v3 Announce Type: replace-cross Abstract: The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improveme…

transformer, scaling laws, learning dynamics, generalization, deep learning theory

14

📝 In-Depth Tech · arXiv Machine Learning · 2026-06-10

Operator Fusion for LLM Inference on the Tensix Architecture

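Targets LLM inference bottlenecks on Tenstorrent's Tensix architecture and proposes an operator optimization strategy that fuses RMSNorm with matrix multiplication to improve data locality.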

arXiv:2606.09879v1 Announce Type: new Abstract: This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix arc…

llm inference, operator fusion, tensix architecture, data locality, transformer

15

📝 In-Depth Tech · arXiv AI · 2026-06-10

Dynamic Linear Attention

A look at a new attention mechanism, dynamic linear attention, which improves both efficiency and accuracy; accepted at ICML 2026.

arXiv:2606.10650v1 Announce Type: cross Abstract: The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the…

dynamic linear attention, transformer, attention mechanism, icml 2026, efficiency optimization

16

📝 In-Depth Tech · arXiv Computer Vision · 2026-06-09

AQIFormer: A Transformer-Based Multi-View Architecture for Cross-City Air Quality Classification

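AQIFormer, a Transformer-based multi-view architecture, delivers accurate cross-city air quality classification by combining computer vision with spatiotemporal modeling.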

arXiv:2606.07648v1 Announce Type: new Abstract: Air pollution represents one of the most critical environmental and public health challenges globally,…

aqiformer, transformer, multi-view architecture, air quality classification, cross-city

17

📝 In-Depth Tech · arXiv AI · 2026-06-05

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

Breaking out of the PreNorm versus PostNorm dilemma: SpanNorm improves training stability in deep Transformers while maintaining high performance.

arXiv:2601.22580v2 Announce Type: replace-cross Abstract: The success of Large Language Models (LLMs) hinges on the stable training of deep Transforme…

transformer, normalization, prenorm, postnorm, training stability

18

📝 In-Depth Tech · arXiv AI · 2026-06-05

Consistency Training Along the Transformer Stack

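A consistency training mechanism for Transformers that improves model performance and stability through constraints across stacked layers.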

arXiv:2606.05817v1 Announce Type: cross Abstract: Consistency training encourages models to behave similarly across different contexts, and has shown …

transformer, consistency training, deep neural networks, model optimization, emnlp2026

19

📝 In-Depth Tech · arXiv Computer Vision · 2026-06-03

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

A GPT-style Transformer pretrained on billion-scale motion data achieves zero-shot whole-body motion tracking.

arXiv:2606.03985v1 Announce Type: cross Abstract: We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale …

humanoid-gpt, zero-shot motion tracking, transformer, whole-body control, motion corpus

20

🤖 AI & LLMs · Hacker News AI · 2026-06-02

AI Engineering for Developers

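An AI engineering guide for developers that covers the evolution of LLMs and their engineering practice, from statistical language models to foundation models.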

Article URL: https://www.lucavall.in/blog/ai-engineering-for-developers Comments URL: https://news.ycombinator.com/item?id=48366525 Points: 1 # Commen…

ai engineering, large language models, foundation models, transformer, developers
