Niu Ge's Picks (牛哥精选) · Last Three Months

1

🤖 AI & LLMs · Hacker News LLM · 2026-07-14

Show HN: Low-latency local LLM runner via OpenJDK Panama FFM (Java 22)

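A low-latency local LLM runner for Java 22 built on OpenJDK Panama FFM, featuring unified hardware orchestration, a zero-allocation ABI, and a thread-safe C API that enables zero-copy compilation.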

I wanted to run AI from inside the JVM. I started out with the standard REST sidecar, ripped that out to use Project Panama (Foreign Function & Memory…

low-latency local LLM, OpenJDK, Java 22, Panama FFM

2

📝 Deep Tech · Hacker News LLM · 2026-07-14

Show HN: Libargus: Low-latency local LLM runner via OpenJDK Panama FFM (Java 22)

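A low-latency approach to local LLM inference on the JVM that uses OpenJDK Panama FFM to avoid the serialization and GC overhead of traditional approaches.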

Most existing approaches for running local LLM inference within the JVM ecosystem rely on spawning out-of-process daemons via REST sidecars (introduci…

OpenJDK, Java 22, LLM, low-latency local inference
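
Neither excerpt shows Libargus's own API. As a hedged sketch of the underlying mechanism only, the Java 22 code below uses the standard FFM API (`java.lang.foreign`) to call the C library's `strlen` in-process, with no REST sidecar and no serialization step. `strlen` merely stands in for a native inference entry point, the class name `FfmStrlen` is hypothetical, and mapping `size_t` to `JAVA_LONG` assumes a 64-bit platform.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical example class (not Libargus code): a Java 22 FFM downcall into libc's strlen.
public class FfmStrlen {
    private static final Linker LINKER = Linker.nativeLinker();

    // Bind size_t strlen(const char*) once; JAVA_LONG for size_t assumes a 64-bit platform.
    private static final MethodHandle STRLEN = LINKER.downcallHandle(
            LINKER.defaultLookup().find("strlen").orElseThrow(),
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

    public static long strlen(String s) {
        // The confined arena frees the off-heap copy when the try block exits.
        try (Arena arena = Arena.ofConfined()) {
            // Copy the Java string off-heap as a NUL-terminated UTF-8 C string.
            MemorySegment cString = arena.allocateFrom(s);
            // invokeExact needs the exact (MemorySegment)long call type, hence the cast.
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("strlen downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("Hello, Panama")); // prints 13
    }
}
```

On Java 22, `Linker.downcallHandle` is a restricted method, so the JVM prints a warning unless it is started with `--enable-native-access=ALL-UNNAMED`. Looking up and linking the symbol once in a `static final` field, rather than on every call, keeps that setup cost off the hot path.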

3

📝 Deep Tech · arXiv AI · 2026-07-14

OS-Pruner: Pruning Chains-of-Thought of Reasoning Models via Optimal Stopping

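Targets the redundant "overthinking" steps in LLM chain-of-thought reasoning, using an optimal-stopping policy to prune them dynamically and cut cost without sacrificing accuracy.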

arXiv:2607.11089v1 Announce Type: new Abstract: Large Language Models (LLMs) have achieved remarkable success in complex reasoning tasks through Chain…

large language models, chain-of-thought, reasoning models, optimal stopping, pruning

4

🚀 Product Watch · IT Home (IT之家) · 2026-07-14

Microsoft warns: AI is speeding up vulnerability discovery, so Windows security updates should not be delayed more than three days

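Microsoft warns that AI is accelerating vulnerability discovery and advises against delaying security updates by more than three days. Essential reading for Windows users.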

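IT Home, July 14: On the eve of its regular "Patch Tuesday" release, Microsoft issued an unprecedented and stern warning to Windows users and IT administrators: it advises against delaying the deployment of Windows security updates by more than three days. The core reason for this policy change points squarely at artificial intelligence: AI is shrinking the window between a vulnerability's public disclosure and its exploitation by attackers, from the previous…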

Microsoft, warning, faster vulnerability discovery, security update delays

5

🎨 Design Tools · IT Home (IT之家) · 2026-07-14

iQOO TWS 5e earbuds open for reservations: 50-hour battery life, 42 ms end-to-end latency

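IT Home, July 14: iQOO announced at 10:00 today that reservations are open for its new wireless Bluetooth earbuds, the iQOO TWS 5e; the official website shows sales starting at 10:00 on July 24. The model comes in two colors, yellow (锋芒黄) and white (电光白), supports smart active noise cancellation and Monster Sound gaming audio, and offers 50 hours of battery life, …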

earbuds, reservations open, battery life, end-to-end latency

6

📝 Deep Tech · Hacker News AI · 2026-07-13

AI Model Co-Design: Hardware-Friendly LLM Design

NVIDIA's official deep dive into hardware-aware LLM design and strategies for navigating the throughput-latency Pareto frontier.

Article URL: https://developer.nvidia.com/blog/ai-model-co-design-hardware-friendly-llm-design/ Comments URL: https://news.ycombinator.com/item?id=488…

LLM, model co-design, hardware-aware, Transformer, throughput
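
The summary mentions a throughput-latency Pareto frontier. As a hedged illustration of that concept only, not of NVIDIA's method, the sketch below keeps just the serving configurations that no other configuration beats on both throughput and latency. The `Config` record, the configuration names, and all the numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: filter serving configurations down to the throughput-latency Pareto frontier.
public class ParetoFrontier {
    // Higher tokensPerSec is better; lower latencyMs is better.
    public record Config(String name, double tokensPerSec, double latencyMs) {}

    // a dominates b if it is no worse on both axes and strictly better on at least one.
    static boolean dominates(Config a, Config b) {
        boolean noWorse = a.tokensPerSec() >= b.tokensPerSec() && a.latencyMs() <= b.latencyMs();
        boolean better = a.tokensPerSec() > b.tokensPerSec() || a.latencyMs() < b.latencyMs();
        return noWorse && better;
    }

    // O(n^2) scan: keep every config that no other config dominates, in input order.
    public static List<Config> frontier(List<Config> configs) {
        List<Config> front = new ArrayList<>();
        for (Config c : configs) {
            boolean dominated = false;
            for (Config other : configs) {
                if (dominates(other, c)) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) front.add(c);
        }
        return front;
    }

    public static void main(String[] args) {
        List<Config> configs = List.of(
                new Config("batch-1", 900, 20),
                new Config("batch-8", 4000, 45),
                new Config("batch-8-slow", 3500, 60), // dominated by batch-8
                new Config("batch-32", 9000, 140));
        // prints batch-1, batch-8, batch-32 (one per line)
        frontier(configs).forEach(c -> System.out.println(c.name()));
    }
}
```

The quadratic scan is fine for a handful of candidates; for large sweeps, sorting by one axis first reduces the two-axis filter to a single pass.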

7

📝 Deep Tech · Dev.to · 2026-07-11

What Switching From C# to Rust Actually Taught Me

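Moving from C# to Rust: an in-depth comparison of control and compile-time safety that shows how each language's approach to memory management shapes code behavior.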

I've spent most of my career writing C#. ASP.NET Core, EF Core, the whole ecosystem — it's productive, well-documented, and I've never had a real comp…

C#, Rust, programming language comparison, memory management, compile-time safety

8

🚀 Product Watch · 36Kr (36氪) · 2026-07-10

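Hengrui Medicine responds to another delay in approval of its "双艾" ("Double Ai") combination: the issues do not concern product safety or efficacy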

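Hengrui Medicine announced that it has received a complete response letter from the U.S. Food and Drug Administration regarding its Biologics License Application for camrelizumab for injection combined with apatinib mesylate tablets as a first-line treatment for patients with unresectable or metastatic hepatocellular carcinoma. This is the third time the FDA has delayed approval of the "Double Ai" combination. In response, Hengrui said: "Steadily advancing internationalization is the company's long-term development strategy, and the overseas R&D of several of the company's innovative drugs is proceeding as planned…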

Hengrui Medicine, response, Double Ai combination, approval delayed again, product safety not affected

9

🤖 AI & LLMs · IT Home (IT之家) · 2026-07-08

OpenAI updates its GPT Realtime AI models to the 2.1 series, cutting p95 latency by at least 25%

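OpenAI's realtime AI models cut p95 latency by at least 25% and come with updated pricing, with a focus on low-latency voice and multimodal interaction.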

IT Home, July 8: OpenAI announced yesterday (July 7) that it has added two models, gpt-realtime-2.1 and gpt-realtime-2.1-mini, to its API, and says p95 latency is down by at least 25%. IT Home note: p95 latency (Percentile 95 D…

OpenAI, model series update, latency reduced by at least 25%
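
The IT Home note defining p95 latency is cut off above. For reference, p95 latency is the value at or below which 95% of requests complete, so a reduction of at least 25% means the new p95 is at most 0.75 times the old one. The Java sketch below computes a percentile with the nearest-rank method, one of several common definitions (tools that interpolate between samples can report slightly different values); the class name and sample latencies are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile of a latency sample.
public class Percentile {
    public static double nearestRank(double[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        // 1-based rank ceil(p * n / 100); multiplying before dividing keeps integer p exact.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // 20 hypothetical request latencies in ms: 10, 20, ..., 200
        double[] latenciesMs = new double[20];
        for (int i = 0; i < latenciesMs.length; i++) latenciesMs[i] = 10.0 * (i + 1);
        System.out.println(nearestRank(latenciesMs, 95)); // 190.0
        System.out.println(nearestRank(latenciesMs, 50)); // 100.0
    }
}
```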

10

🤖 AI & LLMs · Dev.to · 2026-07-08

Giving LLMs access to a bash terminal is terrifying

Letting an LLM operate a bash terminal directly: lightweight snapshots balance speed and safety, but the potential risks are alarming.

Recently, I was testing an autonomous AI agent in a local directory. I gave it a multi-step coding task and a terminal tool so it could run its own te…

LLM, bash terminal, security risks, lightweight snapshots, agent workflows

11

📝 Deep Tech · Dev.to · 2026-07-07

DeepSeek vs Qwen vs Kimi vs GLM: Which AI API Actually Wins in 2025?

Hands-on tests of the DeepSeek, Qwen, Kimi, and GLM AI APIs in production, with p99 latency monitoring and a practical load-balancing comparison.

DeepSeek vs Qwen vs Kimi vs GLM: Which AI API Actually Wins in 2025? I've spent the last decade designing systems that need to stay up no matter what.…

DeepSeek, Qwen, Kimi, GLM, AI API

12

🤖 AI & LLMs · Hacker News LLM · 2026-07-07

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

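A fix for the long-tail latency problem in LLM inference: the paper proposes UniBoost, a GPU scheduling method that accounts for the distribution of output lengths to significantly reduce high-percentile response latency.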

Article URL: https://yl3469.github.io/uniboost-icml26/ Comments URL: https://news.ycombinator.com/item?id=48808378 Points: 2 # Comments: 0

LLM inference, long-tail latency, scheduling optimization, UniBoost, output-length variance

13

📝 Deep Tech · Hacker News Ask · 2026-07-07

The Hard Parts of Streaming Audio in Voice Agents

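The latency challenges of streaming audio in voice agents, from underlying principles to an open-source solution, with both technical depth and hands-on practice.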

Blog: https://gokuljs.com/blogs/when-latency-becomes-audible code: https://github.com/gokuljs/GoSFU Comments URL: https://news.ycombinator.com/item?id…

streaming audio, voice agents, latency optimization, real-time communication, open-source project

14

📝 Deep Tech · ByteByteGo · 2026-07-02

How OpenAI Delivers Low-Latency Voice AI for 900M Users

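How OpenAI uses WebRTC to deliver a millisecond-scale voice AI experience to 900 million users, with a detailed look at its technology choices.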

In this article, we will look in detail at the entire journey and the challenges the OpenAI engineering team faced.

OpenAI, low-latency voice AI, WebRTC, large-scale deployment

15

🤖 AI & LLMs · IT Home (IT之家) · 2026-07-02

Buick 至境 (Zhijing) E7 launches a "delivery care gift": amid tight chip supply, buyers receive 100 yuan for each day of delayed delivery

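IT Home, July 2: On July 1, SAIC-GM Buick introduced a "delivery care gift" for the 至境 E7, promising to deliver vehicles within 3 days of the order being locked and to pay a subsidy of 100 yuan for each day of delay. The announcement said that fluctuations in the global semiconductor supply chain have recently caused tight chip supply, so some customers' deliveries may be delayed. From now until July 31, orders placed…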

Buick 至境, delivery care gift, tight chip supply, delivery delays

16

⚡ Productivity Tools · Dev.to · 2026-07-01

From 30-Second Polling to Real Push Notifications

From 30-second polling to real-time push: the latency pain points of notifications and the key steps in upgrading to FCM v1.

This article was originally published on Jo4 Blog. Our notification bell was lying to users. Not maliciously. It just... lagged. A publisher would su…

real-time push, FCM v1, polling optimization, notification latency, mobile development
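
To put the 30-second polling figure in perspective, assume the client polls at fixed times 0, T, 2T, … and that events occur at uniformly random times t. The delay D before an event shows up is then uniform on [0, T), which gives the expected and worst-case delays below. Network and server processing time are ignored in this simplified model.

```latex
\begin{aligned}
D &= \left\lceil t/T \right\rceil T - t, \qquad
\mathbb{E}[D] = \frac{1}{T}\int_0^T (T - t)\,dt = \frac{T}{2} \\
T = 30\ \text{s} &\;\Rightarrow\; \mathbb{E}[D] = 15\ \text{s}, \quad \sup D = 30\ \text{s}
\end{aligned}
```

Push delivery removes this polling term entirely; what remains is the latency of the delivery path itself.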

17

🔓 Open Source · Hacker News AI · 2026-06-30

Show HN: A Firewall for AI agents with auditing

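An AI agent firewall built in Rust that responds within 5 ms, uses plan enforcement to prevent hallucination and latency problems, and supports auditing.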

Hi all, as there are more and more agents on the internet, security is going to be a big problem. Currently, the problem is solved using an LLM to guar…

AI security, firewall, Rust, low latency, auditing

18

📧 Communications · IT Home (IT之家) · 2026-06-30

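Microsoft expands the rollout of its Low Latency Profile feature, speeding up the Start menu and other interface elements on more Windows 11 PCs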

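IT Home, June 30: Tech outlet Windows Latest reported in a blog post yesterday (June 29) that, as officially confirmed, Microsoft is rolling out the Low Latency Profile feature to more PCs through the June optional update KB5095093, making the Start menu and other interface elements open faster. IT Home note: Low Latency Profile is…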

Microsoft, expanded rollout, Low Latency Profile, Windows 11, faster Start menu

19

📝 Deep Tech · Dev.to · 2026-06-29

What Actually Happens When You Call an LLM API

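Starting from speed-of-light delay, this article explains the physical network limits hidden behind every LLM API call and reframes how you think about AI service response times.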

you've felt it. you type a prompt, hit send, and the response starts streaming in under a second. smooth. instant. you feel like you're thinking out l…

LLM API, network latency, speed-of-light limit, physical distance, server processing
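
A back-of-the-envelope version of the speed-of-light argument: light in optical fiber travels at roughly c/n, with a refractive index n of about 1.47, which works out to about 200 km per millisecond. Distance alone therefore sets a floor on round-trip time before any server processing happens. The 5,000 km distance below is a hypothetical example, and real fiber routes are longer than the straight-line distance, so actual round trips are slower.

```latex
\begin{aligned}
v &\approx \frac{c}{n} \approx \frac{3.0\times10^{5}\ \text{km/s}}{1.47} \approx 2.0\times10^{5}\ \text{km/s} = 200\ \text{km/ms} \\
\mathrm{RTT}_{\min} &= \frac{2d}{v}, \qquad d = 5000\ \text{km} \;\Rightarrow\; \mathrm{RTT}_{\min} \approx \frac{2 \times 5000\ \text{km}}{200\ \text{km/ms}} = 50\ \text{ms}
\end{aligned}
```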

20

📝 Deep Tech · arXiv AI · 2026-06-29

Ranking Before Serving: Low-Latency LLM Serving via Pairwise Learning-to-Rank

Uses pairwise learning-to-rank to optimize the order of requests before inference, substantially reducing LLM serving latency.

arXiv:2510.03243v3 Announce Type: replace-cross Abstract: Efficient scheduling of large language model (LLM) inference tasks is critical for achieving…

LLM, low-latency inference optimization, learning-to-rank, request ordering
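
The paper's pairwise ranking model is not described in this excerpt. As a hedged illustration of why ordering requests before serving can reduce latency, the Java sketch below compares mean waiting time on a single worker under first-come-first-served order and under shortest-predicted-job-first order. The `Job` record and its service times are hypothetical, and a real system would derive the order from a learned ranker's predictions rather than from known lengths.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: how request order changes mean waiting time on one worker.
public class RankBeforeServe {
    // A queued request with a hypothetical predicted service time in ms.
    public record Job(String id, long predictedMs) {}

    // Mean time each job waits before starting when served one at a time in list order.
    public static double meanWaitMs(List<Job> order) {
        long clock = 0;
        long totalWait = 0;
        for (Job job : order) {
            totalWait += clock;         // waited until the worker became free
            clock += job.predictedMs(); // then occupied the worker for its service time
        }
        return (double) totalWait / order.size();
    }

    // Shortest-predicted-job-first: sort a copy of the queue by predicted service time.
    public static List<Job> shortestFirst(List<Job> queue) {
        List<Job> sorted = new ArrayList<>(queue);
        sorted.sort(Comparator.comparingLong(Job::predictedMs));
        return sorted;
    }

    public static void main(String[] args) {
        List<Job> arrivalOrder = List.of(
                new Job("a", 800), new Job("b", 100), new Job("c", 200), new Job("d", 100));
        System.out.println(meanWaitMs(arrivalOrder));                // 700.0
        System.out.println(meanWaitMs(shortestFirst(arrivalOrder))); // 175.0
    }
}
```

Shortest-first ordering minimizes mean waiting time on a single worker when lengths are known, but it can starve long requests under sustained load; a common remedy is aging, which gradually raises the priority of requests that have waited a long time.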

🐂 Niu Ge's Picks (牛哥精选)