牛哥精选 · 所有

1

🔓 开源项目 Hacker News LLM 2026-07-14

Oya – Keep tool outputs away from the LLM to cut tokens and stop injection

Oya项目革新工具调用方式，省10倍token、提速3.5倍，且天然防注入，两行代码即可迁移。

Article URL: https://github.com/OyaAIProd/oya Comments URL: https://news.ycombinator.com/item?id=48907336 Points: 1 # Comments: 0

oya 工具调用 token优化注入防护确定性

2

🤖 AI·大模型 Dev.to 2026-07-13

Muse Spark 1.1 + GPT-5.6 launches; Rust 1.97 ships

Meta与OpenAI、Rust同日上新：百万Token代理模型、GPT-5.6三档定价、Rust 1.97稳定版发布，技术圈大事件速览。

This week, AI Gateway became the de facto routing layer for serious agentic workloads—Meta and OpenAI both landed major model releases there, and the …

muse spark gpt-5.6 rust 1.97 代理模型结构化输出

3

🤖 AI·大模型 Hacker News LLM 2026-07-09

Probelock – lockfile for LLM tool calling

像package-lock锁定包版本一样，Probelock用lockfile锁定LLM工具调用的行为一致性，避免量化或版本升级后功能回归。

Article URL: https://github.com/kelkalot/probelock Comments URL: https://news.ycombinator.com/item?id=48842075 Points: 1 # Comments: 0

probelock llm工具调用 lockfile 回归测试行为一致性

4

🤖 AI·大模型 arXiv AI 2026-07-07

SPORK: Self-Speculative Forking to Accelerate Agentic LLM Inference

自推测分支技术让LLM agent在等待工具返回时预生成后续推理，大幅减少GPU空闲时间，提升推理效率。

arXiv:2607.03333v1 Announce Type: cross Abstract: LLM agents are becoming a common interface for research, coding, and question answering, yet their T…

llm agents 推理加速推测执行 gpu利用率工具调用

5

📝 深度技术 Dev.to 2026-07-07

Your PII redactor probably leaks tool-call arguments

研究发现多数PII编辑工具在处理工具调用参数时存在泄露风险，利用458项多语言数据集验证了跨场景漏洞。

Most "redact PII before the LLM" tools scan the chat message text and stop there. That was fine when an LLM call was one string in, one string out. It…

pii 数据泄露工具调用隐私安全多模态

6

🤖 AI·大模型 arXiv NLP 2026-07-07

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

一种通过推测性感知与规划加速多模态大模型推理的新方法，显著降低顺序调用开销

arXiv:2603.23483v2 Announce Type: replace-cross Abstract: Agentic multimodal large language models (MLLMs) (e.g., OpenAI o3 and Gemini Agentic Vision)…

多模态大模型代理式ai 推测性推理推理加速视觉工具调用

7

🤖 AI·大模型 arXiv AI 2026-07-07

ToolFailBench: Diagnosing Tool-Use Failures in LLM Agents

新基准ToolFailBench精准诊断LLM agent工具调用失败原因，揭示聚合分数下的隐藏缺陷。

arXiv:2607.04686v1 Announce Type: cross Abstract: Tool calling is central to modern language model agents, but aggregate benchmark scores often hide w…

toolfailbe llm agents 工具调用失败诊断基准测试

8

🔓 开源项目 Hacker News LLM 2026-07-03

PrivAiTe: Self-hosted proxy that redacts PII from LLM calls, incl. tool-calls

一个自托管的隐私过滤代理，能在LLM调用中自动去除姓名、邮箱、密码、API密钥等PII，还支持自定义正则和工具调用场景。

Article URL: https://github.com/crp4222/PrivAiTe Comments URL: https://news.ycombinator.com/item?id=48776021 Points: 2 # Comments: 0

pii脱敏 llm安全自托管代理隐私保护正则表达式

9

📝 深度技术 arXiv AI 2026-07-03

Safeguarding LLM Agents from Misalignment through Provenance Analysis

提出用溯源分析检测LLM代理行为失配，防止工具调用偏离用户意图，提升AI安全性。

arXiv:2607.01236v1 Announce Type: cross Abstract: As LLM agents gain increasing access to powerful tools, ensuring that their actions are aligned with…

llm安全对齐溯源分析 ai代理工具调用

10

🤖 AI·大模型 Hacker News LLM 2026-07-02

LLM Colosseum – A zero-dependency browser RTS to test LLM tool calling

零依赖的浏览器RTS游戏，用趣味对战测试LLM工具调用能力，玩法与调试两不误。

Article URL: https://github.com/asp67/llm-colosseum/tree/main Comments URL: https://news.ycombinator.com/item?id=48752981 Points: 1 # Comments: 0

llm 工具调用浏览器rts 零依赖测试框架

11

🚀 产品观察 Hacker News AI 2026-06-30

Show HN: Not Another AI Platform

15岁少年自建AI工作空间，集工具调用、网络搜索、高级推理于一身，更用闪卡、邮件草稿等自定义元素刷新AI交互体验。

I'm 15 and this is my AI workspace platform that I have building as a side-project. I really wanted an AI platform that had all the latest features li…

15岁开发者 ai工作空间工具调用网络搜索 ui/ux设计

12

🔓 开源项目 Hacker News AI 2026-06-28

Cerberus – a local firewall for AI agents' tool calls

为AI代理工具调用提供安全边界，本地防火墙开源项目Cerberus可防范恶意操控风险。

Article URL: https://github.com/Adirdabush1/cerberus Comments URL: https://news.ycombinator.com/item?id=48704458 Points: 3 # Comments: 0

ai安全防火墙 ai代理工具调用开源

13

📝 深度技术 arXiv NLP 2026-06-25

Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints

开源LLM在结构化输出约束下，工具调用能力被抑制的实证研究，揭示“约束税”现象。

arXiv:2606.25605v1 Announce Type: new Abstract: Tool Calling and Structured Output are two core capabilities of modern Agent systems, yet their intera…

llm 工具调用结构化输出约束实证研究

14

📝 深度技术 arXiv AI 2026-06-16

Looking Is Not Picking: An Attention-Segment Account of Tool-Selection Failures in LLM Agents

从注意力机制揭示LLM智能体选错工具的真实原因，颠覆“没看到正确工具”的直觉认知。

arXiv:2606.16364v1 Announce Type: new Abstract: LLM agents mis-call tools, and the natural guess is that the model failed to see the right tool in a c…

llm代理工具选择失败注意力机制 bfcl 模型分析

15

📝 深度技术 arXiv NLP 2026-06-12

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

新框架HyperTool突破传统逐步工具调用，为Agent集成工具提供全新范式，实证表现优异。

arXiv:2606.13663v1 Announce Type: new Abstract: Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observa…

hypertool 工具增强型agent 逐步工具调用大模型agent 学术论文

16

🤖 AI·大模型 arXiv AI 2026-06-12

GeoNatureAgent Benchmark: Benchmarking LLM Agents for Environmental Geospatial Analysis Across Frontier and Open-Weight Foundation Models

首个评估大模型代理在环境地理空间分析中工具调用能力的基准，直击数据整理痛点。

arXiv:2606.12821v1 Announce Type: new Abstract: Environmental scientists spend disproportionate effort on data wrangling rather than analysis, and AI …

环境地理空间分析 llm代理基准测试工具调用大语言模型

17

🤖 AI·大模型 Dev.to 2026-06-12

Your MCP server can't take a file as an argument — here's why, and the fix

MCP服务器传递大文件时卡住？本文揭示原因并给出只需50 token的优雅解决方案

I built an MCP server that publishes HTML files, and I hit a wall I haven't seen documented anywhere: you can't pass a large file as an MCP tool argum…

mcp 文件参数大模型工具调用 token优化技术方案

18

🤖 AI·大模型 Hacker News LLM 2026-06-11

Ask HN: The next evolutionary step in LLM usage?

社区热议LLM使用演化路径：从聊天到自主智能体，下一步会是更高效的量化模型吗？

I'll keep this post short and sweet, we have seen several steps in the evolution of LLM (large language model) usage. 1. Chat 2. Autocomplete 3. Embed…

llm演进 agent智能体量化模型技术预测 2026趋势

19

🤖 AI·大模型 arXiv NLP 2026-06-10

Pushing the Limits of LLM Tool Calling via Experiential Knowledge Integration and Activation

通过经验知识集成与激活，大幅提升大语言模型的工具调用能力，创新性显著。

arXiv:2606.10875v1 Announce Type: new Abstract: Large language models (LLMs) rely on tool use to act as autonomous agents, yet often fail in multi-ste…

llm 工具调用经验知识知识集成激活机制

20

🤖 AI·大模型 arXiv AI 2026-06-10

T1-Bench: Benchmarking Multi-Scenario Agents in Real-World Domains

全新多场景代理基准T1-Bench，全面评测LLM在真实领域中的复杂交互能力

arXiv:2606.11070v1 Announce Type: cross Abstract: Recent advances in reasoning and tool-calling capabilities of large language models (LLMs) have enab…

llm代理基准多场景评测现实领域工具调用任务复杂度

🐂 牛哥精选