When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
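Stop looking only at the "happy path"! The new benchmark ToolMaze specifically tests how well LLM agents dynamically replan and recover from anomalies when tools fail.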
arXiv:2606.05806v1 Announce Type: new Abstract: Existing benchmarks evaluate Tool-Integrated Reasoning (TIR) in LLMs on idealized "happy paths", lar…