Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning
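Decouples tool invocation from tool execution and proposes an implicit hierarchical GRPO framework, significantly improving the efficiency and generalization of tool integration in mathematical reasoning.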
arXiv:2605.18500v1 Announce Type: new Abstract: Large language models (LLMs) have increasingly leveraged tool invocation to enhance their reasoning ca…