GPO: Learning from Critical Steps to Improve LLM Reasoning
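By identifying the critical steps in a reasoning trajectory and applying reinforcement learning to them, the GPO method significantly improves LLM reasoning ability across multiple benchmarks.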
arXiv:2509.16456v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used in various domains, showing impressive potentia…