SAFE: An LLM-as-Verifier Framework for Evidence-Grounded Multi-Hop Reasoning
LLMs as verifiers: using evidence chains to improve the accuracy and trustworthiness of multi-hop reasoning
arXiv:2604.01993v2 (Announce Type: replace-cross)
Abstract: Multi-hop QA benchmarks often reward Large Language Models (LLMs) for spurious correctness, …