ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents
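Moving beyond binary security evaluation: a capability-ladder benchmark that measures how far an LLM can progress in exploitation, from triggering a vulnerability to taking full control of the target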
arXiv:2605.14153v1 Announce Type: cross Abstract: Exploitation is not a binary event. It is a ladder of acquiring progressive capabilities, from execu…