PBT-Bench: Benchmarking AI Agents on Property-Based Testing
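The first AI benchmark focused on property-based testing. It evaluates whether agents can derive invariants from documentation and generate precise random search strategies.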
arXiv:2605.15229v1 (Announce Type: cross)
Abstract: Existing code benchmarks measure whether an agent can produce any test that reproduces a known bug, …
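To make "derive invariants from documentation and generate a random search strategy" concrete, here is a minimal hand-rolled sketch using only the Python standard library. It is not PBT-Bench's harness and not the API of any property-based testing library. The names int_lists, sort_contract, check, and dedup_sort are hypothetical, chosen for illustration. The strategy generates random inputs, the properties encode a sort function's documented contract (ordered output that is a permutation of the input), and check searches for a counterexample.

```python
import random
from collections import Counter
from typing import Callable, List, Optional

IntList = List[int]


def int_lists(rng: random.Random, max_len: int = 20) -> IntList:
    """Hypothetical strategy: random int lists. The narrow value range
    [-5, 5] makes duplicates (a common edge case) very likely."""
    length = rng.randint(0, max_len)
    return [rng.randint(-5, 5) for _ in range(length)]


def sort_contract(sort_fn: Callable[[IntList], IntList]) -> Callable[[IntList], bool]:
    """Invariants read off a sort function's documented contract:
    the output is non-decreasing and is a permutation of the input."""
    def prop(xs: IntList) -> bool:
        out = sort_fn(list(xs))  # pass a copy so the input is never mutated
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        same_elements = Counter(out) == Counter(xs)
        return ordered and same_elements
    return prop


def check(prop: Callable[[IntList], bool],
          strategy: Callable[[random.Random], IntList],
          trials: int = 500, seed: int = 0) -> Optional[IntList]:
    """Random search: return the first input that violates prop, or None."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    for _ in range(trials):
        xs = strategy(rng)
        if not prop(xs):
            return xs
    return None


def dedup_sort(xs: IntList) -> IntList:
    """Deliberately buggy 'sort' that silently drops duplicates."""
    return sorted(set(xs))


if __name__ == "__main__":
    print("sorted:", check(sort_contract(sorted), int_lists))
    print("dedup_sort:", check(sort_contract(dedup_sort), int_lists))
```

The built-in sorted satisfies both invariants, so no counterexample is found. dedup_sort breaks the permutation invariant on any input with duplicates, which the strategy produces often. Libraries such as Hypothesis additionally shrink a failing input to a minimal counterexample; this sketch omits shrinking.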