PaperBench: Evaluating AI’s Ability to Replicate AI Research
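OpenAI has released PaperBench, a benchmark that evaluates AI's ability to replicate frontier AI research and tests agents on the full workflow from reading a paper to implementing its code.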
We introduce PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research.