Niu Ge's Picks · This Month

🔓 Open-source project · Hacker News · AI · 2026-05-20

LLM INQUISITOR: Evaluating how AI models handle long, realistic tasks

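The open-source project LLM INQUISITOR provides a framework for evaluating LLM behavior in real-world scenarios, focusing on long tasks and practical usefulness rather than benchmark scores.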

Article URL: https://github.com/AssimilatedHuman/LLM-Inquisitor
Comments URL: https://news.ycombinator.com/item?id=48207330
Points: 1
# Comments: 0

LLM evaluation · real-world tasks · behavior evaluation · open-source tools · AI models

2026-05-20 2026-05-19