牛哥精选 (Niu Ge's Picks) · Past Three Months


🤖 AI · Large Models | Dev.to | 2026-07-12

GPT-5.6 Sol Matches Claude Fable 5 on Code Arena — For 40% Less

GPT-5.6 ties Claude Fable 5 on a code generation benchmark at only 60% of the cost, so developers may want to favor the more economical option.

The benchmark numbers landed, and they are tight: GPT-5.6 Sol tied Claude Fable 5 on Code Arena — the standard coding agent evaluation — while costing…

gpt-5.6, claude fab, code arena, code generation, cost-effectiveness

🤖 AI · Large Models | arXiv AI | 2026-07-09

Cost-Effective Agent Harnesses for Abstract Reasoning and Generalization on ARC-AGI-1

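Low-cost agent harnesses achieve striking reasoning performance on the ARC-AGI benchmark while balancing efficiency and generalization.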

arXiv:2607.06764v1 Announce Type: new Abstract: Recent progress on ARC-AGI-1 from disclosed architectures has come broadly from two regimes: heavy tes…

arc-agi-1, abstract reasoning, generalization, cost-effectiveness, agents

📝 Deep Tech | arXiv NLP | 2026-06-12

Small LLMs for Biomedical Claim Verification: Cost-Effective Fine-Tuning, Structural Dataset Shortcuts, and Cross-Domain Generalization

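Small language models handle biomedical claim verification at low cost; the paper examines structural dataset shortcuts and reports new findings on cross-domain generalization.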

arXiv:2606.12854v1 Announce Type: new Abstract: Large Language Models such as GPT-4o and GPT-5 achieve strong zero-shot performance on biomedical clai…

small llms, biomedical verification, cost-effective fine-tuning, structural dataset shortcuts, cross-domain generalization

🤖 AI · Large Models | IT之家 | 2026-06-11

Microsoft CEO Nadella Reflects on AI Overuse: Not Every Problem Needs the Most Powerful Model

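Microsoft CEO Nadella says plainly that AI is not a master key: blindly chasing the most powerful model can waste resources, and Microsoft has begun to reflect on and rein in this practice internally.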

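IT之家, June 11: Satya Nadella has a message for the many Microsoft employees hooked on artificial intelligence: not every problem calls for the most powerful AI model. At a live recording of Hard Fork, a podcast from The New York Times, the Microsoft chief executive was asked how prevalent "tokenmaxxing" (piling on compute) has become inside the company. Host 凯…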

microsoft, nadella, reflection on overuse, not every problem needs the most powerful mod…

⚡ Productivity Tools | Hacker News AI | 2026-06-07

Ask HN: What do you currently use for AI coding (personal or professional)?

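A popular Hacker News thread: developers share their current favorite AI coding tools, from Copilot to open-source options, along with tips for switching tools to save money.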

Given how quickly things evolve, it's easy to get lost in the numerous offerings and hard to get the best deal. So, what do you use? Both clients/harn…

ai coding, github cop, opencode, openrouter, cost-effectiveness

📝 Deep Tech | arXiv NLP | 2026-05-28

Adaptive Cost-Efficient Evaluation for Reliable Patent Claim Generation

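Want to cut the cost of human review in patent generation? This paper proposes an adaptive, cost-efficient evaluation method that balances reliability and efficiency.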

arXiv:2604.04295v3 Announce Type: replace Abstract: Automated patent claim validation demands low error tolerance. However, existing approaches face a…

patent generation, adaptive evaluation, cost-effectiveness, reliability, nlp

📝 Deep Tech | arXiv Machine Learning | 2026-05-20

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

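Examines how LLM judges can augment human evaluation and quantifies how many human reviews are needed, balancing the cost and quality of AI system evaluation.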

arXiv:2605.16354v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automated evaluators of AI systems, including in…

llm evaluation, human evaluation, model evaluation, automated evaluation, cost-effectiveness

📝 Deep Tech | arXiv Computer Vision | 2026-05-20

Leveraging Unsupervised Learning for Cost-Effective Visual Anomaly Detection

Explores unsupervised learning for low-cost visual anomaly detection, with a method that balances efficiency and accuracy.

arXiv:2409.15980v2 Announce Type: replace Abstract: Traditional machine learning-based visual inspection systems require extensive data collection and…

unsupervised learning, visual anomaly detection, cost-effectiveness, computer vision, paper

🤖 AI · Large Models | arXiv AI | 2026-05-19

CompactQE: Interpretable Translation Quality Estimation via Small Open-Weight LLMs

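Small open-weight LLMs deliver interpretable translation quality estimation, balancing privacy and cost with performance comparable to large models.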

arXiv:2605.15763v1 Announce Type: cross Abstract: Current state-of-the-art Quality Estimation (QE) in machine translation relies on massive, proprieta…

compactqe, translation quality estimation, small open-source llm, interpretability, privacy protection
