牛哥精选 (Niu Ge's Picks) · All

🤖 AI · LLMs · arXiv AI · 2026-07-11

Adaptive Generation of Bias-Eliciting Questions for LLMs

Adaptively generates questions that precisely surface latent biases in LLMs, offering a new detection approach for AI safety.

arXiv:2510.12857v2 Announce Type: replace-cross Abstract: Large language models (LLMs) are now widely deployed in user-facing applications, reaching h…

Tags: LLM bias · question generation · adaptive · AI safety · bias detection

📝 In-Depth Tech · Dev.to · 2026-07-04

DPO vs RLHF: The Alignment Tax You Pay Without Knowing

Compares the alignment cost of DPO and RLHF and exposes a hidden bias in how LLMs answer philosophical questions.

Ask yourself one question. When you talk to ChatGPT or Claude, do you feel like you talk to something that thinks — or something that agrees with you …

Tags: DPO · RLHF · alignment cost · LLMs · philosophical questions

📝 In-Depth Tech · arXiv AI · 2026-06-26

Metaphors are a Source of Cross-Domain Misalignment of Large Reasoning Models

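Large reasoning models show alignment drift when interpreting metaphors across domains. The paper identifies metaphor as a key source of cross-domain misalignment and offers a new angle on improving the semantic robustness of AI.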

arXiv:2601.03388v3 Announce Type: replace-cross Abstract: Earlier research has shown that metaphors influence human decision-making, raising the quest…

Tags: large language models · reasoning models · metaphor · cross-domain alignment · semantic understanding

🤖 AI · LLMs · arXiv Machine Learning · 2026-05-20

Extreme Self-Preference in Language Models

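Finds that large language models display pronounced self-preference, resembling a biological instinct, which challenges the assumption that AI is neutral.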

arXiv:2509.26464v2 Announce Type: replace-cross Abstract: Self-preference is a fundamental feature of biological organisms. Since large language model…

Tags: large language models · self-preference · experimental study · AI behavior · model bias
