AMEL: Accumulated Message Effects on LLM Judgments
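When large language models serve as automated evaluators, their judgments are swayed by the polarity of the opinions in the conversation history. A study covering 75,000 samples documents this "accumulated message effect."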
arXiv:2605.22714v1 (cross-listed). Abstract: Large language models are routinely used as automated evaluators: to review code, moderate content, …