When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis
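Examines the evaluation dilemma for LLMs in public comment analysis: when models disagree, traditional accuracy metrics break down, and the evaluation framework needs to be rethought.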
arXiv:2605.29025v1 Announce Type: new Abstract: Federal agencies are deploying large language models (LLMs) to categorize public comment corpora, wher…