MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
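KDD 2026 releases MirrorBench, a new benchmark that redefines the standard for evaluating human-likeness in conversational agents and pushes human-computer interaction research to new heights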
arXiv:2601.08118v3 Abstract: Large language models (LLMs) are increasingly used as human simulators, both for evaluating …