How reliable are LLMs when it comes to playing dice?
Uses dice rolling to test the "probabilistic intuition" of LLMs, exposing a reliability weakness of large models on simple random tasks.
arXiv:2606.07515v1 Announce Type: cross Abstract: We investigate the probabilistic reasoning capabilities of large language models through a controlle…
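The excerpt is truncated and does not state how the paper measures reliability. One standard way to check whether a model's simulated die rolls look fair is Pearson's chi-square goodness-of-fit test against a uniform distribution. The sketch below is illustrative only and is not presented as the authors' method. The names `chi_square_uniform` and `looks_fair`, and the sample roll lists, are hypothetical. It also assumes every roll is an integer in 1..6.

```python
from collections import Counter

# Critical value of the chi-square distribution with 5 degrees of freedom
# (six die faces -> 6 - 1 = 5) at significance level 0.05.
CHI2_CRIT_DF5_ALPHA05 = 11.0705


def chi_square_uniform(rolls, faces=6):
    """Pearson chi-square statistic of observed rolls against a fair die.

    Assumes every roll is an integer in 1..faces (hypothetical helper).
    """
    counts = Counter(rolls)
    expected = len(rolls) / faces
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, faces + 1)
    )


def looks_fair(rolls):
    """True if uniformity is not rejected at the 5% level (six-sided die only)."""
    return chi_square_uniform(rolls) <= CHI2_CRIT_DF5_ALPHA05


# Hypothetical model outputs: one set over-produces 3s and 4s,
# the other hits each face exactly equally often.
biased = [3] * 40 + [4] * 30 + [1] * 10 + [2] * 8 + [5] * 7 + [6] * 5
balanced = [1, 2, 3, 4, 5, 6] * 20

print(chi_square_uniform(balanced))  # 0.0
print(looks_fair(biased))            # False
```

A perfectly balanced sequence passes this test but is hardly random, so in practice a frequency test like this would be paired with checks on ordering, for example runs or serial correlation.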