LPDS: Evaluating LLM Robustness Through Logic-Preserving Difficulty Scaling
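LPDS is a new method that tests LLM robustness precisely by changing the entities in a problem while preserving its logic, so that models are not tripped up merely because surface details change.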
arXiv:2605.15393v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed to perform tasks with minimal human oversigh…