Targeted Tests for LLM Reasoning: An Audit-Constrained Protocol
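Proposes an audit-constrained protocol that precisely tests how fragile LLM reasoning is to prompt variations, while avoiding misattribution of errors.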
arXiv:2605.11599v2 (Announce Type: replace)
Abstract: Fixed reasoning benchmarks evaluate canonical prompts, but semantically valid changes in presentat…