Active Testing of Large Language Models via Approximate Neyman Allocation
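Proposes an active testing method that reduces the cost of evaluating large language models, using approximate Neyman allocation to sample test examples efficiently.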
arXiv:2605.10075v2 Announce Type: replace Abstract: Large language models (LLMs) require reliable evaluation from pre-training to test-time scaling, m…