PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures
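Proposes the PQR framework, which automatically generates diverse, realistic user queries to pinpoint the failure boundaries of QA agents and cover the blind spots of adversarial testing.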
arXiv:2605.16551v1 (Announce Type: new)
Abstract: Evaluating LLM-based agents remains challenging because identifying meaningful failure cases often req…