AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models
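Proposes a dynamic, self-evolving safety evaluation framework to address the problem that static benchmarks for large models cannot keep pace with evolving AI risks.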
arXiv:2509.26100v2 Announce Type: replace Abstract: The rapid integration of Large Language Models (LLMs) into high-stakes domains necessitates reliab…