MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Domain Risks in LLMs
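Reveals implicit domain risks in large language models, proposes MENTOR, a metacognition-driven self-evolution framework, and builds a multi-domain annotated dataset.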
arXiv:2511.07107v3 Announce Type: replace Abstract: Ensuring the safety of Large Language Models (LLMs) is critical for real-world deployment. However…