Efficient LLM Moderation with Multi-Layer Latent Prototypes
Proposes a multi-layer latent prototype method that efficiently improves the accuracy and speed of LLM content moderation.
arXiv:2502.16174v4 Abstract: Although modern LLMs are aligned with human values during post-training, robust moderation remains…