Toxic Subword Pruning for Dialogue Response Generation on Large Language Models
A New Defense for LLM Dialogue Generation: Pruning Toxic Subwords to Improve Model Safety
arXiv:2410.04155v2 Announce Type: replace Abstract: How to defend large language models (LLMs) from generating toxic content is an important research …