On the Sensitivity of Instruction-tuned LLMs to Harmful Sentences in Long Inputs
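Recent research finds that the sensitivity of instruction-tuned LLMs to harmful sentences in long-context inputs poses a significant risk.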
arXiv:2510.05864v2. Abstract: Large language models (LLMs) increasingly operate on long inputs, yet their behavior when harmful …