HarDBench: A Benchmark for Draft-Based Co-Authoring Jailbreak Attacks for Safe Human-LLM Collaborative Writing
Does human-AI collaborative writing hide a jailbreak risk? A new benchmark reveals a new blind spot in LLM safety
arXiv:2604.19274v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used as co-authors in collaborative writing, where u…