🏢 School of Information Sciences, University of Illinois at Urbana-Champaign

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

26 September 2024·2559 words·13 mins

Natural Language Processing · Large Language Models

A new benchmark and jailbreak method expose vulnerabilities in LLM moderation guardrails, achieving significantly higher success rates than existing methods.