🏢 School of Information Sciences, University of Illinois at Urbana-Champaign

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
2559 words · 13 mins
Natural Language Processing · Large Language Models
A new benchmark and jailbreak method expose vulnerabilities in LLM moderation guardrails, with the jailbreak achieving significantly higher success rates than existing methods.