🏢 School of Information Sciences, University of Illinois at Urbana-Champaign
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
·2559 words·13 mins·
Natural Language Processing
Large Language Models
A new benchmark and jailbreak method expose vulnerabilities in LLM moderation guardrails; the method achieves significantly higher success rates than existing jailbreak methods.