🏢 Gray Swan AI

Improving Alignment and Robustness with Circuit Breakers
2515 words · 12 mins
AI Theory Safety
AI systems are made safer by ‘circuit breakers’ that directly control harmful internal representations, significantly improving alignment and robustness against adversarial attacks with minimal impact…
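As a rough illustration of what "controlling harmful internal representations" can mean in practice, below is a minimal NumPy sketch of a representation-rerouting style objective. On harmful inputs it penalizes any positive cosine similarity between the circuit-broken model's hidden states and a frozen original model's. On benign inputs it keeps the two close in L2 distance. The function names, the coefficients `c_s` and `c_r`, and the exact choice of cosine similarity for rerouting and L2 distance for retention are illustrative assumptions, not the authors' implementation. Hidden states are taken as given `(n, d)` arrays rather than extracted from a real model.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    # Row-wise cosine similarity between two (n, d) arrays of hidden states.
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def rerouting_loss(h_cb_harmful, h_orig_harmful):
    # Assumed rerouting term: push the circuit-broken representations on
    # harmful inputs away from the original ones. Only positive similarity
    # is penalized, so orthogonal or opposite directions cost nothing.
    sim = cosine_similarity(h_cb_harmful, h_orig_harmful)
    return float(np.mean(np.maximum(sim, 0.0)))


def retain_loss(h_cb_benign, h_orig_benign):
    # Assumed retain term: keep representations on benign inputs close to
    # the original model's, measured as mean L2 distance per example.
    return float(np.mean(np.linalg.norm(h_cb_benign - h_orig_benign, axis=-1)))


def circuit_breaker_loss(h_cb_harm, h_orig_harm, h_cb_ben, h_orig_ben,
                         c_s=1.0, c_r=1.0):
    # Hypothetical weighted sum of the two terms; c_s and c_r are illustrative.
    return (c_s * rerouting_loss(h_cb_harm, h_orig_harm)
            + c_r * retain_loss(h_cb_ben, h_orig_ben))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))
    # Unchanged harmful representations are maximally penalized (close to 1).
    print(rerouting_loss(h, h))
    # Rerouted to the opposite direction: no rerouting penalty.
    print(rerouting_loss(-h, h))
    # Benign representations left untouched: no retain penalty.
    print(circuit_breaker_loss(-h, h, h, h))
```

The clamp at zero reflects the idea that a harmful representation only needs to stop pointing toward its original direction, while the retain term is what limits the impact on benign behavior.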