🏢 Gray Swan AI
Improving Alignment and Robustness with Circuit Breakers
2515 words · 12 mins
AI Theory
Safety
‘Circuit breakers’ make AI systems safer by directly controlling harmful internal representations, significantly improving alignment and robustness against adversarial attacks with minimal impact…