🏢 Institute for Language, Cognition and Computation, University of Edinburgh

Spectral Editing of Activations for Large Language Model Alignment
2511 words · 12 mins
Natural Language Processing · Large Language Models
Spectral Editing of Activations (SEA) improves large language model truthfulness and fairness by projecting input representations to maximize covariance with positive demonstrations while minimizing c…