🏢 Institute for Language, Cognition and Computation, University of Edinburgh
Spectral Editing of Activations for Large Language Model Alignment
2511 words · 12 mins
Natural Language Processing
Large Language Models
Spectral Editing of Activations (SEA) improves large language model truthfulness and fairness by projecting input representations to maximize covariance with positive demonstrations while minimizing c…