
🏢 Institute for Language, Cognition and Computation, University of Edinburgh

Spectral Editing of Activations for Large Language Model Alignment

26 September 2024 · 2511 words · 12 mins

Natural Language Processing · Large Language Models

Spectral Editing of Activations (SEA) improves large language model truthfulness and fairness by projecting input representations to maximize covariance with positive demonstrations while minimizing c…