Interpretability
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
·3290 words·16 mins·
AI Generated
🤗 Daily Papers
AI Theory
Interpretability
🏢 AIRI
Sparse autoencoders decode reasoning in LLMs, revealing key features that enhance performance when steered. First mechanistic account of reasoning in LLMs!
Mixture of Experts Made Intrinsically Interpretable
·3052 words·15 mins·
AI Generated
🤗 Daily Papers
AI Theory
Interpretability
🏢 University of Oxford
MoE-X: An intrinsically interpretable Mixture-of-Experts language model that uses sparse, wide networks to enhance transparency.
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
·1710 words·9 mins·
AI Generated
🤗 Daily Papers
AI Theory
Interpretability
🏢 AIRI
LLMs rely on punctuation and other seemingly trivial tokens for context memory, and these tokens contribute surprisingly to performance.