🏢 Simplex

Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins·
Natural Language Processing · Large Language Models
Transformers encode more information than next-token prediction alone requires: they linearly represent the geometry of belief states in their residual stream, even when that geometry forms a complex fractal structure.