🏢 Simplex
Transformers Represent Belief State Geometry in their Residual Stream
Natural Language Processing
Large Language Models
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even when that geometry has a complex fractal structure.