🏢 International School for Advanced Studies

A distributional simplicity bias in the learning dynamics of transformers

26 September 2024·2474 words·12 mins

AI Generated · Natural Language Processing · Large Language Models

Transformers learn language patterns in order of increasing complexity, capturing simpler interactions first and higher-order ones later.