🏢 International School for Advanced Studies
A distributional simplicity bias in the learning dynamics of transformers
AI Generated
Natural Language Processing
Large Language Models
Transformers learn language patterns in order of increasing complexity, starting with simpler interactions before mastering higher-order ones.