🏢 CNRS@CREATE LTD
Approximation Rate of the Transformer Architecture for Sequence Modeling
1599 words · 8 mins
Machine Learning
Deep Learning
This paper characterizes the Transformer’s approximation power for sequence modeling, deriving explicit Jackson-type approximation rates that reveal its strengths and limitations in handling different kinds of sequential relationships.