🏢 CNRS@CREATE LTD

Approximation Rate of the Transformer Architecture for Sequence Modeling

26 September 2024·1599 words·8 mins

Machine Learning Deep Learning 🏢 CNRS@CREATE LTD

This paper characterizes the Transformer’s approximation power by deriving explicit Jackson-type approximation rates, which reveal its strengths and limitations in modeling different types of sequential relationships.