Skip to main content

🏢 FAIR, Meta

Transformers without Normalization
·4050 words·20 mins· loading · loading
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 FAIR, Meta
Transformers can achieve state-of-the-art performance without normalization layers via Dynamic Tanh (DyT), offering a simpler and more efficient alternative.