
The Curse of Depth in Large Language Models
2429 words · 12 mins read
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Medical Artificial Intelligence Laboratory, Westlake University
Deep layers in LLMs underperform because Pre-Layer Normalization lets output variance grow with depth; LayerNorm Scaling resolves this by controlling that variance, significantly improving training efficiency.
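To make the idea concrete, here is a minimal PyTorch sketch of a Pre-LN Transformer block with LayerNorm Scaling. It assumes (not stated in this summary) that the fix multiplies each LayerNorm output at layer `l` (1-indexed) by `1/sqrt(l)` so that output variance no longer grows with depth; the class and parameter names (`ScaledPreLNBlock`, `layer_index`) are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn


class ScaledPreLNBlock(nn.Module):
    """Sketch: Pre-LN Transformer block with LayerNorm Scaling.

    Assumption: the LayerNorm output of layer `l` is scaled by 1/sqrt(l)
    to keep output variance from growing with depth.
    """

    def __init__(self, d_model: int, n_heads: int, layer_index: int):
        super().__init__()
        # Hypothetical per-layer constant: 1/sqrt(l), l is the 1-indexed depth
        self.scale = 1.0 / (layer_index ** 0.5)
        self.ln_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_mlp = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN attention sub-block; the normalized input is scaled by 1/sqrt(l)
        h = self.ln_attn(x) * self.scale
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # Pre-LN feed-forward sub-block with the same scaling
        h = self.ln_mlp(x) * self.scale
        x = x + self.mlp(h)
        return x


# Usage sketch: stack blocks with depth-dependent scaling
blocks = nn.ModuleList(
    ScaledPreLNBlock(d_model=512, n_heads=8, layer_index=l + 1) for l in range(24)
)
x = torch.randn(2, 16, 512)  # (batch, sequence, d_model)
for block in blocks:
    x = block(x)
```

The only change relative to a standard Pre-LN block is the `* self.scale` factor after each LayerNorm; without it, the variance of the residual stream compounds layer by layer, which is the failure mode the post attributes to deep layers.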