The Curse of Depth in Large Language Models
2429 words · 12 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Medical Artificial Intelligence Laboratory, Westlake University
Deep layers in LLMs contribute little because Pre-Layer Normalization lets their output variance grow with depth; LayerNorm Scaling counters this by scaling each layer's normalized output by the inverse square root of its depth, substantially improving training efficiency.
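
The core idea is simple to express in code. Below is a minimal PyTorch sketch of depth-dependent LayerNorm Scaling: each Pre-LN normalization at depth l multiplies its output by 1/√l so that variance no longer compounds across layers. The module name `ScaledLayerNorm`, its constructor arguments, and the toy residual loop are illustrative assumptions, not the authors' reference implementation.

```python
import math
import torch
import torch.nn as nn


class ScaledLayerNorm(nn.Module):
    """LayerNorm whose output is scaled by 1/sqrt(layer_depth).

    A minimal sketch of LayerNorm Scaling for Pre-LN transformers;
    names and integration details are assumptions for illustration.
    """

    def __init__(self, hidden_size: int, layer_depth: int, eps: float = 1e-5):
        super().__init__()
        assert layer_depth >= 1, "layer depth is 1-indexed"
        self.norm = nn.LayerNorm(hidden_size, eps=eps)
        # Scale by 1/sqrt(l) to counteract the growth of Pre-LN output
        # variance as depth l increases.
        self.scale = 1.0 / math.sqrt(layer_depth)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x) * self.scale


if __name__ == "__main__":
    # Toy 12-layer Pre-LN-style residual stack with depth-scaled norms.
    hidden = 64
    norms = [ScaledLayerNorm(hidden, layer_depth=l + 1) for l in range(12)]
    x = torch.randn(2, 16, hidden)
    for norm in norms:
        x = x + norm(x)  # stand-in for a Pre-LN residual sublayer
    print(x.shape)  # torch.Size([2, 16, 64])
```

In a real transformer the scaled norm would replace the existing Pre-LN normalization in front of each attention and feed-forward sublayer, leaving the rest of the architecture unchanged.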