2025-02-19
2025
You Do Not Fully Utilize Transformer's Representation Capacity
·4126 words·20 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 T-Tech, HSE University, Moscow Institute of Physics and Technology
Layer-Integrated Memory (LIMe) enhances Transformers' representation capacity by enabling each layer to access earlier layers' hidden states, significantly improving performance across various…
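The card above describes LIMe's core idea: letting a layer read hidden states from earlier layers rather than only the current residual stream. Below is a minimal PyTorch sketch of one way such a layer-memory mixer could look; the class name `LayerMemoryMixer`, the per-layer softmax router, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerMemoryMixer(nn.Module):
    """Hypothetical sketch: mixes hidden states from all earlier layers
    into one tensor via a learned softmax router, so the current layer's
    attention can read representations the residual stream may have
    overwritten. Not the published LIMe implementation."""

    def __init__(self, layer_idx: int):
        super().__init__()
        # One logit per earlier layer (plus the embedding output at index 0).
        self.router_logits = nn.Parameter(torch.zeros(layer_idx + 1))

    def forward(self, past_states: list[torch.Tensor]) -> torch.Tensor:
        # past_states: one (batch, seq, dim) tensor per earlier layer.
        weights = F.softmax(self.router_logits, dim=0)        # (L,)
        stacked = torch.stack(past_states, dim=0)             # (L, B, S, D)
        return torch.einsum("l,lbsd->bsd", weights, stacked)  # (B, S, D)
```

In use, a block at depth `i` would pass the mixed tensor as its key/value input while queries still come from the current residual stream, so attention heads can retrieve features that later layers may have discarded.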
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
·2116 words·10 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Beijing University of Posts and Telecommunications
MUDDFormer boosts Transformer performance by dynamically generating connection weights, improving cross-layer information flow and surpassing models trained with significantly more compute.
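As with LIMe above, a toy sketch may help make "dynamically generating connection weights" concrete: instead of fixed cross-layer weights, the weights are produced per token from the current hidden state, with a separate weight set for each input stream of the next block. Everything here, including the class name `DynamicDenseConnect`, the four streams, and the plain linear weight generator, is a hypothetical illustration, not MUDDFormer's published architecture.

```python
import torch
import torch.nn as nn

class DynamicDenseConnect(nn.Module):
    """Hypothetical sketch of a multiway dynamic dense connection:
    a small projection generates per-token weights over all earlier
    layers' outputs, with a separate weight set per input stream of
    the next block (e.g. Q, K, V, residual)."""

    def __init__(self, dim: int, n_layers_so_far: int, n_streams: int = 4):
        super().__init__()
        self.n_layers = n_layers_so_far
        self.n_streams = n_streams
        # Generates (streams x layers) mixing weights from the current state.
        self.to_weights = nn.Linear(dim, n_streams * n_layers_so_far)

    def forward(self, states: list[torch.Tensor]) -> list[torch.Tensor]:
        # states: embedding output plus all earlier block outputs, each (B, S, D).
        stacked = torch.stack(states, dim=0)                      # (L, B, S, D)
        current = states[-1]                                      # (B, S, D)
        w = self.to_weights(current)                              # (B, S, M*L)
        w = w.view(*w.shape[:2], self.n_streams, self.n_layers)   # (B, S, M, L)
        # One dynamically weighted mixture of past states per stream.
        mixed = torch.einsum("bsml,lbsd->mbsd", w, stacked)       # (M, B, S, D)
        return list(mixed.unbind(dim=0))
```

The contrast with LIMe's sketch is the point of the summary: the mixing weights here depend on the token's current hidden state rather than being static learned parameters, which is what "dynamic" dense connections refers to.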