🏢 Mila & Université De Montréal
Forgetting Transformer: Softmax Attention with a Forget Gate
·4225 words·20 mins·
🤗 Daily Papers
Natural Language Processing
Large Language Models
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), which incorporates a forget gate into softmax attention for improved sequence modeling.
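For a concrete picture of the idea, here is a minimal sketch (not the authors' implementation) of how a forget gate could be folded into softmax attention: a scalar sigmoid gate per timestep whose cumulative log values bias the attention logits, so older keys are progressively down-weighted. The function name `forgetting_attention`, the single-head layout, and the random inputs are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of forget-gated softmax attention (single head), assuming a
# scalar sigmoid forget gate per timestep that biases the attention logits.
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, forget_logits):
    """q, k, v: (T, d) tensors for one head; forget_logits: (T,) pre-sigmoid gate values."""
    T, d = q.shape
    log_f = F.logsigmoid(forget_logits)        # log of forget-gate values in (0, 1)
    cum = torch.cumsum(log_f, dim=0)           # prefix sums of log forget gates
    # bias[i, j] = sum of log f_l for j < l <= i: how strongly key j has been
    # "forgotten" by the time query i attends to it.
    bias = cum.unsqueeze(1) - cum.unsqueeze(0)
    scores = (q @ k.T) / d**0.5 + bias
    # Causal mask: query i may only attend to keys j <= i.
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example with random inputs.
T, d = 8, 16
q, k, v = (torch.randn(T, d) for _ in range(3))
forget_logits = torch.randn(T)   # in a real model this would come from a learned projection of the input
out = forgetting_attention(q, k, v, forget_logits)
print(out.shape)  # torch.Size([8, 16])
```

Because the gate only adds a bias to the logits, the computation stays in the standard softmax-attention form while giving the model a data-dependent way to decay old context.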