🏢 Mila & Université De Montréal

Forgetting Transformer: Softmax Attention with a Forget Gate
4225 words · 20 min read
Transformers get forgetful! This paper introduces the Forgetting Transformer (FoX), incorporating a forget gate into the attention mechanism for improved sequence modeling.
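To make the idea concrete, here is a minimal NumPy sketch of the general mechanism the teaser describes: a data-dependent, per-timestep forget gate whose cumulative log values bias the causal attention logits, so that older keys are progressively down-weighted. The function name, shapes, and gate parameterization are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np

def forgetting_attention(q, k, v, f):
    """Single-head causal attention with a scalar forget gate per timestep (sketch).

    q, k: (T, d) queries and keys; v: (T, d_v) values.
    f: (T,) forget-gate values in (0, 1), e.g. a sigmoid of a learned projection.
    The logit between query i and key j is shifted by the sum of log f over
    timesteps j+1..i, so distant past keys receive exponentially less weight.
    """
    T, d = q.shape
    log_f = np.log(f)                         # (T,)
    cum = np.cumsum(log_f)                    # cum[i] = sum_{l<=i} log f_l
    decay = cum[:, None] - cum[None, :]       # decay[i, j] = sum_{l=j+1..i} log f_l
    mask = np.tril(np.ones((T, T), dtype=bool))
    logits = (q @ k.T) / np.sqrt(d) + decay
    logits = np.where(mask, logits, -np.inf)  # causal mask: no attention to the future
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Hypothetical usage with random inputs
rng = np.random.default_rng(0)
T, d = 8, 16
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
f = 1.0 / (1.0 + np.exp(-rng.standard_normal(T)))  # gate values in (0, 1)
out = forgetting_attention(q, k, v, f)              # (T, d)
```

With all gates equal to 1 the decay term vanishes and this reduces to ordinary causal softmax attention, which is the sense in which the forget gate is an addition to, rather than a replacement of, standard attention.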