🏢 Seed-Foundation-Model Team, Bytedance

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
3794 words · 18 min read
To boost Large Language Model (LLM) performance, the researchers introduce Over-Tokenized Transformers, which decouple the input and output vocabularies to improve language modeling. Scaling the input vocabulary improves performance.
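The core architectural idea is that the input embedding table and the output projection need not share a vocabulary: the input side can use a much larger vocabulary than the output side used for next-token prediction. A minimal sketch of that decoupling, with hypothetical sizes and variable names chosen purely for illustration (not the paper's actual implementation):

```python
import numpy as np

# Hypothetical sizes -- illustrative only, not taken from the paper.
d_model = 16
input_vocab = 1000   # large input vocabulary (e.g. multi-gram tokens)
output_vocab = 100   # smaller output vocabulary for next-token prediction

rng = np.random.default_rng(0)
embed = rng.normal(size=(input_vocab, d_model))     # input embedding table
unembed = rng.normal(size=(d_model, output_vocab))  # output projection

token_ids = np.array([3, 42, 999])  # ids drawn from the input vocabulary
hidden = embed[token_ids]           # (3, d_model) -- stand-in for the transformer body
logits = hidden @ unembed           # (3, output_vocab) next-token logits

print(logits.shape)  # (3, 100)
```

Because the two tables are independent, the input vocabulary can be scaled up without enlarging the softmax over output tokens, which is where the decoupling pays off.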