🏢 Seed-Foundation-Model Team, Bytedance
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
·3794 words·18 mins·
🤗 Daily Papers
Natural Language Processing
Large Language Models
To boost Large Language Model (LLM) performance, the researchers introduce Over-Tokenized Transformers, which decouple the input and output vocabularies to improve language modeling. Scaling input vocabularies improv…