How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers
3480 words · 17 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Pre-trained language models’ base capabilities are significantly influenced by architecture, not just scale; a novel Combination Enhanced Architecture (CEA) improves performance by addressing FFN-Wider…
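For readers unfamiliar with the term, "FFN-Wider" refers to Transformer variants whose feed-forward inner dimension is enlarged well beyond the usual 4×d_model ratio, shifting a larger share of each layer's parameters into the FFN relative to attention. The PyTorch sketch below is only an illustration of that idea; the hidden size and widening factor are assumptions for demonstration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class FFN(nn.Module):
    """Standard Transformer feed-forward block: d_model -> d_ff -> d_model."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))


d_model = 768  # illustrative hidden size, not taken from the paper

# Vanilla BERT/GPT-style ratio: d_ff = 4 * d_model.
vanilla_ffn = FFN(d_model, 4 * d_model)

# "FFN-Wider": the inner width is enlarged (factor chosen here only for
# illustration), so more of the layer's capacity sits in the FFN.
wider_ffn = FFN(d_model, 16 * d_model)

x = torch.randn(2, 10, d_model)
print(vanilla_ffn(x).shape, wider_ffn(x).shape)  # both torch.Size([2, 10, 768])
```

Both blocks map inputs back to d_model, so the widened variant is a drop-in replacement inside a Transformer layer; the paper's question is how this reallocation of parameters affects base capabilities, and CEA is its proposed remedy.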