How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers
3480 words · 17 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology
Pre-trained language models’ base capabilities are significantly influenced by architecture, not just scale; a novel Combination Enhanced Architecture (CEA) improves performance by addressing FFN-Wider…
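For readers unfamiliar with the term, "FFN-Wider" refers to Transformer variants whose feed-forward inner dimension is enlarged well beyond the usual 4×d_model ratio, shifting a larger share of each layer's parameters into the FFN relative to attention. The PyTorch sketch below is only an illustration of that idea; the hidden size and widening factor are assumptions for demonstration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn


class FFN(nn.Module):
    """Standard Transformer feed-forward block: d_model -> d_ff -> d_model."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.act = nn.GELU()
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(self.act(self.up(x)))


d_model = 768  # illustrative hidden size, not taken from the paper

# Vanilla BERT/GPT-style ratio: d_ff = 4 * d_model.
vanilla_ffn = FFN(d_model, 4 * d_model)

# "FFN-Wider": the inner width is enlarged (factor chosen here only for
# illustration), so more of the layer's capacity sits in the FFN.
wider_ffn = FFN(d_model, 16 * d_model)

x = torch.randn(2, 10, d_model)
print(vanilla_ffn(x).shape, wider_ffn(x).shape)  # both torch.Size([2, 10, 768])
```

Both blocks map inputs back to d_model, so the widened variant is a drop-in replacement inside a Transformer layer; the paper's question is how this reallocation of parameters affects base capabilities, and CEA is its proposed remedy.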