Large Language Models

MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering
·2665 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
MutaPLM, a novel protein language model, provides human-understandable mutation explanations and designs novel mutations with desirable properties using a unique protein delta network and chain-of-tho…
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking
·1726 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 Cornell University
This paper introduces an efficient multivariate stochastic dominance test using optimal transport, enabling robust model benchmarking by considering metric dependencies.
Multi-LLM Debate: Framework, Principals, and Interventions
·1604 words·8 mins
Natural Language Processing Large Language Models 🏢 ByteDance Research
Boosting LLM collaboration, this research introduces a novel theoretical framework for multi-LLM debate, revealing key principles like the effect of similar models and interventions to enhance accurac…
Multi-language Diversity Benefits Autoformalization
·1698 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Cambridge
Researchers created MMA, a large multilingual dataset of informal-formal mathematical pairs, leveraging a language model for reverse translation. Fine-tuned models achieved significantly improved aut…
Multi-Head Mixture-of-Experts
·2844 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Multi-Head Mixture-of-Experts (MH-MoE) drastically boosts large language model efficiency by activating almost all expert networks, achieving superior performance compared to existing Sparse Mixture-o…
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
·4032 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
MR-Ben: A new benchmark reveals LLMs’ meta-reasoning flaws, pushing the boundaries of AI evaluation beyond simple accuracy.
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
·2790 words·14 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
MomentumSMoE boosts Sparse Mixture of Experts’ (SMoE) performance by integrating momentum, resulting in more stable training and robust models.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
MoGU: A framework dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
MoEUT: Mixture-of-Experts Universal Transformers
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 Stanford University
MoEUT: Mixture-of-Experts Universal Transformers significantly improves the compute efficiency of Universal Transformers, making them competitive with standard Transformers in large-scale language mod…
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins
Large Language Models 🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable perfo…
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
·2786 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
MxDNA: Model learns optimal DNA tokenization via gradient descent, outperforming existing methods.
Mobility-LLM: Learning Visiting Intentions and Travel Preference from Human Mobility Data with Large Language Models
·2805 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
Mobility-LLM leverages LLMs to analyze human mobility data from check-in sequences, significantly outperforming existing models in location prediction, user identification, and time prediction tasks.
MKGL: Mastery of a Three-Word Language
·2110 words·10 mins
Large Language Models 🏢 Zhejiang University
Researchers taught a large language model (LLM) a three-word ‘Knowledge Graph Language’ (KGL) to improve knowledge graph (KG) completion, drastically reducing errors compared to other methods.
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
·1989 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Warsaw
Mixture of Tokens (MoT) achieves 3x faster LLM training than dense Transformers and matches state-of-the-art MoE performance via continuous token mixing.
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
·1888 words·9 mins
Natural Language Processing Large Language Models 🏢 Seoul National University
BinaryMoS: a novel token-adaptive binarization method that boosts LLM accuracy and efficiency by dynamically merging multiple scaling experts for each token.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Mixture of Demonstrations for In-Context Learning
·1953 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Virginia
MoD, a novel Mixture of Demonstrations framework, enhances in-context learning by partitioning demonstration pools and employing expert-wise training, achieving state-of-the-art performance.
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
·3103 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 National University of Singapore
MixEval revolutionizes LLM benchmarking by blending real-world user queries with existing datasets, creating a cost-effective, unbiased, and dynamic evaluation method.
Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
·1697 words·8 mins
Natural Language Processing Large Language Models 🏢 ByteDance Research
ADVPO, a novel method, tackles reward overoptimization in RLHF via a lightweight uncertainty quantification approach, resulting in enhanced LLM performance and alignment with human values.
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
·3037 words·15 mins
Natural Language Processing Large Language Models 🏢 ZIP Lab, Monash University
MiniCache: A novel approach that drastically reduces LLM KV cache memory footprint by compressing along the depth dimension.