Large Language Models

MutaPLM: Protein Language Modeling for Mutation Explanation and Engineering
·2665 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
MutaPLM, a novel protein language model, provides human-understandable mutation explanations and designs novel mutations with desirable properties using a unique protein delta network and chain-of-tho…
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking
·1726 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 Cornell University
This paper introduces an efficient multivariate stochastic dominance test using optimal transport, enabling robust model benchmarking by considering metric dependencies.
Multi-LLM Debate: Framework, Principals, and Interventions
·1604 words·8 mins
Natural Language Processing Large Language Models 🏢 ByteDance Research
Boosting LLM collaboration, this research introduces a novel theoretical framework for multi-LLM debate, revealing key principles like the effect of similar models and interventions to enhance accurac…
Multi-language Diversity Benefits Autoformalization
·1698 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Cambridge
Researchers created MMA, a large multilingual dataset of informal-formal mathematical pairs, leveraging a language model for reverse translation. Fine-tuned models achieved significantly improved aut…
Multi-Head Mixture-of-Experts
·2844 words·14 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
Multi-Head Mixture-of-Experts (MH-MoE) drastically boosts large language model efficiency by activating almost all expert networks, achieving superior performance compared to existing Sparse Mixture-o…
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs
·4032 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
MR-Ben: A new benchmark reveals LLMs’ meta-reasoning flaws, pushing the boundaries of AI evaluation beyond simple accuracy.
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
·2790 words·14 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
MomentumSMoE boosts Sparse Mixture of Experts’ (SMoE) performance by integrating momentum, resulting in more stable training and robust models.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
MoGU: A framework dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
MoEUT: Mixture-of-Experts Universal Transformers
·2486 words·12 mins
Natural Language Processing Large Language Models 🏢 Stanford University
MoEUT: Mixture-of-Experts Universal Transformers significantly improves the compute efficiency of Universal Transformers, making them competitive with standard Transformers in large-scale language mod…
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins
Large Language Models 🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable perfo…
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
·2786 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
MxDNA: Model learns optimal DNA tokenization via gradient descent, outperforming existing methods.
Mobility-LLM: Learning Visiting Intentions and Travel Preference from Human Mobility Data with Large Language Models
·2805 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
Mobility-LLM leverages LLMs to analyze human mobility data from check-in sequences, significantly outperforming existing models in location prediction, user identification, and time prediction tasks.
MKGL: Mastery of a Three-Word Language
·2110 words·10 mins
Large Language Models 🏢 Zhejiang University
Researchers taught a large language model (LLM) a three-word ‘Knowledge Graph Language’ (KGL) to improve knowledge graph (KG) completion, drastically reducing errors compared to other methods.
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
·1989 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Warsaw
Mixture of Tokens (MoT) achieves 3x faster LLM training than dense Transformers and matches state-of-the-art MoE performance via continuous token mixing.
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
·1888 words·9 mins
Natural Language Processing Large Language Models 🏢 Seoul National University
BinaryMoS: a novel token-adaptive binarization method that boosts LLM accuracy and efficiency by dynamically merging multiple scaling experts for each token.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins
Natural Language Processing Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Mixture of Demonstrations for In-Context Learning
·1953 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Virginia
MoD, a novel Mixture of Demonstrations framework, enhances in-context learning by partitioning demonstration pools and employing expert-wise training, achieving state-of-the-art performance.
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
·3103 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 National University of Singapore
MixEval revolutionizes LLM benchmarking by blending real-world user queries with existing datasets, creating a cost-effective, unbiased, and dynamic evaluation method.
Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
·1697 words·8 mins
Natural Language Processing Large Language Models 🏢 ByteDance Research
ADVPO, a novel method, tackles reward overoptimization in RLHF via a lightweight uncertainty quantification approach, resulting in enhanced LLM performance and alignment with human values.
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
·3037 words·15 mins
Natural Language Processing Large Language Models 🏢 ZIP Lab, Monash University
MiniCache: A novel approach that drastically reduces LLM KV cache memory footprint by compressing along the depth dimension.