Natural Language Processing
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts
·2790 words·14 mins·
Natural Language Processing
Large Language Models
🏢 National University of Singapore
MomentumSMoE boosts Sparse Mixture of Experts’ (SMoE) performance by integrating momentum, resulting in more stable training and robust models.
MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
·2311 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Harbin Institute of Technology
MoGU is a framework that dynamically balances safety and usability in LLMs by routing benign and malicious instructions to different LLM variants, leading to safer, more useful responses.
MoEUT: Mixture-of-Experts Universal Transformers
·2486 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
MoEUT: Mixture-of-Experts Universal Transformers significantly improves the compute efficiency of Universal Transformers, making them competitive with standard Transformers in large-scale language mod…
Model Decides How to Tokenize: Adaptive DNA Sequence Tokenization with MxDNA
·2786 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Shanghai Artificial Intelligence Laboratory
MxDNA: Model learns optimal DNA tokenization via gradient descent, outperforming existing methods.
Mobility-LLM: Learning Visiting Intentions and Travel Preference from Human Mobility Data with Large Language Models
·2805 words·14 mins·
loading
·
loading
AI Generated
Natural Language Processing
Large Language Models
🏢 Beijing Jiaotong University
Mobility-LLM leverages LLMs to analyze human mobility data from check-in sequences, significantly outperforming existing models in location prediction, user identification, and time prediction tasks.
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation
·1989 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Warsaw
Mixture of Tokens (MoT) achieves 3x faster LLM training than dense Transformers and matches state-of-the-art MoE performance via continuous token mixing.
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models
·1888 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Seoul National University
BinaryMoS: a novel token-adaptive binarization method that boosts LLM accuracy and efficiency by dynamically merging multiple scaling experts for each token.
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
·2372 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
MoICE, a novel plug-in, significantly enhances LLMs’ long context awareness by dynamically routing attention using multiple RoPE angles, achieving superior performance with high inference efficiency.
Mixture of Demonstrations for In-Context Learning
·1953 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Virginia
MoD, a novel Mixture of Demonstrations framework, enhances in-context learning by partitioning demonstration pools and employing expert-wise training, achieving state-of-the-art performance.
MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
·3103 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 National University of Singapore
MixEval improves LLM benchmarking by blending real-world user queries with existing benchmark datasets, yielding a cost-effective, less biased, and dynamic evaluation method.
Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation
·1697 words·8 mins·
Natural Language Processing
Large Language Models
🏢 ByteDance Research
ADVPO tackles reward overoptimization in RLHF with a lightweight uncertainty quantification approach, improving LLM performance and alignment with human values.
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
·3037 words·15 mins·
Natural Language Processing
Large Language Models
🏢 ZIP Lab, Monash University
MiniCache: A novel approach to drastically reduce LLM KV cache memory footprint.
Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 California Institute of Technology
MINI-SEQUENCE TRANSFORMER (MST) drastically reduces memory usage in LLM training by processing mini-sequences iteratively, enabling training with 12-24x longer sequences than conventional methods with…
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
·2639 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Shanghai Artificial Intelligence Laboratory
MindMerger efficiently boosts LLM reasoning in non-English languages by merging LLMs with external multilingual language understanding capabilities, achieving significant accuracy improvements, especi…
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
·4302 words·21 mins·
Natural Language Processing
Large Language Models
🏢 Microsoft Research
LLMs’ spatial reasoning abilities are boosted by visualizing their thought processes via ‘Visualization-of-Thought’ prompting, significantly improving performance on navigation and tiling tasks.
Microstructures and Accuracy of Graph Recall by Large Language Models
·2595 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
LLMs struggle with graph recall, exhibiting biases like favoring triangles and underperforming compared to humans; advanced models show striking domain dependence.
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
·2050 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Science and Technology Austria (ISTA)
MICROADAM: A new Adam optimizer variant dramatically cuts memory usage for training large language models without compromising accuracy or provable convergence.
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
·2343 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs' accuracy on mathematical problem solving improves significantly when prompt-guided skill labeling is used to select in-context exemplars.
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
·3742 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Manchester
MetaAligner: a novel, policy-agnostic, and generalizable method for efficiently aligning LLMs to multiple objectives, even unseen ones, achieving significant and balanced improvements while saving up …
Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration
·3164 words·15 mins·
AI Generated
Natural Language Processing
Text Generation
🏢 University of Washington
Meta-DiffuB enhances sequence-to-sequence text diffusion models by using meta-exploration to learn a contextualized noise schedule, resulting in state-of-the-art performance.