Large Language Models

Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
·2712 words·13 mins
Natural Language Processing Large Language Models 🏢 California Institute of Technology
MINI-SEQUENCE TRANSFORMER (MST) drastically reduces memory usage in LLM training by processing mini-sequences iteratively, enabling training with 12-24x longer sequences than conventional methods with…
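The core idea is easy to sketch: because the MLP and LM-head blocks act position-wise, the sequence can be split into mini-sequences and pushed through one chunk at a time, so the large intermediate activation (hidden → 4×hidden) never materializes for the full sequence at once. A minimal sketch with illustrative sizes, not the paper's implementation:

```python
# Mini-sequence idea: run the position-wise MLP over sequence chunks so the
# big intermediate activation exists for only one chunk at a time.
import torch
import torch.nn as nn

hidden, expansion, seq_len, n_chunks = 512, 4, 8192, 16  # illustrative sizes

mlp = nn.Sequential(
    nn.Linear(hidden, expansion * hidden),
    nn.GELU(),
    nn.Linear(expansion * hidden, hidden),
)

x = torch.randn(1, seq_len, hidden)

# Process mini-sequences one at a time; since the MLP is position-wise,
# concatenating the chunk outputs reproduces the full-sequence result.
out = torch.cat([mlp(chunk) for chunk in x.chunk(n_chunks, dim=1)], dim=1)
assert torch.allclose(out, mlp(x), atol=1e-5)
```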
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
·2998 words·15 mins
Large Language Models 🏢 Microsoft Corporation
MInference 1.0 accelerates LLM pre-filling via dynamic sparse attention, achieving up to 10x speedup on an A100 GPU while maintaining accuracy.
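As a toy illustration of the dynamic-sparse idea (not MInference's actual sparse patterns or GPU kernels): estimate block-level importance cheaply from pooled queries and keys, then compute exact attention only over the top-k key blocks for each query block. Block size, pooling heuristic, and top-k below are illustrative assumptions:

```python
# Toy block-sparse attention: cheap block scores pick which key blocks each
# query block attends to; exact attention is computed only on those blocks.
import torch

def block_sparse_attention(q, k, v, block=64, topk=4):
    n, d = q.shape
    nb = n // block
    qb = q.view(nb, block, d).mean(1)             # pooled query blocks
    kb = k.view(nb, block, d).mean(1)             # pooled key blocks
    keep = (qb @ kb.T).topk(topk, dim=-1).indices  # top-k key blocks per query block
    out = torch.zeros_like(q)
    for i in range(nb):
        idx = torch.cat([torch.arange(j * block, (j + 1) * block)
                         for j in keep[i].tolist()])
        qi = q[i * block:(i + 1) * block]
        att = torch.softmax(qi @ k[idx].T / d ** 0.5, dim=-1)
        out[i * block:(i + 1) * block] = att @ v[idx]
    return out

q, k, v = (torch.randn(1024, 64) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1024, 64])
```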
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
·2639 words·13 mins
Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
MindMerger efficiently boosts LLM reasoning in non-English languages by merging LLMs with external multilingual language understanding capabilities, achieving significant accuracy improvements, especi…
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
·4302 words·21 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
LLMs’ spatial reasoning abilities are boosted by visualizing their thought processes via ‘Visualization-of-Thought’ prompting, significantly improving performance on navigation and tiling tasks.
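A hedged sketch of what a Visualization-of-Thought style prompt might look like; the wording is an assumption, not the paper's verbatim template:

```python
# Illustrative VoT-style prompt: ask the model to render its intermediate
# spatial state as ASCII art after every reasoning step. Wording is assumed.
vot_prompt = (
    "You will navigate a 3x3 grid from S to G, avoiding walls (#).\n"
    "After each move, visualize the current grid as ASCII art, marking your\n"
    "position with '*'. Then decide the next move. Begin."
)
print(vot_prompt)
```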
Microstructures and Accuracy of Graph Recall by Large Language Models
·2595 words·13 mins
Natural Language Processing Large Language Models 🏢 Cornell University
LLMs struggle with graph recall, exhibiting biases like favoring triangles and underperforming compared to humans; advanced models show striking domain dependence.
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
·2050 words·10 mins
Natural Language Processing Large Language Models 🏢 Institute of Science and Technology Austria (ISTA)
MICROADAM: A new Adam optimizer variant dramatically cuts memory usage for training large language models without compromising accuracy or provable convergence.
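The flavor of the idea can be sketched in a few lines: sparsify each gradient to its top-k entries before it touches the Adam statistics, and carry the discarded mass forward with error feedback. The dense bookkeeping and constants below are illustrative; the real optimizer keeps its state in compressed form:

```python
# Sketch of a top-k-compressed Adam step with error feedback; constants and
# the dense residual buffer are illustrative simplifications.
import torch

def microadam_like_step(w, grad, err, m, v, step, lr=1e-3, k=0.1,
                        b1=0.9, b2=0.999, eps=1e-8):
    g = grad + err                               # add back previous residual
    kk = max(1, int(k * g.numel()))
    idx = g.abs().flatten().topk(kk).indices     # keep only the top-k entries
    sparse = torch.zeros_like(g).flatten()
    sparse[idx] = g.flatten()[idx]
    sparse = sparse.view_as(g)
    err = g - sparse                             # error-feedback residual
    m = b1 * m + (1 - b1) * sparse               # Adam stats see sparse grads
    v = b2 * v + (1 - b2) * sparse ** 2
    m_hat, v_hat = m / (1 - b1 ** step), v / (1 - b2 ** step)
    return w - lr * m_hat / (v_hat.sqrt() + eps), err, m, v

w, err, m, v = torch.randn(64), torch.zeros(64), torch.zeros(64), torch.zeros(64)
for t in range(1, 4):
    w, err, m, v = microadam_like_step(w, torch.randn(64), err, m, v, t)
```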
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
·2608 words·13 mins
Large Language Models 🏢 Hong Kong Polytechnic University
MetaLA: Unified optimal linear approximation to softmax attention map, achieving linear complexity and surpassing existing models in various benchmarks.
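For context, here is the generic linear-attention trick this line of work builds on: replace softmax(QKᵀ)V with a positive feature map φ so attention factorizes as φ(Q)(φ(K)ᵀV) and runs in linear time. MetaLA's specific parameterization (dynamic decay, removal of the key matrix, self-augmentation) is not reproduced here:

```python
# Generic linear attention: associativity turns the O(n^2) score matrix
# into an O(n d^2) computation via phi(Q) @ (phi(K)^T @ V).
import torch

def linear_attention(q, k, v, phi=torch.nn.functional.elu):
    qf, kf = phi(q) + 1, phi(k) + 1               # positive feature maps
    kv = kf.transpose(-2, -1) @ v                 # (d, d_v), never (n, n)
    z = qf @ kf.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (qf @ kv) / (z + 1e-6)

q, k, v = (torch.randn(128, 64) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([128, 64])
```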
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
·2343 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs gain math skills via prompt-guided skill labeling and exemplar selection, significantly boosting accuracy.
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
·3742 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Manchester
MetaAligner: a novel, policy-agnostic, and generalizable method for efficiently aligning LLMs to multiple objectives, even unseen ones, achieving significant and balanced improvements while saving up …
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
·3226 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Mesa-Extrapolation enhances LLM extrapolation using a novel weave position encoding method, boosting performance while significantly reducing memory and inference time.
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.
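A loose sketch of the lookup idea, under the assumption of a signed random-projection hash: split the input into sub-vectors, hash each into a bucket, and sum learned bucket embeddings in place of a dense matmul:

```python
# Toy hashed "linear" layer: fixed random hyperplanes hash sub-vectors into
# buckets; learned per-bucket embeddings replace the dense weight matrix.
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_splits=8, bits=8):
        super().__init__()
        self.n_splits, self.sub = n_splits, d_in // n_splits
        self.register_buffer("planes", torch.randn(n_splits, self.sub, bits))
        self.register_buffer("weights", 2 ** torch.arange(bits))
        self.tables = nn.Parameter(torch.randn(n_splits, 2 ** bits, d_out) * 0.02)

    def forward(self, x):                         # x: (..., d_in)
        xs = x.view(*x.shape[:-1], self.n_splits, self.sub)
        signs = (torch.einsum("...si,sib->...sb", xs, self.planes) > 0).long()
        codes = (signs * self.weights).sum(-1)    # one bucket index per split
        out = 0
        for s in range(self.n_splits):            # gather and sum table rows
            out = out + self.tables[s][codes[..., s]]
        return out

layer = HashedLinear(512, 512)
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```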
Memory-Efficient LLM Training with Online Subspace Descent
·1794 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Online Subspace Descent: a novel memory-efficient LLM training algorithm guaranteed to converge, closing the performance gap with full-rank methods.
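The skeleton of subspace-based memory saving looks like this: optimizer state lives in rank-r coordinates, and the projection is refreshed online from incoming gradients rather than by periodic SVD. The power-iteration-style projection update below is an illustrative stand-in for the paper's rule:

```python
# Sketch of a rank-r subspace optimizer step: momentum is kept in the
# low-rank coordinates, and the projection P tracks the gradient subspace.
import torch

def subspace_step(W, grad, P, M, lr=1e-3, beta=0.9, eta=1e-2):
    P = P + eta * (grad @ (grad.T @ P))          # online projection update
    P, _ = torch.linalg.qr(P)                    # re-orthonormalize columns
    g_low = P.T @ grad                           # project gradient: (r, n)
    M = beta * M + (1 - beta) * g_low            # momentum stored rank-r only
    return W - lr * (P @ M), P, M                # map the update back up

m, n, r = 256, 128, 8
W = torch.randn(m, n)
P, _ = torch.linalg.qr(torch.randn(m, r))
M = torch.zeros(r, n)
for _ in range(3):
    grad = torch.randn(m, n)                     # stand-in for a real gradient
    W, P, M = subspace_step(W, grad, P, M)
```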
Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
·2645 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
SPV-MIA, a novel membership inference attack, significantly improves the accuracy of identifying training data in fine-tuned LLMs by using self-prompt calibration and probabilistic variation.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
·1897 words·9 mins
Natural Language Processing Large Language Models 🏢 Meta AI
MEGALODON: A new neural architecture for LLMs, enabling unlimited context length with improved efficiency and accuracy.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
MatFormer: Nested Transformer for Elastic Inference
·3341 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
MatFormer: Train one universal model, extract hundreds of accurate submodels for elastic inference!
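The nesting is simple to illustrate: a smaller FFN is just a prefix slice of the universal model's weight tensors, so submodels fall out of the same parameters. Sizes below are illustrative:

```python
# Matryoshka-style nested FFN: submodels reuse prefixes of the hidden
# dimension, so slicing the shared weights yields a smaller model for free.
import torch
import torch.nn as nn

d_model, d_ff = 256, 1024
w1 = nn.Parameter(torch.randn(d_ff, d_model) * 0.02)   # up-projection
w2 = nn.Parameter(torch.randn(d_model, d_ff) * 0.02)   # down-projection

def ffn(x, frac=1.0):
    """Run the FFN using only the first `frac` of the hidden units."""
    m = int(d_ff * frac)
    return torch.relu(x @ w1[:m].T) @ w2[:, :m].T

x = torch.randn(8, d_model)
full, half = ffn(x, 1.0), ffn(x, 0.5)  # extract a half-width submodel
print(full.shape, half.shape)
```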
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
·2759 words·13 mins
Large Language Models 🏢 NVIDIA
MaskLLM learns efficient semi-structured sparsity in LLMs via end-to-end training, achieving significant speedup and memory reduction without sacrificing performance.
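The learnable part can be sketched directly: every group of four weights admits six candidate 2-of-4 masks, and per-group logits over those candidates are trained end-to-end via Gumbel-softmax sampling. Hyperparameters below are illustrative simplifications:

```python
# Learnable 2:4 sparsity sketch: sample one of the six 2-of-4 patterns per
# weight group with Gumbel-softmax so gradients flow into the mask logits.
import itertools
import torch
import torch.nn.functional as F

# The 6 possible masks keeping exactly 2 of every 4 weights.
CANDIDATES = torch.tensor(
    [[1.0 if i in pair else 0.0 for i in range(4)]
     for pair in itertools.combinations(range(4), 2)]
)  # shape (6, 4)

def sample_24_mask(logits, tau=1.0):
    """logits: (n_groups, 6) learnable scores -> (n_groups, 4) mask."""
    probs = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot, ST grads
    return probs @ CANDIDATES                             # pick a 2:4 pattern

w = torch.randn(2, 4)                        # two groups of four weights
logits = torch.zeros(2, 6, requires_grad=True)
mask = sample_24_mask(logits)
(w * mask).sum().backward()                  # gradients reach the mask logits
print(mask)                                  # each row keeps exactly 2 of 4
```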
Many-shot Jailbreaking
·5721 words·27 mins
AI Generated Natural Language Processing Large Language Models 🏢 Anthropic
Long-context attacks easily manipulate LLMs by feeding hundreds of harmful examples, highlighting a critical vulnerability amplified by larger context windows.
Many-Shot In-Context Learning
·3209 words·16 mins
Large Language Models 🏢 Google DeepMind
Scaling up in-context learning using thousands of examples significantly boosts Large Language Model (LLM) performance, particularly for complex tasks. Novel training methods mitigate reliance on hum…
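Both of the last two entries rest on the same mechanical step, assembling a prompt from many demonstrations; the jailbreak simply swaps benign shots for harmful ones. A minimal sketch with an assumed Q/A format, not any specific model's template:

```python
# Many-shot prompt assembly: concatenate demonstrations before the query.
def build_many_shot_prompt(examples, query):
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {query}\nA:"

examples = [("2+2?", "4"), ("Capital of France?", "Paris")]  # scale to 100s
print(build_many_shot_prompt(examples, "3+3?"))
```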
MAmmoTH2: Scaling Instructions from the Web
·2418 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MAmmoTH2: Harvesting 10M web instructions for enhanced LLM reasoning!