Large Language Models

Mini-Sequence Transformers: Optimizing Intermediate Memory for Long Sequences Training
·2712 words·13 mins
Natural Language Processing Large Language Models 🏢 California Institute of Technology
MINI-SEQUENCE TRANSFORMER (MST) drastically reduces memory usage in LLM training by processing mini-sequences iteratively, enabling training with 12-24x longer sequences than conventional methods with…
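The core idea is easy to sketch: because the MLP and LM-head blocks act position-wise, the sequence can be split into mini-sequences and pushed through one chunk at a time, so the large intermediate activation (hidden → 4×hidden) never materializes for the full sequence at once. A minimal sketch with illustrative sizes, not the paper's implementation:

```python
# Mini-sequence idea: run the position-wise MLP over sequence chunks so the
# big intermediate activation exists for only one chunk at a time.
import torch
import torch.nn as nn

hidden, expansion, seq_len, n_chunks = 512, 4, 8192, 16  # illustrative sizes

mlp = nn.Sequential(
    nn.Linear(hidden, expansion * hidden),
    nn.GELU(),
    nn.Linear(expansion * hidden, hidden),
)

x = torch.randn(1, seq_len, hidden)

# Process mini-sequences one at a time; since the MLP is position-wise,
# concatenating the chunk outputs reproduces the full-sequence result.
out = torch.cat([mlp(chunk) for chunk in x.chunk(n_chunks, dim=1)], dim=1)
assert torch.allclose(out, mlp(x), atol=1e-5)
```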
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
·2998 words·15 mins
Large Language Models 🏢 Microsoft Corporation
MInference 1.0 accelerates LLM pre-filling via dynamic sparse attention, achieving up to 10x speedup on an A100 GPU while maintaining accuracy.
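As a toy illustration of the dynamic-sparse idea (not MInference's actual sparse patterns or GPU kernels): estimate block-level importance cheaply from pooled queries and keys, then compute exact attention only over the top-k key blocks for each query block. Block size, pooling heuristic, and top-k below are illustrative assumptions:

```python
# Toy block-sparse attention: cheap block scores pick which key blocks each
# query block attends to; exact attention is computed only on those blocks.
import torch

def block_sparse_attention(q, k, v, block=64, topk=4):
    n, d = q.shape
    nb = n // block
    qb = q.view(nb, block, d).mean(1)             # pooled query blocks
    kb = k.view(nb, block, d).mean(1)             # pooled key blocks
    keep = (qb @ kb.T).topk(topk, dim=-1).indices  # top-k key blocks per query block
    out = torch.zeros_like(q)
    for i in range(nb):
        idx = torch.cat([torch.arange(j * block, (j + 1) * block)
                         for j in keep[i].tolist()])
        qi = q[i * block:(i + 1) * block]
        att = torch.softmax(qi @ k[idx].T / d ** 0.5, dim=-1)
        out[i * block:(i + 1) * block] = att @ v[idx]
    return out

q, k, v = (torch.randn(1024, 64) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # torch.Size([1024, 64])
```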
MindMerger: Efficiently Boosting LLM Reasoning in non-English Languages
·2639 words·13 mins
Natural Language Processing Large Language Models 🏢 Shanghai Artificial Intelligence Laboratory
MindMerger efficiently boosts LLM reasoning in non-English languages by merging LLMs with external multilingual language understanding capabilities, achieving significant accuracy improvements, especi…
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
·4302 words·21 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
LLMs’ spatial reasoning abilities are boosted by visualizing their thought processes via ‘Visualization-of-Thought’ prompting, significantly improving performance on navigation and tiling tasks.
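A hedged sketch of what a Visualization-of-Thought style prompt might look like; the wording is an assumption, not the paper's verbatim template:

```python
# Illustrative VoT-style prompt: ask the model to render its intermediate
# spatial state as ASCII art after every reasoning step. Wording is assumed.
vot_prompt = (
    "You will navigate a 3x3 grid from S to G, avoiding walls (#).\n"
    "After each move, visualize the current grid as ASCII art, marking your\n"
    "position with '*'. Then decide the next move. Begin."
)
print(vot_prompt)
```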
Microstructures and Accuracy of Graph Recall by Large Language Models
·2595 words·13 mins
Natural Language Processing Large Language Models 🏢 Cornell University
LLMs struggle with graph recall, exhibiting biases like favoring triangles and underperforming compared to humans; advanced models show striking domain dependence.
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
·2050 words·10 mins
Natural Language Processing Large Language Models 🏢 Institute of Science and Technology Austria (ISTA)
MICROADAM: A new Adam optimizer variant dramatically cuts memory usage for training large language models without compromising accuracy or provable convergence.
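The flavor of the idea can be sketched in a few lines: sparsify each gradient to its top-k entries before it touches the Adam statistics, and carry the discarded mass forward with error feedback. The dense bookkeeping and constants below are illustrative; the real optimizer keeps its state in compressed form:

```python
# Sketch of a top-k-compressed Adam step with error feedback; constants and
# the dense residual buffer are illustrative simplifications.
import torch

def microadam_like_step(w, grad, err, m, v, step, lr=1e-3, k=0.1,
                        b1=0.9, b2=0.999, eps=1e-8):
    g = grad + err                               # add back previous residual
    kk = max(1, int(k * g.numel()))
    idx = g.abs().flatten().topk(kk).indices     # keep only the top-k entries
    sparse = torch.zeros_like(g).flatten()
    sparse[idx] = g.flatten()[idx]
    sparse = sparse.view_as(g)
    err = g - sparse                             # error-feedback residual
    m = b1 * m + (1 - b1) * sparse               # Adam stats see sparse grads
    v = b2 * v + (1 - b2) * sparse ** 2
    m_hat, v_hat = m / (1 - b1 ** step), v / (1 - b2 ** step)
    return w - lr * m_hat / (v_hat.sqrt() + eps), err, m, v

w, err, m, v = torch.randn(64), torch.zeros(64), torch.zeros(64), torch.zeros(64)
for t in range(1, 4):
    w, err, m, v = microadam_like_step(w, torch.randn(64), err, m, v, t)
```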
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
·2608 words·13 mins
Large Language Models 🏢 Hong Kong Polytechnic University
MetaLA: Unified optimal linear approximation to softmax attention map, achieving linear complexity and surpassing existing models in various benchmarks.
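For context, here is the generic linear-attention trick this line of work builds on: replace softmax(QKᵀ)V with a positive feature map φ so attention factorizes as φ(Q)(φ(K)ᵀV) and runs in linear time. MetaLA's specific parameterization (dynamic decay, removal of the key matrix, self-augmentation) is not reproduced here:

```python
# Generic linear attention: associativity turns the O(n^2) score matrix
# into an O(n d^2) computation via phi(Q) @ (phi(K)^T @ V).
import torch

def linear_attention(q, k, v, phi=torch.nn.functional.elu):
    qf, kf = phi(q) + 1, phi(k) + 1               # positive feature maps
    kv = kf.transpose(-2, -1) @ v                 # (d, d_v), never (n, n)
    z = qf @ kf.sum(dim=-2, keepdim=True).transpose(-2, -1)  # normalizer
    return (qf @ kv) / (z + 1e-6)

q, k, v = (torch.randn(128, 64) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([128, 64])
```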
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
·2343 words·11 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs gain math skills via prompt-guided skill labeling and exemplar selection, significantly boosting accuracy.
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
·3742 words·18 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Manchester
MetaAligner: a novel, policy-agnostic, and generalizable method for efficiently aligning LLMs to multiple objectives, even unseen ones, achieving significant and balanced improvements while saving up …
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
·3226 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Mesa-Extrapolation enhances LLM extrapolation using a novel weave position encoding method, boosting performance while significantly reducing memory and inference time.
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.
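A loose sketch of the lookup idea, under the assumption of a signed random-projection hash: split the input into sub-vectors, hash each into a bucket, and sum learned bucket embeddings in place of a dense matmul:

```python
# Toy hashed "linear" layer: fixed random hyperplanes hash sub-vectors into
# buckets; learned per-bucket embeddings replace the dense weight matrix.
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_splits=8, bits=8):
        super().__init__()
        self.n_splits, self.sub = n_splits, d_in // n_splits
        self.register_buffer("planes", torch.randn(n_splits, self.sub, bits))
        self.register_buffer("weights", 2 ** torch.arange(bits))
        self.tables = nn.Parameter(torch.randn(n_splits, 2 ** bits, d_out) * 0.02)

    def forward(self, x):                         # x: (..., d_in)
        xs = x.view(*x.shape[:-1], self.n_splits, self.sub)
        signs = (torch.einsum("...si,sib->...sb", xs, self.planes) > 0).long()
        codes = (signs * self.weights).sum(-1)    # one bucket index per split
        out = 0
        for s in range(self.n_splits):            # gather and sum table rows
            out = out + self.tables[s][codes[..., s]]
        return out

layer = HashedLinear(512, 512)
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```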
Memory-Efficient LLM Training with Online Subspace Descent
·1794 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Online Subspace Descent: a novel memory-efficient LLM training algorithm guaranteed to converge, closing the performance gap with full-rank methods.
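The skeleton of subspace-based memory saving looks like this: optimizer state lives in rank-r coordinates, and the projection is refreshed online from incoming gradients rather than by periodic SVD. The power-iteration-style projection update below is an illustrative stand-in for the paper's rule:

```python
# Sketch of a rank-r subspace optimizer step: momentum is kept in the
# low-rank coordinates, and the projection P tracks the gradient subspace.
import torch

def subspace_step(W, grad, P, M, lr=1e-3, beta=0.9, eta=1e-2):
    P = P + eta * (grad @ (grad.T @ P))          # online projection update
    P, _ = torch.linalg.qr(P)                    # re-orthonormalize columns
    g_low = P.T @ grad                           # project gradient: (r, n)
    M = beta * M + (1 - beta) * g_low            # momentum stored rank-r only
    return W - lr * (P @ M), P, M                # map the update back up

m, n, r = 256, 128, 8
W = torch.randn(m, n)
P, _ = torch.linalg.qr(torch.randn(m, r))
M = torch.zeros(r, n)
for _ in range(3):
    grad = torch.randn(m, n)                     # stand-in for a real gradient
    W, P, M = subspace_step(W, grad, P, M)
```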
Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
·2645 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
SPV-MIA, a novel membership inference attack, significantly improves the accuracy of identifying training data in fine-tuned LLMs by using self-prompt calibration and probabilistic variation.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
·1897 words·9 mins
Natural Language Processing Large Language Models 🏢 Meta AI
MEGALODON: A new neural architecture for LLMs, enabling unlimited context length with improved efficiency and accuracy.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
MatFormer: Nested Transformer for Elastic Inference
·3341 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
MatFormer: Train one universal model, extract hundreds of accurate submodels for elastic inference!
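The nesting is simple to illustrate: a smaller FFN is just a prefix slice of the universal model's weight tensors, so submodels fall out of the same parameters. Sizes below are illustrative:

```python
# Matryoshka-style nested FFN: submodels reuse prefixes of the hidden
# dimension, so slicing the shared weights yields a smaller model for free.
import torch
import torch.nn as nn

d_model, d_ff = 256, 1024
w1 = nn.Parameter(torch.randn(d_ff, d_model) * 0.02)   # up-projection
w2 = nn.Parameter(torch.randn(d_model, d_ff) * 0.02)   # down-projection

def ffn(x, frac=1.0):
    """Run the FFN using only the first `frac` of the hidden units."""
    m = int(d_ff * frac)
    return torch.relu(x @ w1[:m].T) @ w2[:, :m].T

x = torch.randn(8, d_model)
full, half = ffn(x, 1.0), ffn(x, 0.5)  # extract a half-width submodel
print(full.shape, half.shape)
```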
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
·2759 words·13 mins
Large Language Models 🏢 NVIDIA
MaskLLM learns efficient semi-structured sparsity in LLMs via end-to-end training, achieving significant speedup and memory reduction without sacrificing performance.
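The learnable part can be sketched directly: every group of four weights admits six candidate 2-of-4 masks, and per-group logits over those candidates are trained end-to-end via Gumbel-softmax sampling. Hyperparameters below are illustrative simplifications:

```python
# Learnable 2:4 sparsity sketch: sample one of the six 2-of-4 patterns per
# weight group with Gumbel-softmax so gradients flow into the mask logits.
import itertools
import torch
import torch.nn.functional as F

# The 6 possible masks keeping exactly 2 of every 4 weights.
CANDIDATES = torch.tensor(
    [[1.0 if i in pair else 0.0 for i in range(4)]
     for pair in itertools.combinations(range(4), 2)]
)  # shape (6, 4)

def sample_24_mask(logits, tau=1.0):
    """logits: (n_groups, 6) learnable scores -> (n_groups, 4) mask."""
    probs = F.gumbel_softmax(logits, tau=tau, hard=True)  # one-hot, ST grads
    return probs @ CANDIDATES                             # pick a 2:4 pattern

w = torch.randn(2, 4)                        # two groups of four weights
logits = torch.zeros(2, 6, requires_grad=True)
mask = sample_24_mask(logits)
(w * mask).sum().backward()                  # gradients reach the mask logits
print(mask)                                  # each row keeps exactly 2 of 4
```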
Many-shot Jailbreaking
·5721 words·27 mins
AI Generated Natural Language Processing Large Language Models 🏢 Anthropic
Long-context attacks easily manipulate LLMs by feeding hundreds of harmful examples, highlighting a critical vulnerability amplified by larger context windows.
Many-Shot In-Context Learning
·3209 words·16 mins
Large Language Models 🏢 Google DeepMind
Scaling up in-context learning using thousands of examples significantly boosts Large Language Model (LLM) performance, particularly for complex tasks. Novel training methods mitigate reliance on hum…
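Both of the last two entries rest on the same mechanical step, assembling a prompt from many demonstrations; the jailbreak simply swaps benign shots for harmful ones. A minimal sketch with an assumed Q/A format, not any specific model's template:

```python
# Many-shot prompt assembly: concatenate demonstrations before the query.
def build_many_shot_prompt(examples, query):
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {query}\nA:"

examples = [("2+2?", "4"), ("Capital of France?", "Paris")]  # scale to 100s
print(build_many_shot_prompt(examples, "3+3?"))
```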
MAmmoTH2: Scaling Instructions from the Web
·2418 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MAmmoTH2: Harvesting 10M web instructions for enhanced LLM reasoning!