Natural Language Processing

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
·3226 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Mesa-Extrapolation enhances LLM extrapolation using a novel weave position encoding method, boosting performance while significantly reducing memory and inference time.
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.
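A rough sketch of the hashing idea above: replace a linear layer's matmul with a sign-hash over random hyperplanes followed by a table lookup. This is hypothetical illustration code, not the paper's actual memory layer, and all names are invented.

```python
import torch

class HashingMemoryLayer(torch.nn.Module):
    """Sketch: y = W @ x replaced by a bucket lookup keyed on an LSH code."""
    def __init__(self, d_in, d_out, n_bits=8):
        super().__init__()
        # Fixed random hyperplanes act as a simple locality-sensitive hash.
        self.register_buffer("planes", torch.randn(d_in, n_bits))
        self.register_buffer("powers", 2 ** torch.arange(n_bits))
        # Each bucket stores a learnable output vector, so no matmul at lookup.
        self.table = torch.nn.Parameter(torch.randn(2 ** n_bits, d_out) * 0.02)

    def forward(self, x):                    # x: (batch, d_in)
        bits = (x @ self.planes > 0).long()  # sign hash -> (batch, n_bits)
        idx = (bits * self.powers).sum(-1)   # bucket id per token
        return self.table[idx]               # retrieval instead of computation

layer = HashingMemoryLayer(64, 128)
out = layer(torch.randn(4, 64))              # -> (4, 128)
```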
Memory-Efficient LLM Training with Online Subspace Descent
·1794 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Online Subspace Descent: a novel memory-efficient LLM training algorithm guaranteed to converge, closing the performance gap with full-rank methods.
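As a hedged illustration of the subspace idea: optimizer state lives in a rank-r subspace P, and P itself is updated online (here with an Oja-style step plus QR retraction) instead of a periodic SVD. The update rule and all names are assumptions for illustration, not the authors' algorithm.

```python
import torch

def subspace_sgd_step(W, grad, P, m, lr=1e-3, lr_P=1e-4, beta=0.9):
    g_low = P.T @ grad                 # project gradient into subspace: (r, n)
    m.mul_(beta).add_(g_low)           # momentum state stays rank-r (O(r*n) memory)
    W.add_(P @ m, alpha=-lr)           # apply the update in the full space
    # Online projector update: pull P toward the gradient's row space
    # (an Oja-style step), then re-orthonormalize via QR.
    P.add_(grad @ g_low.T, alpha=lr_P)
    P.copy_(torch.linalg.qr(P).Q)
    return W, P, m

W = torch.randn(256, 128)
P = torch.linalg.qr(torch.randn(256, 8)).Q   # rank-8 subspace
m = torch.zeros(8, 128)
W, P, m = subspace_sgd_step(W, torch.randn_like(W), P, m)
```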
Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
·2645 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
SPV-MIA, a novel membership inference attack, significantly improves the accuracy of identifying training data in fine-tuned LLMs by using self-prompt calibration and probabilistic variation.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
·1897 words·9 mins
Natural Language Processing Large Language Models 🏢 Meta AI
MEGALODON: A new neural architecture for LLMs, enabling unlimited context length with improved efficiency and accuracy.
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
·2561 words·13 mins
Natural Language Processing Question Answering 🏢 University of Washington
MEDIQ benchmark revolutionizes LLM evaluation by shifting from static to interactive clinical reasoning, revealing LLMs’ struggles with proactive information-seeking and highlighting the importance of question-asking for reliable clinical decisions.
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
·2097 words·10 mins
Natural Language Processing Interpretability 🏢 MIT
New metrics and p-annealing improve sparse autoencoder training for better language model interpretability.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
MatFormer: Nested Transformer for Elastic Inference
·3341 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
MatFormer: Train one universal model, extract hundreds of accurate submodels for elastic inference!
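A minimal sketch of the nesting trick, assuming the nested axis is the FFN hidden width (one of the granularities the paper uses); the class and slicing scheme below are illustrative only:

```python
import torch

class NestedFFN(torch.nn.Module):
    """Sketch: smaller submodels are prefixes of the same FFN's hidden units."""
    def __init__(self, d, d_ff):
        super().__init__()
        self.w_in = torch.nn.Parameter(torch.randn(d_ff, d) * 0.02)
        self.w_out = torch.nn.Parameter(torch.randn(d, d_ff) * 0.02)

    def forward(self, x, frac=1.0):           # frac selects the granularity
        k = int(self.w_in.shape[0] * frac)
        h = torch.relu(x @ self.w_in[:k].T)   # use only the first k hidden units
        return h @ self.w_out[:, :k].T

ffn = NestedFFN(64, 256)
x = torch.randn(2, 64)
full, half = ffn(x), ffn(x, frac=0.5)         # same weights, two model sizes
```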
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
·1697 words·8 mins
AI Generated Natural Language Processing AI Theory 🏢 University of Notre Dame
Masked hard-attention transformers, with strict masking, precisely capture star-free languages, matching the expressive power of linear temporal logic.
Many-shot Jailbreaking
·5721 words·27 mins
AI Generated Natural Language Processing Large Language Models 🏢 Anthropic
Long-context attacks easily manipulate LLMs by feeding hundreds of harmful examples, highlighting a critical vulnerability amplified by larger context windows.
MAmmoTH2: Scaling Instructions from the Web
·2418 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MAmmoTH2: Harvesting 10M web instructions for enhanced LLM reasoning!
Make Your LLM Fully Utilize the Context
·2445 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft
FILM-7B, trained with Information-Intensive (IN2) training, significantly overcomes the ‘lost-in-the-middle’ problem in long-context LLMs, enabling robust information retrieval from all context positions.
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
·1848 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University at Albany, SUNY
MagR: a novel preprocessing technique boosts post-training quantization of LLMs by reducing weight magnitudes without inference overhead, achieving state-of-the-art performance.
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
·2236 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Washington
MAGNET, a novel adaptive gradient-based tokenization method, tackles multilingual language model bias by employing language-specific boundary predictors to achieve equitable segmentation across diverse languages.
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
·3263 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Hong Kong
MAGIS: A novel LLM-based multi-agent framework significantly boosts GitHub issue resolution by leveraging agent collaboration for planning and coding, achieving an eight-fold performance increase over directly applying the base LLM.
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
·2015 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Multi-Agent System for Condition Mining (MACM) dramatically boosts large language model accuracy in complex math problem-solving, exceeding existing methods by achieving higher accuracy and better generalization.
LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
LT-Defense: a searching-free backdoor defense for language models leveraging the long-tailed effect of poisoned data. It achieves 98% accuracy across 1440 models with less than 1% of the time cost of existing defenses.
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
·2125 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LSH-MoE accelerates Mixture-of-Experts training by 1.28x-2.2x via Locality-Sensitive Hashing, significantly reducing communication costs.
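The compression step is easy to sketch: hash tokens with random hyperplanes so near-duplicates share a bucket, send one representative per bucket through the all-to-all, and scatter expert outputs back via the bucket index. Hypothetical code, not the authors' implementation:

```python
import torch

def lsh_compress(tokens, n_bits=10):
    """Group similar tokens by sign-hash and return one mean vector per bucket."""
    planes = torch.randn(tokens.shape[-1], n_bits)
    codes = ((tokens @ planes > 0).long() * (2 ** torch.arange(n_bits))).sum(-1)
    uniq, inverse = torch.unique(codes, return_inverse=True)
    # The per-bucket mean is what actually gets sent to the experts.
    reps = torch.zeros(len(uniq), tokens.shape[-1]).index_reduce_(
        0, inverse, tokens, reduce="mean", include_self=False)
    return reps, inverse   # expert outputs are scattered back with `inverse`

tokens = torch.randn(4096, 64)
reps, inverse = lsh_compress(tokens)
print(reps.shape[0], "representatives sent for", tokens.shape[0], "tokens")
```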
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.
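A simplified sketch of the gradient-approximation initialization, assuming the adapters are seeded from an SVD of one full-batch gradient of the frozen weight; the exact scaling and slicing rules differ in the paper:

```python
import torch

def lora_ga_init(W, G, rank):
    """Seed LoRA's B, A from the gradient's leading singular directions,
    so the first low-rank step points where full fine-tuning would go."""
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    B = U[:, :rank] * S[:rank].sqrt()          # (d_out, rank)
    A = S[:rank].sqrt()[:, None] * Vh[:rank]   # (rank, d_in)
    # Offset the frozen weight by B @ A so the model's initial output is
    # unchanged (hedged: a simplification of the paper's setup).
    W = W - B @ A
    return W, A, B

W = torch.randn(128, 64)
G = torch.randn(128, 64)   # stands in for the loss gradient w.r.t. W
W, A, B = lora_ga_init(W, G, rank=8)
delta_W = B @ A            # the LoRA update direction at step 0
```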