Natural Language Processing

Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs
·3226 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Mesa-Extrapolation enhances LLM extrapolation using a novel weave position encoding method, boosting performance while significantly reducing memory and inference time.
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers
·2036 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
MemoryFormer drastically cuts large language model computation by replacing fully-connected layers with memory-efficient hashing, enabling faster and more scalable AI.
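A rough sketch of the hashing idea above: replace a linear layer's matmul with a sign-hash over random hyperplanes followed by a table lookup. This is hypothetical illustration code, not the paper's actual memory layer, and all names are invented.

```python
import torch

class HashingMemoryLayer(torch.nn.Module):
    """Sketch: y = W @ x replaced by a bucket lookup keyed on an LSH code."""
    def __init__(self, d_in, d_out, n_bits=8):
        super().__init__()
        # Fixed random hyperplanes act as a simple locality-sensitive hash.
        self.register_buffer("planes", torch.randn(d_in, n_bits))
        self.register_buffer("powers", 2 ** torch.arange(n_bits))
        # Each bucket stores a learnable output vector, so no matmul at lookup.
        self.table = torch.nn.Parameter(torch.randn(2 ** n_bits, d_out) * 0.02)

    def forward(self, x):                    # x: (batch, d_in)
        bits = (x @ self.planes > 0).long()  # sign hash -> (batch, n_bits)
        idx = (bits * self.powers).sum(-1)   # bucket id per token
        return self.table[idx]               # retrieval instead of computation

layer = HashingMemoryLayer(64, 128)
out = layer(torch.randn(4, 64))              # -> (4, 128)
```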
Memory-Efficient LLM Training with Online Subspace Descent
·1794 words·9 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Online Subspace Descent: a novel memory-efficient LLM training algorithm guaranteed to converge, closing the performance gap with full-rank methods.
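As a hedged illustration of the subspace idea: optimizer state lives in a rank-r subspace P, and P itself is updated online (here with an Oja-style step plus QR retraction) instead of a periodic SVD. The update rule and all names are assumptions for illustration, not the authors' algorithm.

```python
import torch

def subspace_sgd_step(W, grad, P, m, lr=1e-3, lr_P=1e-4, beta=0.9):
    g_low = P.T @ grad                 # project gradient into subspace: (r, n)
    m.mul_(beta).add_(g_low)           # momentum state stays rank-r (O(r*n) memory)
    W.add_(P @ m, alpha=-lr)           # apply the update in the full space
    # Online projector update: pull P toward the gradient's row space
    # (an Oja-style step), then re-orthonormalize via QR.
    P.add_(grad @ g_low.T, alpha=lr_P)
    P.copy_(torch.linalg.qr(P).Q)
    return W, P, m

W = torch.randn(256, 128)
P = torch.linalg.qr(torch.randn(256, 8)).Q   # rank-8 subspace
m = torch.zeros(8, 128)
W, P, m = subspace_sgd_step(W, torch.randn_like(W), P, m)
```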
Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
·2645 words·13 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
SPV-MIA, a novel membership inference attack, significantly improves the accuracy of identifying training data in fine-tuned LLMs by using self-prompt calibration and probabilistic variation.
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
·1897 words·9 mins
Natural Language Processing Large Language Models 🏢 Meta AI
MEGALODON: A new neural architecture for LLMs, enabling unlimited context length with improved efficiency and accuracy.
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning
·2561 words·13 mins
Natural Language Processing Question Answering 🏢 University of Washington
MEDIQ benchmark revolutionizes LLM evaluation by shifting from static to interactive clinical reasoning, revealing LLMs’ struggles with proactive information-seeking and highlighting the importance of question-asking for reliable clinical decisions.
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
·2097 words·10 mins
Natural Language Processing Interpretability 🏢 MIT
New metrics and p-annealing improve sparse autoencoder training for better language model interpretability.
Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance
·2532 words·12 mins
Natural Language Processing Large Language Models 🏢 Harbin Institute of Technology
Boosting LLMs’ abstract reasoning via ‘Meaningful Learning’: A new dataset and learning paradigm significantly enhance LLMs’ capacity for abstract reasoning, moving beyond simple memorization.
MatFormer: Nested Transformer for Elastic Inference
·3341 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
MatFormer: Train one universal model, extract hundreds of accurate submodels for elastic inference!
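A minimal sketch of the nesting trick, assuming the nested axis is the FFN hidden width (one of the granularities the paper uses); the class and slicing scheme below are illustrative only:

```python
import torch

class NestedFFN(torch.nn.Module):
    """Sketch: smaller submodels are prefixes of the same FFN's hidden units."""
    def __init__(self, d, d_ff):
        super().__init__()
        self.w_in = torch.nn.Parameter(torch.randn(d_ff, d) * 0.02)
        self.w_out = torch.nn.Parameter(torch.randn(d, d_ff) * 0.02)

    def forward(self, x, frac=1.0):           # frac selects the granularity
        k = int(self.w_in.shape[0] * frac)
        h = torch.relu(x @ self.w_in[:k].T)   # use only the first k hidden units
        return h @ self.w_out[:, :k].T

ffn = NestedFFN(64, 256)
x = torch.randn(2, 64)
full, half = ffn(x), ffn(x, frac=0.5)         # same weights, two model sizes
```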
Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages
·1697 words·8 mins
AI Generated Natural Language Processing AI Theory 🏢 University of Notre Dame
Masked hard-attention transformers, with strict masking, precisely capture star-free languages, matching the expressive power of linear temporal logic.
Many-shot Jailbreaking
·5721 words·27 mins
AI Generated Natural Language Processing Large Language Models 🏢 Anthropic
Long-context attacks easily manipulate LLMs by feeding hundreds of harmful examples, highlighting a critical vulnerability amplified by larger context windows.
MAmmoTH2: Scaling Instructions from the Web
·2418 words·12 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MAmmoTH2: Harvesting 10M web instructions for enhanced LLM reasoning!
Make Your LLM Fully Utilize the Context
·2445 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft
FILM-7B, trained with Information-Intensive (IN2) training, significantly overcomes the ‘lost-in-the-middle’ problem in long-context LLMs, enabling robust information retrieval from all context positions.
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
·1848 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University at Albany, SUNY
MagR: a novel preprocessing technique boosts post-training quantization of LLMs by reducing weight magnitudes without inference overhead, achieving state-of-the-art performance.
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
·2236 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Washington
MAGNET, a novel adaptive gradient-based tokenization method, tackles multilingual language model bias by employing language-specific boundary predictors to achieve equitable segmentation across diverse languages.
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
·3263 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Hong Kong
MAGIS: A novel LLM-based multi-agent framework significantly boosts GitHub issue resolution by leveraging agent collaboration for planning and coding, achieving an eight-fold performance increase over directly applying the base LLM.
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
·2015 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Multi-Agent System for Condition Mining (MACM) dramatically boosts large language model accuracy in complex math problem-solving, exceeding existing methods by achieving higher accuracy and better generalization.
LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
LT-Defense: a searching-free backdoor defense for language models leveraging the long-tailed effect of poisoned data. It achieves 98% accuracy across 1440 models with less than 1% of the time cost of existing defenses.
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
·2125 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LSH-MoE accelerates Mixture-of-Experts training by 1.28x-2.2x via Locality-Sensitive Hashing, significantly reducing communication costs.
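The compression step is easy to sketch: hash tokens with random hyperplanes so near-duplicates share a bucket, send one representative per bucket through the all-to-all, and scatter expert outputs back via the bucket index. Hypothetical code, not the authors' implementation:

```python
import torch

def lsh_compress(tokens, n_bits=10):
    """Group similar tokens by sign-hash and return one mean vector per bucket."""
    planes = torch.randn(tokens.shape[-1], n_bits)
    codes = ((tokens @ planes > 0).long() * (2 ** torch.arange(n_bits))).sum(-1)
    uniq, inverse = torch.unique(codes, return_inverse=True)
    # The per-bucket mean is what actually gets sent to the experts.
    reps = torch.zeros(len(uniq), tokens.shape[-1]).index_reduce_(
        0, inverse, tokens, reduce="mean", include_self=False)
    return reps, inverse   # expert outputs are scattered back with `inverse`

tokens = torch.randn(4096, 64)
reps, inverse = lsh_compress(tokens)
print(reps.shape[0], "representatives sent for", tokens.shape[0], "tokens")
```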
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.
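A simplified sketch of the gradient-approximation initialization, assuming the adapters are seeded from an SVD of one full-batch gradient of the frozen weight; the exact scaling and slicing rules differ in the paper:

```python
import torch

def lora_ga_init(W, G, rank):
    """Seed LoRA's B, A from the gradient's leading singular directions,
    so the first low-rank step points where full fine-tuning would go."""
    U, S, Vh = torch.linalg.svd(G, full_matrices=False)
    B = U[:, :rank] * S[:rank].sqrt()          # (d_out, rank)
    A = S[:rank].sqrt()[:, None] * Vh[:rank]   # (rank, d_in)
    # Offset the frozen weight by B @ A so the model's initial output is
    # unchanged (hedged: a simplification of the paper's setup).
    W = W - B @ A
    return W, A, B

W = torch.randn(128, 64)
G = torch.randn(128, 64)   # stands in for the loss gradient w.r.t. W
W, A, B = lora_ga_init(W, G, rank=8)
delta_W = B @ A            # the LoRA update direction at step 0
```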