
Large Language Models

Make Your LLM Fully Utilize the Context
·2445 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft
FILM-7B, trained with Information-Intensive (IN2) training, significantly overcomes the 'lost-in-the-middle' problem in long-context LLMs, enabling robust information retrieval from all context positions.
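A minimal sketch of the data-construction idea behind IN2 training; `qa_for` is a hypothetical helper (in the paper, a strong LLM writes the QA pair about the chosen segment). The key ingredient is that the answer-bearing segment lands at a uniformly random position in the long context:

```python
import random

def make_in2_example(segments, qa_for, ctx_segments=32):
    """Assemble one Information-Intensive (IN2) training example: a long
    context built from short segments, with the answer-bearing segment
    placed at a uniformly random position so that no position is
    systematically easier to learn than another."""
    needle = random.choice(segments)
    fillers = random.sample([s for s in segments if s is not needle],
                            ctx_segments - 1)
    pos = random.randrange(ctx_segments)            # uniform over positions
    context = fillers[:pos] + [needle] + fillers[pos:]
    question, answer = qa_for(needle)               # hypothetical QA writer
    return {"context": "\n".join(context),
            "question": question, "answer": answer}
```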
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
·1848 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University at Albany, SUNY
MagR, a novel preprocessing technique, boosts post-training quantization of LLMs by reducing weight magnitudes without adding inference overhead, achieving state-of-the-art performance.
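A minimal sketch of the magnitude-reduction idea, assuming a channel-wise ℓ∞-regularized least-squares formulation (minimize ‖Xw′ − Xw‖² + α‖w′‖∞ per weight column) solved by proximal gradient; `X` stands in for calibration activations, and all names are illustrative:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto the l1 ball (sorting-based algorithm)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam*||.||_inf via Moreau decomposition with the l1 ball."""
    return v - project_l1_ball(v, lam)

def magnitude_reduce(X, w, alpha=1.0, steps=300):
    """Shrink the peak magnitude of one weight column while (nearly)
    preserving the layer's output on calibration activations X:
        min_w' 0.5*||X w' - X w||^2 + alpha*||w'||_inf
    solved with proximal gradient descent."""
    y = X @ w
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    w_new = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w_new - y)
        w_new = prox_linf(w_new - grad / L, alpha / L)
    return w_new

rng = np.random.default_rng(0)
X, w = rng.normal(size=(256, 64)), rng.normal(size=64)
w2 = magnitude_reduce(X, w)
print(np.abs(w).max(), "->", np.abs(w2).max())     # peak magnitude shrinks
print(np.linalg.norm(X @ (w - w2)))                # output barely moves
```

A smaller peak magnitude means the same number of quantization levels covers a tighter range, which is where the post-training quantization gain comes from.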
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
·2236 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Washington
MAGNET, a novel adaptive gradient-based tokenization method, tackles multilingual language model bias by employing language-specific boundary predictors to achieve equitable segmentation across diverse languages.
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
·3263 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Hong Kong
MAGIS: A novel LLM-based multi-agent framework significantly boosts GitHub issue resolution by leveraging agent collaboration for planning and coding, achieving an eight-fold performance increase compared to directly applying the base LLM.
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
·2015 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Multi-Agent System for Condition Mining (MACM) dramatically boosts large language model accuracy in complex math problem-solving, exceeding existing methods by achieving higher accuracy and better generalization.
LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
LT-Defense: a searching-free backdoor defense for language models that leverages the long-tailed effect of poisoned data. It achieves 98% accuracy across 1440 models with less than 1% of the time cost of existing defenses.
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
·2125 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LSH-MoE accelerates Mixture-of-Experts training by 1.28x-2.2x via Locality-Sensitive Hashing, significantly reducing communication costs.
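A rough sketch of the communication-saving step, under stated assumptions: tokens are bucketed with random-hyperplane (SimHash) LSH, only one centroid per bucket is shipped to the expert, and the result is scattered back to bucket members. The paper additionally applies a residual compensation that is omitted here:

```python
import numpy as np

def lsh_codes(tokens, n_planes=8, seed=0):
    """Random-hyperplane (SimHash) bucket codes: tokens on the same side
    of every hyperplane land in the same bucket and tend to be similar."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(tokens.shape[1], n_planes))
    bits = (tokens @ planes) > 0
    return (bits * (1 << np.arange(n_planes))).sum(axis=1)

def lsh_compressed_dispatch(tokens, expert_fn, n_planes=8):
    """Ship one centroid per LSH bucket to the expert instead of every
    token, then scatter the expert output back to the bucket members.
    The payload crossing the (simulated) all-to-all is #buckets rows,
    not #tokens rows."""
    codes = lsh_codes(tokens, n_planes)
    uniq, inverse = np.unique(codes, return_inverse=True)
    centroids = np.stack([tokens[inverse == i].mean(axis=0)
                          for i in range(len(uniq))])
    out = expert_fn(centroids)
    return out[inverse]

tokens = np.random.default_rng(1).normal(size=(1024, 64))
y = lsh_compressed_dispatch(tokens, np.tanh)   # np.tanh as a toy expert
print(tokens.shape, "->", y.shape)             # (1024, 64) -> (1024, 64)
```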
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.
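A sketch of gradient-approximation initialization as the summary describes it; the exact singular-vector slices and scaling in the paper may differ. `grad_W` is the first full-parameter gradient, and the frozen weight is adjusted so the model output is unchanged at initialization:

```python
import numpy as np

def lora_ga_init(W, grad_W, rank):
    """Seed the adapters with leading singular directions of the first
    full-parameter gradient so that the first low-rank step approximates
    full fine-tuning's first step; W is corrected so that
    W_adj + B @ A equals the original weight at initialization."""
    U, S, Vt = np.linalg.svd(grad_W, full_matrices=False)
    B = U[:, :rank]                 # (out, r)
    A = Vt[rank:2 * rank, :]        # (r, in), a disjoint block of directions
    W_adj = W - B @ A               # output-preserving correction
    return W_adj, A, B

rng = np.random.default_rng(0)
W, G = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
W_adj, A, B = lora_ga_init(W, G, rank=8)
print(np.allclose(W_adj + B @ A, W))   # True: the model is unchanged at init
```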
LoQT: Low-Rank Adapters for Quantized Pretraining
·2483 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Copenhagen
LoQT enables efficient large language model training on consumer hardware via quantized weights and low-rank weight updates, overcoming memory limitations.
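A hedged sketch of the quantized-base plus low-rank-update scheme: the full matrix lives only in low precision (crude per-channel int8 here, a stand-in for whatever format LoQT actually uses), while gradients flow solely through a small trainable pair that can periodically be merged into the base and re-quantized:

```python
import torch

class LoQTLinear(torch.nn.Module):
    """Quantized frozen base + trainable low-rank update. Only the small
    (A, B) pair receives gradients, so optimizer state and full-precision
    storage stay tiny; B @ A can periodically be merged into the base and
    the result re-quantized."""

    def __init__(self, W, rank=8):
        super().__init__()
        scale = W.abs().amax(dim=1, keepdim=True) / 127
        self.register_buffer("Wq", torch.round(W / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        out_f, in_f = W.shape
        self.A = torch.nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x):
        W_deq = self.Wq.float() * self.scale        # dequantize on the fly
        return x @ (W_deq + self.B @ self.A).T

layer = LoQTLinear(torch.randn(256, 128))
print(layer(torch.randn(4, 128)).shape)             # torch.Size([4, 256])
```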
Long-form factuality in large language models
·4779 words·23 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs often generate factually inaccurate long-form text. This work introduces LongFact, a new benchmark dataset of 2280 fact-seeking prompts, and SAFE, a novel automated evaluation method that outperforms crowdsourced human annotators.
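A sketch of a SAFE-style pipeline with hypothetical `llm` and `search` callables (neither is the paper's API): split the response into atomic facts, gather search evidence for each, and report the supported fraction:

```python
def safe_style_eval(response, llm, search):
    """Rate the factuality of a long-form response: split it into atomic
    facts, gather search evidence for each, and ask the LLM to judge
    support. `llm(prompt) -> str` and `search(query) -> str` are assumed
    callables, not the paper's API."""
    facts = llm("Split into individual atomic facts, one per line:\n"
                + response).splitlines()
    supported = 0
    for fact in facts:
        query = llm(f"Write a search query to verify: {fact}")
        evidence = search(query)
        verdict = llm(f"Fact: {fact}\nEvidence: {evidence}\n"
                      "Is the fact supported? Answer yes or no.")
        supported += verdict.strip().lower().startswith("yes")
    return supported / max(len(facts), 1)
```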
Loki: Low-rank Keys for Efficient Sparse Attention
·3255 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
Loki: Low-rank Keys for Efficient Sparse Attention accelerates attention mechanisms in LLMs by exploiting the low-dimensionality of key vectors. It dynamically selects key tokens based on approximate attention scores computed in the reduced-dimensional space.
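A sketch of the mechanism under the paper's low-dimensional-key observation: score all keys cheaply in a PCA-reduced space, keep only the top-k candidates, and run exact attention on that sparse set. The PCA projection here is estimated from the keys themselves as a stand-in for offline calibration:

```python
import numpy as np

def loki_style_attention(q, K, V, proj, top_k):
    """Score every key cheaply in the PCA-reduced space, keep the top_k
    candidates, then run exact softmax attention on that sparse set."""
    approx = (q @ proj) @ (K @ proj).T              # cheap approximate scores
    idx = np.argsort(approx)[-top_k:]
    scores = q @ K[idx].T / np.sqrt(K.shape[1])     # exact, but sparse
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V[idx]

rng = np.random.default_rng(0)
d, d_red, n = 64, 16, 4096
K = rng.normal(size=(n, d_red)) @ rng.normal(size=(d_red, d))  # low-rank keys
V, q = rng.normal(size=(n, d)), rng.normal(size=d)
_, _, Vt = np.linalg.svd(K - K.mean(0), full_matrices=False)   # "calibration"
out = loki_style_attention(q, K, V, Vt[:d_red].T, top_k=256)
print(out.shape)                                                # (64,)
```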
LoFiT: Localized Fine-tuning on LLM Representations
·4045 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
LoFiT: Localized fine-tuning boosts LLMs’ performance by selectively training only a small subset of attention heads, achieving comparable accuracy to other methods while using significantly fewer parameters.
Localized Zeroth-Order Prompt Optimization
·3110 words·15 mins
Large Language Models 🏢 National University of Singapore
Localized Zeroth-Order Prompt Optimization (ZOPO) efficiently finds high-performing local optima for prompt optimization in black-box LLMs, outperforming existing global optimization methods.
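ZOPO's estimator is derived from a Gaussian-process/NTK view of the prompt space; the sketch below substitutes a generic two-point zeroth-order estimator to show the shape of the local, derivative-free search loop, with `score_fn` a hypothetical black-box prompt scorer:

```python
import numpy as np

def zo_ascend(score_fn, z0, mu=0.1, lr=0.2, steps=100, k=8, seed=0):
    """Derivative-free local ascent on a black-box score: estimate the
    gradient from symmetric random perturbations of the prompt
    representation z, then take a small step toward a local optimum."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(z)
        for _ in range(k):
            u = rng.normal(size=z.shape)
            g += (score_fn(z + mu * u) - score_fn(z - mu * u)) / (2 * mu) * u
        z += lr * g / k
    return z

# Toy black-box score whose optimum sits at z = (1, -1).
score = lambda z: -np.sum((z - np.array([1.0, -1.0])) ** 2)
print(zo_ascend(score, np.zeros(2)))    # converges near [1, -1]
```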
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings
·1927 words·10 mins
Natural Language Processing Large Language Models 🏢 Beihang University
TEA-GLM leverages LLMs for zero-shot graph learning by aligning GNN representations with LLM token embeddings, achieving state-of-the-art performance on unseen datasets and tasks.
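A simplified sketch of the alignment step, assuming paired (node, token-embedding) targets; TEA-GLM itself trains the alignment contrastively against the token-embedding space, so treat this as the minimal linear-projection version:

```python
import torch
import torch.nn.functional as F

def align_gnn_to_llm(gnn_emb, token_emb, steps=200, lr=1e-2):
    """Learn a linear map from frozen GNN node embeddings into the LLM's
    token-embedding space by maximizing cosine similarity with paired
    targets; projected vectors can then be consumed by the LLM as
    soft tokens."""
    proj = torch.nn.Linear(gnn_emb.shape[1], token_emb.shape[1], bias=False)
    opt = torch.optim.Adam(proj.parameters(), lr=lr)
    for _ in range(steps):
        loss = 1 - F.cosine_similarity(proj(gnn_emb), token_emb, -1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return proj

gnn_emb = torch.randn(512, 128)     # frozen GNN node representations
tok_emb = torch.randn(512, 4096)    # paired token-embedding targets
proj = align_gnn_to_llm(gnn_emb, tok_emb)
```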
LLMDFA: Analyzing Dataflow in Code with Large Language Models
·3865 words·19 mins
Natural Language Processing Large Language Models 🏢 Purdue University
LLMDFA: A novel LLM-powered framework performs compilation-free and customizable dataflow analysis, achieving high accuracy in bug detection by decomposing the task into sub-problems and mitigating LLM hallucinations.
LLM-Check: Investigating Detection of Hallucinations in Large Language Models
·2270 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Maryland, College Park
LLM-Check efficiently detects LLM hallucinations within a single response by analyzing the model's internal representations, enabling real-time applications.
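One way to read "internal model analysis" concretely: the method studies eigenvalue statistics of a response's internal representations. A toy score along those lines (not the paper's exact formula) is the log-determinant of the hidden-state covariance:

```python
import numpy as np

def hidden_state_score(H, eps=1e-6):
    """Log-determinant (sum of log eigenvalues) of the covariance of a
    response's hidden states H (tokens x hidden_dim): a toy single-response
    score in the spirit of eigenvalue-based internal analysis."""
    Hc = H - H.mean(axis=0)
    cov = Hc.T @ Hc / len(H)
    eig = np.linalg.eigvalsh(cov)
    return np.log(eig + eps).sum()

H = np.random.default_rng(0).normal(size=(50, 64))  # stand-in hidden states
print(hidden_state_score(H))
```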
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
·5678 words·27 mins
Natural Language Processing Large Language Models 🏢 University of Toronto
LLM Processes leverage LLMs to create probabilistic regression models guided by natural language, enabling seamless integration of expert knowledge and improving prediction accuracy.
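A sketch of the elicitation loop with a hypothetical `llm_sample(prompt)` callable: serialize observed (x, y) pairs as text, append the query point, sample completions at nonzero temperature, and treat the parsed numbers as draws from the predictive distribution:

```python
import re
import statistics

def llm_process_predict(xs, ys, x_new, llm_sample, n=30):
    """Serialize observed (x, y) pairs as text, append the query x, sample
    n completions from the assumed llm_sample(prompt) -> str callable, and
    summarize the parsed numbers as an empirical predictive distribution
    at x_new."""
    prompt = "".join(f"x = {x:.2f}, y = {y:.2f}\n" for x, y in zip(xs, ys))
    prompt += f"x = {x_new:.2f}, y ="
    draws = []
    for _ in range(n):
        m = re.match(r"\s*(-?\d+(?:\.\d+)?)", llm_sample(prompt))
        if m:
            draws.append(float(m.group(1)))
    return statistics.mean(draws), statistics.stdev(draws)
```

Expert knowledge enters simply by prepending a natural-language description of the problem to the prompt, which shifts the sampled distribution.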
LLM Evaluators Recognize and Favor Their Own Generations
·3818 words·18 mins
Large Language Models 🏢 MATS
LLMs show self-preference bias in evaluations, favoring their own outputs. This study reveals that LLMs surprisingly recognize their own generations, and this self-recognition directly causes the self-preference bias.
LLM Dataset Inference: Did you train on my dataset?
·4983 words·24 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
LLM dataset inference reliably detects if a dataset was used in training, overcoming limitations of existing membership inference attacks.
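A simplified sketch of the dataset-level test: the paper aggregates many membership-inference features with a learned combination, but the core statistical move, a one-sided test comparing suspect-set scores against a held-out set, looks like this (per-example loss stands in for the feature):

```python
import numpy as np
from scipy import stats

def dataset_inference(losses_suspect, losses_held_out, alpha=0.05):
    """One-sided t-test: if the suspect dataset was trained on, its
    per-example losses should be systematically lower than those of a
    held-out set drawn from the same distribution, even though no single
    example can be called confidently on its own."""
    t, p = stats.ttest_ind(losses_suspect, losses_held_out,
                           alternative="less")
    return p < alpha, p

rng = np.random.default_rng(0)
members = rng.normal(loc=2.9, scale=0.5, size=1000)      # slightly lower loss
non_members = rng.normal(loc=3.0, scale=0.5, size=1000)
print(dataset_inference(members, non_members))            # (True, small p)
```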
LLM Circuit Analyses Are Consistent Across Training and Scale
·2075 words·10 mins
Natural Language Processing Large Language Models 🏢 EleutherAI
LLM circuit analyses remain consistent across model scales and extensive training, enabling more efficient interpretability research.