
Large Language Models

Make Your LLM Fully Utilize the Context
·2445 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft
FILM-7B, trained with Information-Intensive (IN2) training, significantly overcomes the 'lost-in-the-middle' problem in long-context LLMs, enabling robust information retrieval from all context positions.
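A minimal sketch of the data-construction idea behind IN2 training; `qa_for` is a hypothetical helper (in the paper, a strong LLM writes the QA pair about the chosen segment). The key ingredient is that the answer-bearing segment lands at a uniformly random position in the long context:

```python
import random

def make_in2_example(segments, qa_for, ctx_segments=32):
    """Assemble one Information-Intensive (IN2) training example: a long
    context built from short segments, with the answer-bearing segment
    placed at a uniformly random position so that no position is
    systematically easier to learn than another."""
    needle = random.choice(segments)
    fillers = random.sample([s for s in segments if s is not needle],
                            ctx_segments - 1)
    pos = random.randrange(ctx_segments)            # uniform over positions
    context = fillers[:pos] + [needle] + fillers[pos:]
    question, answer = qa_for(needle)               # hypothetical QA writer
    return {"context": "\n".join(context),
            "question": question, "answer": answer}
```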
MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization
·1848 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University at Albany, SUNY
MagR, a novel preprocessing technique, boosts post-training quantization of LLMs by reducing weight magnitudes without adding inference overhead, achieving state-of-the-art performance.
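A minimal sketch of the magnitude-reduction idea, assuming a channel-wise ℓ∞-regularized least-squares formulation (minimize ‖Xw′ − Xw‖² + α‖w′‖∞ per weight column) solved by proximal gradient; `X` stands in for calibration activations, and all names are illustrative:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto the l1 ball (sorting-based algorithm)."""
    if np.abs(v).sum() <= radius:
        return v
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - radius))[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, lam):
    """prox of lam*||.||_inf via Moreau decomposition with the l1 ball."""
    return v - project_l1_ball(v, lam)

def magnitude_reduce(X, w, alpha=1.0, steps=300):
    """Shrink the peak magnitude of one weight column while (nearly)
    preserving the layer's output on calibration activations X:
        min_w' 0.5*||X w' - X w||^2 + alpha*||w'||_inf
    solved with proximal gradient descent."""
    y = X @ w
    L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
    w_new = w.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w_new - y)
        w_new = prox_linf(w_new - grad / L, alpha / L)
    return w_new

rng = np.random.default_rng(0)
X, w = rng.normal(size=(256, 64)), rng.normal(size=64)
w2 = magnitude_reduce(X, w)
print(np.abs(w).max(), "->", np.abs(w2).max())     # peak magnitude shrinks
print(np.linalg.norm(X @ (w - w2)))                # output barely moves
```

A smaller peak magnitude means the same number of quantization levels covers a tighter range, which is where the post-training quantization gain comes from.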
MAGNET: Improving the Multilingual Fairness of Language Models with Adaptive Gradient-Based Tokenization
·2236 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Washington
MAGNET, a novel adaptive gradient-based tokenization method, tackles multilingual language model bias by employing language-specific boundary predictors to achieve equitable segmentation across diverse languages.
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution
·3263 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Hong Kong
MAGIS: A novel LLM-based multi-agent framework significantly boosts GitHub issue resolution by leveraging agent collaboration for planning and coding, achieving an eight-fold performance increase compared to directly applying the base LLM.
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems
·2015 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Minnesota
Multi-Agent System for Condition Mining (MACM) dramatically boosts large language model accuracy in complex math problem-solving, exceeding existing methods by achieving higher accuracy and better generalization.
LT-Defense: Searching-free Backdoor Defense via Exploiting the Long-tailed Effect
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Beijing University of Posts and Telecommunications
LT-Defense: a searching-free backdoor defense for language models that leverages the long-tailed effect of poisoned data. It achieves 98% accuracy across 1440 models with less than 1% of the time cost of existing defenses.
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
·2125 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
LSH-MoE accelerates Mixture-of-Experts training by 1.28x-2.2x via Locality-Sensitive Hashing, significantly reducing communication costs.
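A rough sketch of the communication-saving step, under stated assumptions: tokens are bucketed with random-hyperplane (SimHash) LSH, only one centroid per bucket is shipped to the expert, and the result is scattered back to bucket members. The paper additionally applies a residual compensation that is omitted here:

```python
import numpy as np

def lsh_codes(tokens, n_planes=8, seed=0):
    """Random-hyperplane (SimHash) bucket codes: tokens on the same side
    of every hyperplane land in the same bucket and tend to be similar."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(tokens.shape[1], n_planes))
    bits = (tokens @ planes) > 0
    return (bits * (1 << np.arange(n_planes))).sum(axis=1)

def lsh_compressed_dispatch(tokens, expert_fn, n_planes=8):
    """Ship one centroid per LSH bucket to the expert instead of every
    token, then scatter the expert output back to the bucket members.
    The payload crossing the (simulated) all-to-all is #buckets rows,
    not #tokens rows."""
    codes = lsh_codes(tokens, n_planes)
    uniq, inverse = np.unique(codes, return_inverse=True)
    centroids = np.stack([tokens[inverse == i].mean(axis=0)
                          for i in range(len(uniq))])
    out = expert_fn(centroids)
    return out[inverse]

tokens = np.random.default_rng(1).normal(size=(1024, 64))
y = lsh_compressed_dispatch(tokens, np.tanh)   # np.tanh as a toy expert
print(tokens.shape, "->", y.shape)             # (1024, 64) -> (1024, 64)
```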
LoRA-GA: Low-Rank Adaptation with Gradient Approximation
·2382 words·12 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
LoRA-GA: A novel initialization method dramatically speeds up low-rank adaptation (LoRA) for LLMs, achieving convergence rates comparable to full fine-tuning while improving performance.
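A sketch of gradient-approximation initialization as the summary describes it; the exact singular-vector slices and scaling in the paper may differ. `grad_W` is the first full-parameter gradient, and the frozen weight is adjusted so the model output is unchanged at initialization:

```python
import numpy as np

def lora_ga_init(W, grad_W, rank):
    """Seed the adapters with leading singular directions of the first
    full-parameter gradient so that the first low-rank step approximates
    full fine-tuning's first step; W is corrected so that
    W_adj + B @ A equals the original weight at initialization."""
    U, S, Vt = np.linalg.svd(grad_W, full_matrices=False)
    B = U[:, :rank]                 # (out, r)
    A = Vt[rank:2 * rank, :]        # (r, in), a disjoint block of directions
    W_adj = W - B @ A               # output-preserving correction
    return W_adj, A, B

rng = np.random.default_rng(0)
W, G = rng.normal(size=(128, 64)), rng.normal(size=(128, 64))
W_adj, A, B = lora_ga_init(W, G, rank=8)
print(np.allclose(W_adj + B @ A, W))   # True: the model is unchanged at init
```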
LoQT: Low-Rank Adapters for Quantized Pretraining
·2483 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Copenhagen
LoQT enables efficient large language model training on consumer hardware via quantized weights and low-rank weight updates, overcoming memory limitations.
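A hedged sketch of the quantized-base plus low-rank-update scheme: the full matrix lives only in low precision (crude per-channel int8 here, a stand-in for whatever format LoQT actually uses), while gradients flow solely through a small trainable pair that can periodically be merged into the base and re-quantized:

```python
import torch

class LoQTLinear(torch.nn.Module):
    """Quantized frozen base + trainable low-rank update. Only the small
    (A, B) pair receives gradients, so optimizer state and full-precision
    storage stay tiny; B @ A can periodically be merged into the base and
    the result re-quantized."""

    def __init__(self, W, rank=8):
        super().__init__()
        scale = W.abs().amax(dim=1, keepdim=True) / 127
        self.register_buffer("Wq", torch.round(W / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        out_f, in_f = W.shape
        self.A = torch.nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x):
        W_deq = self.Wq.float() * self.scale        # dequantize on the fly
        return x @ (W_deq + self.B @ self.A).T

layer = LoQTLinear(torch.randn(256, 128))
print(layer(torch.randn(4, 128)).shape)             # torch.Size([4, 256])
```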
Long-form factuality in large language models
·4779 words·23 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
LLMs often generate factually inaccurate long-form text. This work introduces LongFact, a new benchmark dataset of 2280 fact-seeking prompts, and SAFE, a novel automated evaluation method that outperforms crowdsourced human annotators.
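A sketch of a SAFE-style pipeline with hypothetical `llm` and `search` callables (neither is the paper's API): split the response into atomic facts, gather search evidence for each, and report the supported fraction:

```python
def safe_style_eval(response, llm, search):
    """Rate the factuality of a long-form response: split it into atomic
    facts, gather search evidence for each, and ask the LLM to judge
    support. `llm(prompt) -> str` and `search(query) -> str` are assumed
    callables, not the paper's API."""
    facts = llm("Split into individual atomic facts, one per line:\n"
                + response).splitlines()
    supported = 0
    for fact in facts:
        query = llm(f"Write a search query to verify: {fact}")
        evidence = search(query)
        verdict = llm(f"Fact: {fact}\nEvidence: {evidence}\n"
                      "Is the fact supported? Answer yes or no.")
        supported += verdict.strip().lower().startswith("yes")
    return supported / max(len(facts), 1)
```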
Loki: Low-rank Keys for Efficient Sparse Attention
·3255 words·16 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
Loki: Low-rank Keys for Efficient Sparse Attention accelerates attention mechanisms in LLMs by exploiting the low-dimensionality of key vectors. It dynamically selects key tokens based on approximate attention scores computed in the reduced-dimensional space.
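A sketch of the mechanism under the paper's low-dimensional-key observation: score all keys cheaply in a PCA-reduced space, keep only the top-k candidates, and run exact attention on that sparse set. The PCA projection here is estimated from the keys themselves as a stand-in for offline calibration:

```python
import numpy as np

def loki_style_attention(q, K, V, proj, top_k):
    """Score every key cheaply in the PCA-reduced space, keep the top_k
    candidates, then run exact softmax attention on that sparse set."""
    approx = (q @ proj) @ (K @ proj).T              # cheap approximate scores
    idx = np.argsort(approx)[-top_k:]
    scores = q @ K[idx].T / np.sqrt(K.shape[1])     # exact, but sparse
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ V[idx]

rng = np.random.default_rng(0)
d, d_red, n = 64, 16, 4096
K = rng.normal(size=(n, d_red)) @ rng.normal(size=(d_red, d))  # low-rank keys
V, q = rng.normal(size=(n, d)), rng.normal(size=d)
_, _, Vt = np.linalg.svd(K - K.mean(0), full_matrices=False)   # "calibration"
out = loki_style_attention(q, K, V, Vt[:d_red].T, top_k=256)
print(out.shape)                                                # (64,)
```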
LoFiT: Localized Fine-tuning on LLM Representations
·4045 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Texas at Austin
LoFiT: Localized fine-tuning boosts LLMs’ performance by selectively training only a small subset of attention heads, achieving comparable accuracy to other methods while using significantly fewer parameters.
Localized Zeroth-Order Prompt Optimization
·3110 words·15 mins
Large Language Models 🏢 National University of Singapore
Localized Zeroth-Order Prompt Optimization (ZOPO) efficiently finds high-performing local optima for prompt optimization in black-box LLMs, outperforming existing global optimization methods.
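ZOPO's estimator is derived from a Gaussian-process/NTK view of the prompt space; the sketch below substitutes a generic two-point zeroth-order estimator to show the shape of the local, derivative-free search loop, with `score_fn` a hypothetical black-box prompt scorer:

```python
import numpy as np

def zo_ascend(score_fn, z0, mu=0.1, lr=0.2, steps=100, k=8, seed=0):
    """Derivative-free local ascent on a black-box score: estimate the
    gradient from symmetric random perturbations of the prompt
    representation z, then take a small step toward a local optimum."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(z)
        for _ in range(k):
            u = rng.normal(size=z.shape)
            g += (score_fn(z + mu * u) - score_fn(z - mu * u)) / (2 * mu) * u
        z += lr * g / k
    return z

# Toy black-box score whose optimum sits at z = (1, -1).
score = lambda z: -np.sum((z - np.array([1.0, -1.0])) ** 2)
print(zo_ascend(score, np.zeros(2)))    # converges near [1, -1]
```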
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings
·1927 words·10 mins
Natural Language Processing Large Language Models 🏢 Beihang University
TEA-GLM leverages LLMs for zero-shot graph learning by aligning GNN representations with LLM token embeddings, achieving state-of-the-art performance on unseen datasets and tasks.
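A simplified sketch of the alignment step, assuming paired (node, token-embedding) targets; TEA-GLM itself trains the alignment contrastively against the token-embedding space, so treat this as the minimal linear-projection version:

```python
import torch
import torch.nn.functional as F

def align_gnn_to_llm(gnn_emb, token_emb, steps=200, lr=1e-2):
    """Learn a linear map from frozen GNN node embeddings into the LLM's
    token-embedding space by maximizing cosine similarity with paired
    targets; projected vectors can then be consumed by the LLM as
    soft tokens."""
    proj = torch.nn.Linear(gnn_emb.shape[1], token_emb.shape[1], bias=False)
    opt = torch.optim.Adam(proj.parameters(), lr=lr)
    for _ in range(steps):
        loss = 1 - F.cosine_similarity(proj(gnn_emb), token_emb, -1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return proj

gnn_emb = torch.randn(512, 128)     # frozen GNN node representations
tok_emb = torch.randn(512, 4096)    # paired token-embedding targets
proj = align_gnn_to_llm(gnn_emb, tok_emb)
```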
LLMDFA: Analyzing Dataflow in Code with Large Language Models
·3865 words·19 mins
Natural Language Processing Large Language Models 🏢 Purdue University
LLMDFA: A novel LLM-powered framework performs compilation-free and customizable dataflow analysis, achieving high accuracy in bug detection by decomposing the task into sub-problems and mitigating LLM hallucinations.
LLM-Check: Investigating Detection of Hallucinations in Large Language Models
·2270 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Maryland, College Park
LLM-Check efficiently detects LLM hallucinations within a single response by analyzing the model's internal representations, enabling real-time applications.
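One way to read "internal model analysis" concretely: the method studies eigenvalue statistics of a response's internal representations. A toy score along those lines (not the paper's exact formula) is the log-determinant of the hidden-state covariance:

```python
import numpy as np

def hidden_state_score(H, eps=1e-6):
    """Log-determinant (sum of log eigenvalues) of the covariance of a
    response's hidden states H (tokens x hidden_dim): a toy single-response
    score in the spirit of eigenvalue-based internal analysis."""
    Hc = H - H.mean(axis=0)
    cov = Hc.T @ Hc / len(H)
    eig = np.linalg.eigvalsh(cov)
    return np.log(eig + eps).sum()

H = np.random.default_rng(0).normal(size=(50, 64))  # stand-in hidden states
print(hidden_state_score(H))
```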
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
·5678 words·27 mins
Natural Language Processing Large Language Models 🏢 University of Toronto
LLM Processes leverage LLMs to create probabilistic regression models guided by natural language, enabling seamless integration of expert knowledge and improving prediction accuracy.
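A sketch of the elicitation loop with a hypothetical `llm_sample(prompt)` callable: serialize observed (x, y) pairs as text, append the query point, sample completions at nonzero temperature, and treat the parsed numbers as draws from the predictive distribution:

```python
import re
import statistics

def llm_process_predict(xs, ys, x_new, llm_sample, n=30):
    """Serialize observed (x, y) pairs as text, append the query x, sample
    n completions from the assumed llm_sample(prompt) -> str callable, and
    summarize the parsed numbers as an empirical predictive distribution
    at x_new."""
    prompt = "".join(f"x = {x:.2f}, y = {y:.2f}\n" for x, y in zip(xs, ys))
    prompt += f"x = {x_new:.2f}, y ="
    draws = []
    for _ in range(n):
        m = re.match(r"\s*(-?\d+(?:\.\d+)?)", llm_sample(prompt))
        if m:
            draws.append(float(m.group(1)))
    return statistics.mean(draws), statistics.stdev(draws)
```

Expert knowledge enters simply by prepending a natural-language description of the problem to the prompt, which shifts the sampled distribution.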
LLM Evaluators Recognize and Favor Their Own Generations
·3818 words·18 mins
Large Language Models 🏢 MATS
LLMs show self-preference bias in evaluations, favoring their own outputs. This study reveals that LLMs surprisingly recognize their own generations, and this self-recognition directly causes the self-preference bias.
LLM Dataset Inference: Did you train on my dataset?
·4983 words·24 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
LLM dataset inference reliably detects if a dataset was used in training, overcoming limitations of existing membership inference attacks.
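A simplified sketch of the dataset-level test: the paper aggregates many membership-inference features with a learned combination, but the core statistical move, a one-sided test comparing suspect-set scores against a held-out set, looks like this (per-example loss stands in for the feature):

```python
import numpy as np
from scipy import stats

def dataset_inference(losses_suspect, losses_held_out, alpha=0.05):
    """One-sided t-test: if the suspect dataset was trained on, its
    per-example losses should be systematically lower than those of a
    held-out set drawn from the same distribution, even though no single
    example can be called confidently on its own."""
    t, p = stats.ttest_ind(losses_suspect, losses_held_out,
                           alternative="less")
    return p < alpha, p

rng = np.random.default_rng(0)
members = rng.normal(loc=2.9, scale=0.5, size=1000)      # slightly lower loss
non_members = rng.normal(loc=3.0, scale=0.5, size=1000)
print(dataset_inference(members, non_members))            # (True, small p)
```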
LLM Circuit Analyses Are Consistent Across Training and Scale
·2075 words·10 mins
Natural Language Processing Large Language Models 🏢 EleutherAI
LLM circuit analyses remain consistent across model scales and extensive training, enabling more efficient interpretability research.