
Spotlight Large Language Models

2024

Observational Scaling Laws and the Predictability of Language Model Performance
·4816 words·23 mins
Large Language Models 🏢 University of Toronto
Researchers predict language model performance by observing existing models, bypassing costly training, revealing surprising predictability in complex scaling phenomena.
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins
Large Language Models 🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable performance…
MKGL: Mastery of a Three-Word Language
·2110 words·10 mins
Large Language Models 🏢 Zhejiang University
Researchers taught a large language model (LLM) a three-word ‘Knowledge Graph Language’ (KGL) to improve knowledge graph (KG) completion, drastically reducing errors compared to other methods.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
·2998 words·15 mins
Large Language Models 🏢 Microsoft Corporation
MInference 1.0 accelerates LLM pre-filling via dynamic sparse attention, achieving up to 10x speedup on an A100 GPU while maintaining accuracy.
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
·2759 words·13 mins
Large Language Models 🏢 NVIDIA
MaskLLM learns efficient semi-structured sparsity in LLMs via end-to-end training, achieving significant speedup and memory reduction without sacrificing performance.
Many-Shot In-Context Learning
·3209 words·16 mins
Large Language Models 🏢 Google DeepMind
Scaling up in-context learning using thousands of examples significantly boosts Large Language Model (LLM) performance, particularly for complex tasks. Novel training methods mitigate reliance on human…
Localized Zeroth-Order Prompt Optimization
·3110 words·15 mins
Large Language Models 🏢 National University of Singapore
Localized Zeroth-Order Prompt Optimization (ZOPO) efficiently finds high-performing local optima for prompt optimization in black-box LLMs, outperforming existing global optimization methods.
Learn To be Efficient: Build Structured Sparsity in Large Language Models
·2525 words·12 mins
Large Language Models 🏢 University of Michigan
Learn-To-be-Efficient (LTE) trains LLMs to achieve structured sparsity, boosting inference speed by 25% at 50% sparsity without sacrificing accuracy.
Induced Model Matching: Restricted Models Help Train Full-Featured Models
·2402 words·12 mins
Large Language Models 🏢 University of Illinois Chicago
Restricted models often outperform full-featured models when training data is limited. This paper introduces Induced Model Matching (IMM), a novel technique that uses a restricted model as a guide to…
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
·3990 words·19 mins
Large Language Models 🏢 University of British Columbia
Adam’s superior performance on language models stems from its resilience to heavy-tailed class imbalance; SGD, by contrast, struggles to reduce the loss on infrequent words.
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
·2304 words·11 mins
Large Language Models 🏢 University of Wisconsin-Madison
HaloScope leverages unlabeled LLM outputs to accurately detect AI hallucinations without human annotation, significantly outperforming existing methods.
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
·1719 words·9 mins
Large Language Models 🏢 Princeton University
GREATS, a novel online batch selection method, significantly speeds up LLM training by greedily selecting high-quality data batches in every iteration, improving both convergence and generalization performance…
Graph-based Uncertainty Metrics for Long-form Language Model Generations
·2055 words·10 mins
Large Language Models 🏢 Stanford University
Graph Uncertainty boosts LLM factuality by 6.8% using graph centrality to estimate claim-level uncertainty and a novel uncertainty-aware decoding process.
Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation
·1934 words·10 mins
Large Language Models 🏢 Beijing Jiaotong University
LLM-powered prototype refinement boosts few-shot 3D point cloud segmentation accuracy.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
·2517 words·12 mins
Large Language Models 🏢 Colfax Research
FlashAttention-3: Achieves 1.5-2x faster attention on H100 GPUs using asynchrony and low-precision, reaching 1.3 PFLOPs/s.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins
Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods, analyzing and manipulating positional vectors to improve long-text processing.
Evaluating the World Model Implicit in a Generative Model
·4059 words·20 mins
Large Language Models 🏢 Harvard University
New metrics reveal that generative models often possess surprisingly incoherent world models, despite seemingly accurate next-token predictions. This incoherence leads to fragility in solving related…
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins
Large Language Models 🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
Efficient Adversarial Training in LLMs with Continuous Attacks
·2099 words·10 mins
Large Language Models 🏢 Mila, Université de Montréal
Boosting LLM robustness against attacks efficiently: Continuous adversarial training in embedding space outperforms discrete methods, achieving improved robustness with less computation.
Discrete Flow Matching
·2076 words·10 mins
Large Language Models 🏢 Meta FAIR
Discrete Flow Matching (DFM) revolutionizes discrete data generation by introducing a novel flow paradigm that surpasses existing methods. DFM leverages flexible probability paths, enabling efficient…