
Spotlight Large Language Models

2024

Observational Scaling Laws and the Predictability of Language Model Performance
·4816 words·23 mins
Large Language Models 🏢 University of Toronto
Researchers predict language model performance by observing existing models, bypassing costly training, revealing surprising predictability in complex scaling phenomena.
Model Fusion through Bayesian Optimization in Language Model Fine-Tuning
·3140 words·15 mins
Large Language Models 🏢 KAIST
Bayesian Optimization Model Fusion (BOMF) significantly boosts language model fine-tuning by optimizing both loss and metrics through multi-objective Bayesian optimization, yielding considerable performance…
MKGL: Mastery of a Three-Word Language
·2110 words·10 mins
Large Language Models 🏢 Zhejiang University
Researchers taught a large language model (LLM) a three-word ‘Knowledge Graph Language’ (KGL) to improve knowledge graph (KG) completion, drastically reducing errors compared to other methods.
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
·2998 words·15 mins
Large Language Models 🏢 Microsoft Corporation
MInference 1.0 accelerates LLM pre-filling via dynamic sparse attention, achieving up to 10x speedup on an A100 GPU while maintaining accuracy.
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
·2759 words·13 mins
Large Language Models 🏢 NVIDIA
MaskLLM learns efficient semi-structured sparsity in LLMs via end-to-end training, achieving significant speedup and memory reduction without sacrificing performance.
Many-Shot In-Context Learning
·3209 words·16 mins
Large Language Models 🏢 Google DeepMind
Scaling up in-context learning using thousands of examples significantly boosts Large Language Model (LLM) performance, particularly for complex tasks. Novel training methods mitigate reliance on human…
Localized Zeroth-Order Prompt Optimization
·3110 words·15 mins
Large Language Models 🏢 National University of Singapore
Localized Zeroth-Order Prompt Optimization (ZOPO) efficiently finds high-performing local optima for prompt optimization in black-box LLMs, outperforming existing global optimization methods.
Learn To be Efficient: Build Structured Sparsity in Large Language Models
·2525 words·12 mins
Large Language Models 🏢 University of Michigan
Learn-To-be-Efficient (LTE) trains LLMs to achieve structured sparsity, boosting inference speed by 25% at 50% sparsity without sacrificing accuracy.
Induced Model Matching: Restricted Models Help Train Full-Featured Models
·2402 words·12 mins
Large Language Models 🏢 University of Illinois Chicago
Restricted models often outperform full-featured models when training data is limited. This paper introduces Induced Model Matching (IMM), a novel technique that uses a restricted model as a guide to…
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
·3990 words·19 mins
Large Language Models 🏢 University of British Columbia
Adam’s superior performance on language models stems from its resilience to heavy-tailed class imbalance; SGD, by contrast, struggles to reduce the loss on infrequent words.
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
·2304 words·11 mins
Large Language Models 🏢 University of Wisconsin-Madison
HaloScope leverages unlabeled LLM outputs to accurately detect AI hallucinations without human annotation, significantly outperforming existing methods.
GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration
·1719 words·9 mins
Large Language Models 🏢 Princeton University
GREATS, a novel online batch selection method, significantly speeds up LLM training by greedily selecting high-quality data batches in every iteration, improving both convergence and generalization performance…
Graph-based Uncertainty Metrics for Long-form Language Model Generations
·2055 words·10 mins
Large Language Models 🏢 Stanford University
Graph Uncertainty boosts LLM factuality by 6.8% using graph centrality to estimate claim-level uncertainty and a novel uncertainty-aware decoding process.
Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation
·1934 words·10 mins
Large Language Models 🏢 Beijing Jiaotong University
LLM-powered prototype refinement boosts few-shot 3D point cloud segmentation accuracy.
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
·2517 words·12 mins
Large Language Models 🏢 Colfax Research
FlashAttention-3: Achieves 1.5-2x faster attention on H100 GPUs using asynchrony and low-precision, reaching 1.3 PFLOPs/s.
Exploring Context Window of Large Language Models via Decomposed Positional Vectors
·3403 words·16 mins
Large Language Models 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Researchers extend large language models’ context windows with training-free methods, analyzing and manipulating positional vectors to improve long-text processing.
Evaluating the World Model Implicit in a Generative Model
·4059 words·20 mins
Large Language Models 🏢 Harvard University
New metrics reveal that generative models often possess surprisingly incoherent world models, despite seemingly accurate next-token predictions. This incoherence leads to fragility in solving related…
Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration
·2339 words·11 mins
Large Language Models 🏢 Harbin Institute of Technology
DEEPEN: a training-free LLM ensemble framework fusing probability distributions in a relative space to overcome vocabulary misalignment, improving performance consistently across benchmarks.
Efficient Adversarial Training in LLMs with Continuous Attacks
·2099 words·10 mins
Large Language Models 🏢 Mila, Université de Montréal
Boosting LLM robustness against attacks efficiently: Continuous adversarial training in embedding space outperforms discrete methods, achieving improved robustness with less computation.
Discrete Flow Matching
·2076 words·10 mins
Large Language Models 🏢 Meta FAIR
Discrete Flow Matching (DFM) revolutionizes discrete data generation by introducing a novel flow paradigm that surpasses existing methods. DFM leverages flexible probability paths, enabling efficient…