
Spotlight Large Language Models

2024

xLSTM: Extended Long Short-Term Memory
·4451 words·21 mins
Large Language Models 🏢 ELLIS Unit, LIT AI Lab
xLSTM (Extended Long Short-Term Memory) introduces exponential gating and novel memory structures to overcome LSTM limitations, achieving performance comparable to state-of-the-art Transformers and St…
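To give a concrete feel for exponential gating, here is a minimal NumPy sketch of an sLSTM-style recurrent step: exponential input/forget gates, a normalizer state that keeps the output bounded, and a running-max stabilizer that prevents overflow. The pre-activations are passed in directly rather than computed from learned weights, and all names are illustrative rather than taken from the paper's code.

```python
# Minimal sketch of exponential gating in an sLSTM-style cell (NumPy only).
# Assumption: gate pre-activations are given; no learned weight matrices here.
import numpy as np

def slstm_step(pre, state, eps=1e-6):
    """One recurrent step with exponential input/forget gates and a stabilizer."""
    c, n, m = state                        # cell, normalizer, stabilizer states
    z_t = np.tanh(pre["z"])                # candidate cell input
    o_t = 1.0 / (1.0 + np.exp(-pre["o"]))  # output gate (sigmoid)
    # Exponential gates, stabilized by the running max m to avoid overflow.
    m_new = np.maximum(pre["f"] + m, pre["i"])
    i_t = np.exp(pre["i"] - m_new)
    f_t = np.exp(pre["f"] + m - m_new)
    c_new = f_t * c + i_t * z_t            # cell state update
    n_new = f_t * n + i_t                  # normalizer update
    h_t = o_t * c_new / (n_new + eps)      # normalized hidden state
    return h_t, (c_new, n_new, m_new)

# Toy usage: three steps of a 4-unit cell with random pre-activations.
rng = np.random.default_rng(0)
state = (np.zeros(4), np.zeros(4), np.zeros(4))
for _ in range(3):
    pre = {k: rng.normal(size=4) for k in ("z", "i", "f", "o")}
    h, state = slstm_step(pre, state)
print(h)
```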
Who's asking? User personas and the mechanics of latent misalignment
·3650 words·18 mins
Large Language Models 🏢 Google Research
User personas significantly affect the safety behavior of large language models; adopting certain personas can bypass safety filters more effectively than direct prompting methods.
Watermarking Makes Language Models Radioactive
·3285 words·16 mins
Large Language Models 🏢 Meta FAIR
LLM watermarking leaves detectable traces in subsequently trained models, enabling detection of synthetic data usage, a phenomenon termed ‘radioactivity’.
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏢 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
Training Compute-Optimal Protein Language Models
·3023 words·15 mins
Large Language Models 🏢 Tsinghua University
Compute-optimal protein language models are trained efficiently using scaling laws derived from a massive dataset, improving performance while optimizing compute budgets.
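As a rough illustration of what a compute-optimal recipe prescribes, the sketch below splits a FLOP budget between parameter count and training tokens using Chinchilla-style power laws. The exponent, the scale constant, and the C ≈ 6·N·D approximation are generic assumptions for illustration, not the coefficients fitted for protein language models in this paper.

```python
# Hedged sketch of compute-optimal allocation under C ~ 6 * N * D.
# The exponent `a` and scale `k` are illustrative, not the paper's fitted values.
def compute_optimal_allocation(compute_flops, a=0.5, k=1.0 / 6.0):
    """Split a FLOP budget C into a parameter count N and a token count D."""
    n_opt = (k * compute_flops) ** a        # N_opt grows as a power of C
    d_opt = compute_flops / (6.0 * n_opt)   # D chosen so 6 * N * D exhausts C
    return n_opt, d_opt

for c in (1e19, 1e20, 1e21):
    n, d = compute_optimal_allocation(c)
    print(f"C={c:.0e}: ~{n:.2e} params, ~{d:.2e} tokens")
```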
Toxicity Detection for Free
·2767 words·13 mins
Large Language Models 🏢 University of California, Berkeley
Moderation Using LLM Introspection (MULI) leverages the first response token’s logits from LLMs to create a highly accurate toxicity detector, surpassing state-of-the-art methods with minimal overhead…
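A minimal sketch of the underlying idea, assuming scikit-learn is available and that the first-response-token logits have already been collected from an LLM on labeled prompts (random features stand in for them here): a sparse linear probe over those logits acts as the toxicity detector, so no forward passes beyond the one the model already performs are needed.

```python
# Hedged sketch of a MULI-style probe over first-response-token logits.
# Assumption: `first_token_logits` would come from an LLM; random data is used here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab_size, n_prompts = 1000, 200
first_token_logits = rng.normal(size=(n_prompts, vocab_size))  # placeholder features
is_toxic = rng.integers(0, 2, size=n_prompts)                  # placeholder labels

# Sparse linear probe: the L1 penalty keeps only a few informative vocabulary logits.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(first_token_logits, is_toxic)
print("train accuracy:", probe.score(first_token_logits, is_toxic))
```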
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
·4872 words·23 mins
Large Language Models 🏢 Zhejiang University
TOPA: Extending LLMs for video understanding using only text data.
Time-Reversal Provides Unsupervised Feedback to LLMs
·2584 words·13 mins
Large Language Models 🏢 Google DeepMind
Time-reversed language models provide unsupervised feedback for improving LLMs, offering a cost-effective alternative to human feedback and enhancing LLM safety.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
·5133 words·25 mins
Large Language Models 🏢 University of Hong Kong
Stacking Your Transformers accelerates LLM pre-training by leveraging smaller, pre-trained models to efficiently train larger ones, achieving significant speedups and improved performance.
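A toy sketch of the core growth operation, with placeholder layer objects instead of real transformer blocks: a deeper model is initialized by tiling the layer stack of a trained smaller model. The paper's growth operators and training schedule are not reproduced here.

```python
# Hedged sketch of depth-wise model growth by stacking a smaller model's layers.
import copy

class Model:
    def __init__(self, layers):
        self.layers = layers  # ordered list of blocks (placeholders here)

def grow_by_stacking(small_model, growth_factor=2):
    """Initialize a deeper model by repeating the small model's layer stack."""
    grown_layers = []
    for _ in range(growth_factor):
        # Deep-copy so the grown copies can later be trained independently.
        grown_layers.extend(copy.deepcopy(layer) for layer in small_model.layers)
    return Model(grown_layers)

small = Model(layers=[f"block_{i}" for i in range(6)])
large = grow_by_stacking(small, growth_factor=2)
print(len(large.layers), "layers after growth")  # 12
```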
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins
Large Language Models 🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
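Sequoia's contribution is a scalable, robust tree of draft tokens; the sketch below shows only the standard accept/resample rule for a single drafted token that speculative decoding methods build on, with toy probability vectors standing in for the draft and target models.

```python
# Hedged sketch of the basic speculative-decoding accept/resample rule.
# Toy probability vectors replace real draft/target model outputs.
import numpy as np

rng = np.random.default_rng(0)

def accept_or_resample(draft_token, p_target, p_draft):
    """Accept with prob min(1, p_target/p_draft); otherwise resample from the residual."""
    ratio = p_target[draft_token] / p_draft[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True
    residual = np.maximum(p_target - p_draft, 0.0)   # leftover target mass
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

p_draft = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
p_target = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
tok = rng.choice(len(p_draft), p=p_draft)            # token proposed by the draft model
print(accept_or_resample(tok, p_target, p_draft))
```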
Selective Generation for Controllable Language Models
·2256 words·11 mins
Large Language Models 🏢 POSTECH
Certified selective generation controls language model hallucinations by leveraging textual entailment and a novel semi-supervised algorithm, guaranteeing a controlled false discovery rate.
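A hedged sketch of the selective-generation idea with synthetic data: emit an answer only when an entailment-style score clears a threshold calibrated so that the empirical error rate among accepted answers stays below a target level. The paper's semi-supervised calibration algorithm and its formal false-discovery-rate guarantee are not reproduced here.

```python
# Hedged sketch: calibrate an abstention threshold on entailment-style scores.
# Scores and correctness labels are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(500)              # entailment scores on a calibration set
correct = rng.random(500) < scores    # toy labels: higher score, more likely correct

def calibrate_threshold(scores, correct, target_rate=0.1):
    """Smallest threshold whose empirical error rate among accepted answers <= target."""
    for thr in np.sort(scores):
        selected = scores >= thr
        if not selected.any():
            break
        if (~correct[selected]).mean() <= target_rate:
            return thr
    return 1.0                        # abstain everywhere if no threshold qualifies

thr = calibrate_threshold(scores, correct)
print("threshold:", round(float(thr), 3), "fraction answered:", (scores >= thr).mean())
```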
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
·4063 words·20 mins
Large Language Models 🏢 EPFL
Revolutionizing LLM training: Constant learning rate with cooldown replaces cosine schedule, enabling cost-effective scaling experiments!
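A minimal sketch of the schedule in question: a short warmup, a long constant plateau, and a final cooldown to zero. Warmup length, cooldown fraction, and the linear decay shape are illustrative choices, not the paper's exact recipe; the practical point is that checkpoints taken before the cooldown can be reused to train to longer horizons without committing to a duration up front.

```python
# Hedged sketch of a constant learning rate with warmup and terminal cooldown.
def constant_with_cooldown(step, total_steps, base_lr=3e-4,
                           warmup_steps=1000, cooldown_frac=0.2):
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:                       # linear warmup
        return base_lr * step / warmup_steps
    if step < cooldown_start:                     # long constant plateau
        return base_lr
    progress = (step - cooldown_start) / (total_steps - cooldown_start)
    return base_lr * (1 - progress)               # linear cooldown to zero

print([round(constant_with_cooldown(s, 10_000), 6) for s in (0, 500, 5000, 9000, 9999)])
```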
Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
·2787 words·14 mins
Large Language Models 🏢 University of Cambridge
LLMs struggle with out-of-distribution (OOD) generalization. This research introduces ‘rule extrapolation’ using formal languages to rigorously evaluate OOD behavior in various LLM architectures, rev…
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
·1538 words·8 mins
Large Language Models 🏢 University of Illinois Urbana-Champaign
Robust Prompt Optimization (RPO) creates robust LLM defenses against jailbreaking attacks by optimizing a transferable suffix, achieving state-of-the-art robustness.
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
·3628 words·18 mins
Large Language Models 🏢 Tel Aviv University
New research resolves discrepancies in language model scaling laws, revealing three key factors driving the differences and improving accuracy in predicting optimal model size based on compute budget.
Reranking Laws for Language Generation: A Communication-Theoretic Perspective
·1835 words·9 mins
Large Language Models 🏢 Instituto Superior Técnico, Universidade De Lisboa
Boost LLM reliability by adding redundancy! This paper uses a communication theory framework to show that generating multiple LLM outputs and reranking them significantly reduces errors, even with imp…
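A toy sketch of the generate-then-rerank pattern the analysis covers: draw several candidate outputs for the same prompt and keep the one an external scorer prefers. The generator and the scorer below are stand-ins for an LLM and a reward model or verifier.

```python
# Hedged sketch of sample-and-rerank; the generator and scorer are placeholders.
import random

def generate_candidates(prompt, n=8):
    # Stand-in for n stochastic LLM samples of the same prompt.
    return [f"{prompt} -> candidate {i} (score={random.random():.2f})" for i in range(n)]

def rerank_score(candidate):
    # Stand-in for a reward model / verifier; here it just parses the toy score.
    return float(candidate.split("score=")[1].rstrip(")"))

random.seed(0)
candidates = generate_candidates("Explain reranking laws", n=8)
print(max(candidates, key=rerank_score))
```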
ReFT: Representation Finetuning for Language Models
·3382 words·16 mins
Large Language Models 🏢 Stanford University
ReFT: Revolutionizing language model finetuning by directly manipulating hidden representations, achieving superior efficiency and performance compared to existing methods.
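A minimal NumPy sketch of a low-rank representation intervention in the spirit of this approach: a hidden state is edited only within an r-dimensional subspace, with the projection R, the map W, and the bias b as the sole trainable parameters; dimensions and initialization are illustrative.

```python
# Hedged sketch of a LoReFT-style low-rank intervention on a hidden state.
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8                                    # hidden size, intervention rank

R = np.linalg.qr(rng.normal(size=(d, r)))[0].T   # r x d, orthonormal rows
W = rng.normal(scale=0.02, size=(r, d))          # learned projection (illustrative init)
b = np.zeros(r)                                  # learned bias

def intervene(h):
    """Move h toward a learned target, but only along the subspace spanned by R."""
    return h + R.T @ (W @ h + b - R @ h)

h = rng.normal(size=d)
print(np.linalg.norm(intervene(h) - h))          # the edit lives entirely in span(R)
```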
QTIP: Quantization with Trellises and Incoherence Processing
·2586 words·13 mins
Large Language Models 🏢 Cornell University
QTIP: Ultra-high dimensional LLM quantization using trellis codes for faster, higher-quality inference.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
·3100 words·15 mins
Large Language Models 🏢 Peking University
PiSSA, a novel parameter-efficient fine-tuning method, surpasses LoRA by initializing adapter matrices using the principal components of the original model, achieving faster convergence and enhanced p…
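A minimal sketch of the initialization idea: the adapter factors are seeded with the top singular values and vectors of the pretrained weight, while a frozen residual keeps the remainder, so the adapted layer reproduces the original weight exactly at initialization. Shapes and rank below are illustrative.

```python
# Hedged sketch of PiSSA-style adapter initialization from an SVD of the weight.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4
W0 = rng.normal(size=(d_out, d_in))               # stand-in for a pretrained weight

U, S, Vt = np.linalg.svd(W0, full_matrices=False)
A = U[:, :r] * np.sqrt(S[:r])                     # (d_out, r), trainable factor
B = np.sqrt(S[:r])[:, None] * Vt[:r]              # (r, d_in), trainable factor
W_res = W0 - A @ B                                # frozen residual weight

# The adapted layer computes (W_res + A @ B) @ x, which equals W0 @ x at init.
x = rng.normal(size=d_in)
print(np.allclose((W_res + A @ B) @ x, W0 @ x))   # True
```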
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
·2069 words·10 mins
Large Language Models 🏢 University of Pennsylvania
One-shot dualization aligns large language models with safety constraints efficiently, eliminating iterative primal-dual methods for improved stability and reduced computational burden.