
Spotlight Large Language Models

2024

xLSTM: Extended Long Short-Term Memory
·4451 words·21 mins
Large Language Models 🏢 ELLIS Unit, LIT AI Lab
xLSTM (Extended Long Short-Term Memory) introduces exponential gating and novel memory structures to overcome LSTM limitations, achieving performance comparable to state-of-the-art Transformers and St…
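To give a concrete feel for exponential gating, here is a minimal NumPy sketch of an sLSTM-style recurrent step: exponential input/forget gates, a normalizer state that keeps the output bounded, and a running-max stabilizer that prevents overflow. The pre-activations are passed in directly rather than computed from learned weights, and all names are illustrative rather than taken from the paper's code.

```python
# Minimal sketch of exponential gating in an sLSTM-style cell (NumPy only).
# Assumption: gate pre-activations are given; no learned weight matrices here.
import numpy as np

def slstm_step(pre, state, eps=1e-6):
    """One recurrent step with exponential input/forget gates and a stabilizer."""
    c, n, m = state                        # cell, normalizer, stabilizer states
    z_t = np.tanh(pre["z"])                # candidate cell input
    o_t = 1.0 / (1.0 + np.exp(-pre["o"]))  # output gate (sigmoid)
    # Exponential gates, stabilized by the running max m to avoid overflow.
    m_new = np.maximum(pre["f"] + m, pre["i"])
    i_t = np.exp(pre["i"] - m_new)
    f_t = np.exp(pre["f"] + m - m_new)
    c_new = f_t * c + i_t * z_t            # cell state update
    n_new = f_t * n + i_t                  # normalizer update
    h_t = o_t * c_new / (n_new + eps)      # normalized hidden state
    return h_t, (c_new, n_new, m_new)

# Toy usage: three steps of a 4-unit cell with random pre-activations.
rng = np.random.default_rng(0)
state = (np.zeros(4), np.zeros(4), np.zeros(4))
for _ in range(3):
    pre = {k: rng.normal(size=4) for k in ("z", "i", "f", "o")}
    h, state = slstm_step(pre, state)
print(h)
```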
Who's asking? User personas and the mechanics of latent misalignment
·3650 words·18 mins
Large Language Models 🏢 Google Research
User personas significantly affect the safety behavior of large language models; adopting certain personas can bypass safety filters more effectively than direct prompting methods.
Watermarking Makes Language Models Radioactive
·3285 words·16 mins
Large Language Models 🏢 Meta FAIR
LLM watermarking leaves detectable traces in subsequently trained models, enabling detection of synthetic data usage, a phenomenon termed ‘radioactivity’.
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
·2088 words·10 mins
Large Language Models 🏢 New York University
Unlocking tight generalization bounds for massive LLMs using a novel token-level approach.
Training Compute-Optimal Protein Language Models
·3023 words·15 mins
Large Language Models 🏢 Tsinghua University
Compute-optimal protein language models are trained efficiently using scaling laws derived from a massive dataset, improving performance while optimizing compute budgets.
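As a rough illustration of what a compute-optimal recipe prescribes, the sketch below splits a FLOP budget between parameter count and training tokens using Chinchilla-style power laws. The exponent, the scale constant, and the C ≈ 6·N·D approximation are generic assumptions for illustration, not the coefficients fitted for protein language models in this paper.

```python
# Hedged sketch of compute-optimal allocation under C ~ 6 * N * D.
# The exponent `a` and scale `k` are illustrative, not the paper's fitted values.
def compute_optimal_allocation(compute_flops, a=0.5, k=1.0 / 6.0):
    """Split a FLOP budget C into a parameter count N and a token count D."""
    n_opt = (k * compute_flops) ** a        # N_opt grows as a power of C
    d_opt = compute_flops / (6.0 * n_opt)   # D chosen so 6 * N * D exhausts C
    return n_opt, d_opt

for c in (1e19, 1e20, 1e21):
    n, d = compute_optimal_allocation(c)
    print(f"C={c:.0e}: ~{n:.2e} params, ~{d:.2e} tokens")
```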
Toxicity Detection for Free
·2767 words·13 mins
Large Language Models 🏢 University of California, Berkeley
Moderation Using LLM Introspection (MULI) leverages the first response token’s logits from LLMs to create a highly accurate toxicity detector, surpassing state-of-the-art methods with minimal overhead…
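A minimal sketch of the underlying idea, assuming scikit-learn is available and that the first-response-token logits have already been collected from an LLM on labeled prompts (random features stand in for them here): a sparse linear probe over those logits acts as the toxicity detector, so no forward passes beyond the one the model already performs are needed.

```python
# Hedged sketch of a MULI-style probe over first-response-token logits.
# Assumption: `first_token_logits` would come from an LLM; random data is used here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
vocab_size, n_prompts = 1000, 200
first_token_logits = rng.normal(size=(n_prompts, vocab_size))  # placeholder features
is_toxic = rng.integers(0, 2, size=n_prompts)                  # placeholder labels

# Sparse linear probe: the L1 penalty keeps only a few informative vocabulary logits.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(first_token_logits, is_toxic)
print("train accuracy:", probe.score(first_token_logits, is_toxic))
```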
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
·4872 words·23 mins
Large Language Models 🏢 Zhejiang University
TOPA: Extending LLMs for video understanding using only text data.
Time-Reversal Provides Unsupervised Feedback to LLMs
·2584 words·13 mins
Large Language Models 🏢 Google DeepMind
Time-reversed language models provide unsupervised feedback for improving LLMs, offering a cost-effective alternative to human feedback and enhancing LLM safety.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
·5133 words·25 mins
Large Language Models 🏢 University of Hong Kong
Stacking Your Transformers accelerates LLM pre-training by leveraging smaller, pre-trained models to efficiently train larger ones, achieving significant speedups and improved performance.
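A toy sketch of the core growth operation, with placeholder layer objects instead of real transformer blocks: a deeper model is initialized by tiling the layer stack of a trained smaller model. The paper's growth operators and training schedule are not reproduced here.

```python
# Hedged sketch of depth-wise model growth by stacking a smaller model's layers.
import copy

class Model:
    def __init__(self, layers):
        self.layers = layers  # ordered list of blocks (placeholders here)

def grow_by_stacking(small_model, growth_factor=2):
    """Initialize a deeper model by repeating the small model's layer stack."""
    grown_layers = []
    for _ in range(growth_factor):
        # Deep-copy so the grown copies can later be trained independently.
        grown_layers.extend(copy.deepcopy(layer) for layer in small_model.layers)
    return Model(grown_layers)

small = Model(layers=[f"block_{i}" for i in range(6)])
large = grow_by_stacking(small, growth_factor=2)
print(len(large.layers), "layers after growth")  # 12
```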
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins
Large Language Models 🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
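Sequoia's contribution is a scalable, robust tree of draft tokens; the sketch below shows only the standard accept/resample rule for a single drafted token that speculative decoding methods build on, with toy probability vectors standing in for the draft and target models.

```python
# Hedged sketch of the basic speculative-decoding accept/resample rule.
# Toy probability vectors replace real draft/target model outputs.
import numpy as np

rng = np.random.default_rng(0)

def accept_or_resample(draft_token, p_target, p_draft):
    """Accept with prob min(1, p_target/p_draft); otherwise resample from the residual."""
    ratio = p_target[draft_token] / p_draft[draft_token]
    if rng.random() < min(1.0, ratio):
        return draft_token, True
    residual = np.maximum(p_target - p_draft, 0.0)   # leftover target mass
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual), False

p_draft = np.array([0.5, 0.2, 0.1, 0.1, 0.1])
p_target = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
tok = rng.choice(len(p_draft), p=p_draft)            # token proposed by the draft model
print(accept_or_resample(tok, p_target, p_draft))
```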
Selective Generation for Controllable Language Models
·2256 words·11 mins
Large Language Models 🏢 POSTECH
Certified selective generation controls language model hallucinations by leveraging textual entailment and a novel semi-supervised algorithm, guaranteeing a controlled false discovery rate.
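A hedged sketch of the selective-generation idea with synthetic data: emit an answer only when an entailment-style score clears a threshold calibrated so that the empirical error rate among accepted answers stays below a target level. The paper's semi-supervised calibration algorithm and its formal false-discovery-rate guarantee are not reproduced here.

```python
# Hedged sketch: calibrate an abstention threshold on entailment-style scores.
# Scores and correctness labels are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random(500)              # entailment scores on a calibration set
correct = rng.random(500) < scores    # toy labels: higher score, more likely correct

def calibrate_threshold(scores, correct, target_rate=0.1):
    """Smallest threshold whose empirical error rate among accepted answers <= target."""
    for thr in np.sort(scores):
        selected = scores >= thr
        if not selected.any():
            break
        if (~correct[selected]).mean() <= target_rate:
            return thr
    return 1.0                        # abstain everywhere if no threshold qualifies

thr = calibrate_threshold(scores, correct)
print("threshold:", round(float(thr), 3), "fraction answered:", (scores >= thr).mean())
```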
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
·4063 words·20 mins
Large Language Models 🏢 EPFL
Revolutionizing LLM training: Constant learning rate with cooldown replaces cosine schedule, enabling cost-effective scaling experiments!
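A minimal sketch of the schedule in question: a short warmup, a long constant plateau, and a final cooldown to zero. Warmup length, cooldown fraction, and the linear decay shape are illustrative choices, not the paper's exact recipe; the practical point is that checkpoints taken before the cooldown can be reused to train to longer horizons without committing to a duration up front.

```python
# Hedged sketch of a constant learning rate with warmup and terminal cooldown.
def constant_with_cooldown(step, total_steps, base_lr=3e-4,
                           warmup_steps=1000, cooldown_frac=0.2):
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:                       # linear warmup
        return base_lr * step / warmup_steps
    if step < cooldown_start:                     # long constant plateau
        return base_lr
    progress = (step - cooldown_start) / (total_steps - cooldown_start)
    return base_lr * (1 - progress)               # linear cooldown to zero

print([round(constant_with_cooldown(s, 10_000), 6) for s in (0, 500, 5000, 9000, 9999)])
```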
Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
·2787 words·14 mins
Large Language Models 🏢 University of Cambridge
LLMs struggle with out-of-distribution (OOD) generalization. This research introduces ‘rule extrapolation’ using formal languages to rigorously evaluate OOD behavior in various LLM architectures, rev…
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
·1538 words·8 mins
Large Language Models 🏢 University of Illinois Urbana-Champaign
Robust Prompt Optimization (RPO) creates robust LLM defenses against jailbreaking attacks by optimizing a transferable suffix, achieving state-of-the-art robustness.
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
·3628 words·18 mins
Large Language Models 🏢 Tel Aviv University
New research resolves discrepancies in language model scaling laws, revealing three key factors driving the differences and improving accuracy in predicting optimal model size based on compute budget.
Reranking Laws for Language Generation: A Communication-Theoretic Perspective
·1835 words·9 mins
Large Language Models 🏢 Instituto Superior Técnico, Universidade De Lisboa
Boost LLM reliability by adding redundancy! This paper uses a communication theory framework to show that generating multiple LLM outputs and reranking them significantly reduces errors, even with imp…
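A toy sketch of the generate-then-rerank pattern the analysis covers: draw several candidate outputs for the same prompt and keep the one an external scorer prefers. The generator and the scorer below are stand-ins for an LLM and a reward model or verifier.

```python
# Hedged sketch of sample-and-rerank; the generator and scorer are placeholders.
import random

def generate_candidates(prompt, n=8):
    # Stand-in for n stochastic LLM samples of the same prompt.
    return [f"{prompt} -> candidate {i} (score={random.random():.2f})" for i in range(n)]

def rerank_score(candidate):
    # Stand-in for a reward model / verifier; here it just parses the toy score.
    return float(candidate.split("score=")[1].rstrip(")"))

random.seed(0)
candidates = generate_candidates("Explain reranking laws", n=8)
print(max(candidates, key=rerank_score))
```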
ReFT: Representation Finetuning for Language Models
·3382 words·16 mins
Large Language Models 🏢 Stanford University
ReFT: Revolutionizing language model finetuning by directly manipulating hidden representations, achieving superior efficiency and performance compared to existing methods.
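A minimal NumPy sketch of a low-rank representation intervention in the spirit of this approach: a hidden state is edited only within an r-dimensional subspace, with the projection R, the map W, and the bias b as the sole trainable parameters; dimensions and initialization are illustrative.

```python
# Hedged sketch of a LoReFT-style low-rank intervention on a hidden state.
import numpy as np

rng = np.random.default_rng(0)
d, r = 768, 8                                    # hidden size, intervention rank

R = np.linalg.qr(rng.normal(size=(d, r)))[0].T   # r x d, orthonormal rows
W = rng.normal(scale=0.02, size=(r, d))          # learned projection (illustrative init)
b = np.zeros(r)                                  # learned bias

def intervene(h):
    """Move h toward a learned target, but only along the subspace spanned by R."""
    return h + R.T @ (W @ h + b - R @ h)

h = rng.normal(size=d)
print(np.linalg.norm(intervene(h) - h))          # the edit lives entirely in span(R)
```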
QTIP: Quantization with Trellises and Incoherence Processing
·2586 words·13 mins
Large Language Models 🏢 Cornell University
QTIP: Ultra-high dimensional LLM quantization using trellis codes for faster, higher-quality inference.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
·3100 words·15 mins
Large Language Models 🏢 Peking University
PiSSA, a novel parameter-efficient fine-tuning method, surpasses LoRA by initializing adapter matrices using the principal components of the original model, achieving faster convergence and enhanced p…
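A minimal sketch of the initialization idea: the adapter factors are seeded with the top singular values and vectors of the pretrained weight, while a frozen residual keeps the remainder, so the adapted layer reproduces the original weight exactly at initialization. Shapes and rank below are illustrative.

```python
# Hedged sketch of PiSSA-style adapter initialization from an SVD of the weight.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4
W0 = rng.normal(size=(d_out, d_in))               # stand-in for a pretrained weight

U, S, Vt = np.linalg.svd(W0, full_matrices=False)
A = U[:, :r] * np.sqrt(S[:r])                     # (d_out, r), trainable factor
B = np.sqrt(S[:r])[:, None] * Vt[:r]              # (r, d_in), trainable factor
W_res = W0 - A @ B                                # frozen residual weight

# The adapted layer computes (W_res + A @ B) @ x, which equals W0 @ x at init.
x = rng.normal(size=d_in)
print(np.allclose((W_res + A @ B) @ x, W0 @ x))   # True
```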
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
·2069 words·10 mins
Large Language Models 🏢 University of Pennsylvania
One-shot dualization aligns large language models with safety constraints efficiently, eliminating iterative primal-dual methods for improved stability and reduced computational burden.