Large Language Models
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs self-discover optimal reasoning structures for complex problems, boosting performance by up to 32% compared to existing methods.
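A minimal sketch of the SELECT → ADAPT → IMPLEMENT pipeline the paper describes, assuming a hypothetical `llm(prompt) -> str` completion callable; the module list is a tiny stand-in for the paper's 39 reasoning modules, and the prompt wording is illustrative.

```python
# Minimal sketch of SELF-DISCOVER's three meta-stages. `llm` is a
# hypothetical prompt -> completion callable; MODULES stands in for the
# paper's full set of 39 reasoning-module descriptions.

MODULES = [
    "How can I break this problem into smaller subproblems?",
    "How can I simplify the problem?",
    "Let's think step by step.",
]

def self_discover(llm, task):
    selected = llm(f"SELECT the reasoning modules useful for the task.\n"
                   f"Task: {task}\nModules: {MODULES}")
    adapted = llm(f"ADAPT the selected modules so they are specific to the task.\n"
                  f"Modules: {selected}\nTask: {task}")
    structure = llm(f"IMPLEMENT the adapted modules as a step-by-step reasoning "
                    f"structure in JSON.\nModules: {adapted}")
    # Solve the task by following the self-discovered structure.
    return llm(f"Follow this reasoning structure to solve the task.\n"
               f"Structure: {structure}\nTask: {task}")
```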
Selective Generation for Controllable Language Models
·2256 words·11 mins·
Large Language Models
🏢 POSTECH
Certified selective generation controls language model hallucinations by leveraging textual entailment and a novel semi-supervised algorithm, guaranteeing a controlled false discovery rate.
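A minimal sketch of the gating idea, not the paper's certified semi-supervised algorithm: calibrate an entailment-score threshold on labeled data so the empirical false discovery rate among accepted answers stays below a target, then abstain below it. `generate` and `entail_score` are hypothetical stand-ins for a language model and an NLI-based scorer.

```python
# Sketch: entailment-gated selective generation with an empirically
# calibrated threshold (the paper adds a certified, semi-supervised
# procedure on top of this basic idea).

def calibrate_threshold(scores, is_correct, alpha=0.1):
    """Smallest threshold whose empirical FDR among accepted answers <= alpha."""
    for t in sorted(set(scores)):
        accepted = [ok for s, ok in zip(scores, is_correct) if s >= t]
        if accepted and sum(not ok for ok in accepted) / len(accepted) <= alpha:
            return t
    return float("inf")  # abstain on everything if no threshold qualifies

def selective_generate(prompt, generate, entail_score, threshold):
    answer = generate(prompt)
    # Abstain (return None) when the answer is not entailed strongly enough.
    return answer if entail_score(prompt, answer) >= threshold else None

# Toy calibration: accept only answers scoring >= 0.7 (empirical FDR = 1/3).
print(calibrate_threshold([0.9, 0.8, 0.7, 0.4, 0.2],
                          [True, True, False, False, False], alpha=0.34))
```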
Selective Attention: Enhancing Transformer through Principled Context Control
·2002 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Enhance Transformer models via Selective Self-Attention (SSA), a principled context control method that boosts accuracy and efficiency.
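A rough sketch of per-token temperature control in attention; the paper's actual SSA parameterization differs, and `tau` here is just an illustrative positive temperature per query.

```python
import torch
import torch.nn.functional as F

# Sketch: a learned positive temperature per query sharpens or flattens each
# token's softmax, letting tokens control how selectively they attend.

def selective_attention(q, k, v, tau):
    # q, k, v: (seq, dim); tau: (seq, 1) per-query temperatures
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    return F.softmax(scores / tau, dim=-1) @ v

seq, dim = 16, 32
q, k, v = (torch.randn(seq, dim) for _ in range(3))
tau = F.softplus(torch.randn(seq, 1)) + 1e-3  # e.g., predicted from the query
out = selective_attention(q, k, v, tau)       # (seq, dim)
```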
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
·3120 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
SelectIT leverages LLMs’ intrinsic uncertainty to efficiently select high-quality instruction tuning data, enhancing model performance without extra resources.
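A simplified sketch of uncertainty-aware selection: rate each candidate example with several rating prompts and prefer examples whose ratings are high and consistent. The paper's token-, sentence-, and model-level scoring is richer; `rate` is a hypothetical LLM-based rater.

```python
import statistics

# Sketch: favor examples that are rated highly *and* consistently across
# rating prompts; rating disagreement is treated as uncertainty.

def score_example(example, rate, prompts):
    ratings = [rate(p, example) for p in prompts]  # e.g., 1-5 quality scores
    mean, spread = statistics.mean(ratings), statistics.pstdev(ratings)
    return mean * (1.0 - spread / 5.0)  # penalize uncertain ratings

def select_top(examples, rate, prompts, keep_frac=0.2):
    ranked = sorted(examples, key=lambda ex: score_example(ex, rate, prompts),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_frac))]

# Demo with a dummy rater standing in for an LLM.
prompts = ["Rate 1-5:", "Score 1-5:", "Quality 1-5?"]
rate = lambda p, ex: 3 + hash((p, ex)) % 3
print(select_top(["ex1", "ex2", "ex3", "ex4"], rate, prompts, keep_frac=0.5))
```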
Segmenting Watermarked Texts From Language Models
·2577 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Texas A&M University
This paper presents novel statistical methods to reliably watermark and segment LLM-generated text, ensuring source traceability even after user modifications.
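An illustrative sketch only, not the paper's statistical procedure: given per-token indicators of "green-list" hits from a Kirchenbauer-style watermark, slide a window and flag spans whose hit rate is implausible under the no-watermark null.

```python
import math

# Sketch: windows with a green-list hit rate far above the null rate `gamma`
# are flagged as watermarked; boundaries between flagged regions give a crude
# segmentation of watermarked vs. human-edited spans.

def segment(green_hits, window=50, gamma=0.25, z_thresh=4.0):
    assert len(green_hits) >= window
    flags = []
    for i in range(len(green_hits) - window + 1):
        hits = sum(green_hits[i : i + window])
        z = (hits - gamma * window) / math.sqrt(gamma * (1 - gamma) * window)
        flags.append(z > z_thresh)  # True: window looks watermarked
    return flags
```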
Search for Efficient Large Language Models
·2477 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Northeastern University
Training-free architecture search finds optimal subnets in LLMs, boosting inference speed and slashing memory needs without retraining.
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
·2596 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Indiana University
SDP4Bit achieves up to 4.08x speedup in LLM training by quantizing weight differences and gradients to ~4 bits while maintaining accuracy.
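A minimal sketch of the weight-difference half of the idea: quantize the *delta* between successive weight syncs to an int4 range with group-wise scales. The group size, rounding, and lack of real bit-packing are illustrative simplifications; the paper additionally applies two-level gradient smoothing.

```python
import numpy as np

# Sketch: communicate 4-bit weight *differences* plus per-group scales
# instead of full-precision weights. Real implementations pack two int4
# values per byte; int8 storage is kept here for clarity.

def quantize_4bit(delta, group=128):
    d = delta.reshape(-1, group)
    scale = np.abs(d).max(axis=1, keepdims=True) / 7.0 + 1e-12  # int4 range [-7, 7]
    q = np.clip(np.round(d / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w_prev = np.random.randn(1024).astype(np.float32)
w_new = w_prev + 0.01 * np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w_new - w_prev)                   # send q + scales
w_rec = w_prev + dequantize_4bit(q, s, w_prev.shape)   # receiver reconstructs
print(np.abs(w_rec - w_new).max())                     # small quantization error
```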
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
·4019 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Washington
Retrieval-based language models improve as their inference-time datastore grows: with MASSIVEDS, a 1.4 trillion-token datastore, retrieval-based LMs outperform larger LM-only models on knowledge-intensive tasks.
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
·2496 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
Boosting LLM performance: this research shows that larger language models need correspondingly larger vocabularies for compute-optimal efficiency and performance.
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
·2949 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
Direct Alignment Algorithms (DAAs) for LLM alignment suffer from over-optimization even without explicit reward models; this paper demonstrates the effect empirically and proposes scaling laws to understand it.
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
·4063 words·20 mins·
Large Language Models
🏢 EPFL
Revolutionizing LLM training: a constant learning rate with cooldown replaces the cosine schedule, enabling cost-effective scaling experiments.
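A minimal sketch of the constant-rate-plus-cooldown ("warmup-stable-decay") schedule; the warmup length, cooldown fraction, and linear decay shape are illustrative choices among those the paper studies.

```python
# Sketch: warm up, hold a constant learning rate, then cool down to zero over
# the final fraction of steps. Checkpoints taken before the cooldown can be
# reused to continue training longer, which is what makes scaling-law
# experiments cheap compared to committing to a cosine schedule up front.

def wsd_lr(step, total_steps, peak_lr=3e-4, warmup=1000, cooldown_frac=0.2):
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup:                 # linear warmup
        return peak_lr * step / warmup
    if step < cooldown_start:         # long constant phase
        return peak_lr
    remaining = total_steps - step    # linear cooldown to zero
    return peak_lr * remaining / (total_steps - cooldown_start)
```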
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
·2626 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 CINES
SaulLM-54B & SaulLM-141B achieve state-of-the-art performance on legal tasks by scaling up model size, employing a specialized instruction-following protocol, and aligning model outputs with human preferences.
SafeWorld: Geo-Diverse Safety Alignment
·3977 words·19 mins·
Natural Language Processing
Large Language Models
🏢 UC Los Angeles
SAFEWORLD: a new benchmark that reveals LLMs’ struggles with geo-diverse safety standards and guides alignment to fix them.
S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
·1908 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
S2FT: Structured Sparse Fine-Tuning achieves state-of-the-art LLM fine-tuning performance, training efficiency, and inference scalability by selecting sparsely and computing densely.
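A rough sketch of "select sparsely, compute densely" under illustrative assumptions: freeze the pretrained matrix, pick a few output channels, and train only a small dense block for them. S2FT's actual selection operates on structured units such as attention heads and FFN channels.

```python
import torch

# Sketch: keep the pretrained weights frozen, choose a handful of output
# channels, and gather their updates into a small dense trainable block so
# every kernel in the update path stays dense.

W = torch.randn(512, 512)                      # frozen pretrained weights
rows = torch.tensor([3, 17, 99, 200])          # sparsely selected channels
delta = torch.zeros(len(rows), 512, requires_grad=True)  # dense trainable block

def forward(x):
    out = x @ W.T                              # frozen dense computation
    out[:, rows] = out[:, rows] + x @ delta.T  # dense update, selected channels only
    return out

x = torch.randn(8, 512)
forward(x).sum().backward()                    # gradients flow only into `delta`
print(delta.grad.shape)                        # torch.Size([4, 512])
```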
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
·2718 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tsinghua University
S-STE achieves efficient 2:4 sparse pre-training by introducing a novel continuous pruning function, overcoming the limitations of previous methods and leading to improved accuracy and speed.
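A minimal sketch of the underlying 2:4 projection (keep the 2 largest magnitudes in every group of 4); S-STE's contribution is replacing this hard, discontinuous projection with a continuous pruning function during pre-training, which is not reproduced here.

```python
import numpy as np

# Sketch: hard 2:4 sparsification, the baseline operation that S-STE's
# continuous pruning function smooths out.

def prune_2_of_4(w):
    g = w.reshape(-1, 4)
    drop = np.argsort(np.abs(g), axis=1)[:, :2]  # 2 smallest magnitudes per group
    mask = np.ones_like(g)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (g * mask).reshape(w.shape)

w = np.random.randn(8)
print(prune_2_of_4(w))  # exactly 2 nonzeros in each consecutive group of 4
```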
Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
·2787 words·14 mins·
Large Language Models
🏢 University of Cambridge
LLMs struggle with out-of-distribution (OOD) generalization. This research introduces ‘rule extrapolation’, using formal languages to rigorously evaluate OOD behavior across LLM architectures.
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
·3132 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
RouterDC: A query-based router trained via dual contrastive learning assembles multiple LLMs, significantly outperforming individual LLMs and existing routing methods on both in- and out-of-distribution tasks.
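A simplified sketch of the query-to-LLM contrastive term: pull a query embedding toward the embedding of the LLM that answers it best, pushing it away from the others. The paper pairs this with a sample-to-sample contrastive loss omitted here; dimensions and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

# Sketch: learnable per-LLM embeddings; the best-scoring LLM on each query is
# the positive, all other LLMs act as negatives (an InfoNCE-style objective).

num_llms, dim = 4, 64
llm_emb = torch.nn.Parameter(torch.randn(num_llms, dim))

def router_loss(query_vec, best_llm_idx, temperature=0.07):
    q = F.normalize(query_vec, dim=-1)
    k = F.normalize(llm_emb, dim=-1)
    logits = q @ k.T / temperature           # similarity of query to each LLM
    return F.cross_entropy(logits, best_llm_idx)

q = torch.randn(8, dim)                      # batch of encoded queries
best = torch.randint(0, num_llms, (8,))      # best LLM per query (from eval scores)
loss = router_loss(q, best)
loss.backward()                              # updates the LLM embeddings
```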
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
·1538 words·8 mins·
Large Language Models
🏢 University of Illinois Urbana-Champaign
Robust Prompt Optimization (RPO) creates robust LLM defenses against jailbreaking attacks by optimizing a transferable suffix, achieving state-of-the-art robustness.
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
·2612 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Leveraging model-generated synthetic data for LLM finetuning, with both positive and strategically constructed negative examples, yields an eight-fold increase in math-reasoning efficiency.
Risk-Averse Fine-tuning of Large Language Models
·3716 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Amazon
Risk-Averse RLHF fine-tunes LLMs to minimize toxic outputs while maintaining performance.