Large Language Models
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs self-discover optimal reasoning structures for complex problems, boosting performance by up to 32% compared to existing methods.
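A minimal sketch of the SELECT → ADAPT → IMPLEMENT pipeline the paper describes, assuming a hypothetical `llm(prompt) -> str` completion callable; the module list is a tiny stand-in for the paper's 39 reasoning modules, and the prompt wording is illustrative.

```python
# Minimal sketch of SELF-DISCOVER's three meta-stages. `llm` is a
# hypothetical prompt -> completion callable; MODULES stands in for the
# paper's full set of 39 reasoning-module descriptions.

MODULES = [
    "How can I break this problem into smaller subproblems?",
    "How can I simplify the problem?",
    "Let's think step by step.",
]

def self_discover(llm, task):
    selected = llm(f"SELECT the reasoning modules useful for the task.\n"
                   f"Task: {task}\nModules: {MODULES}")
    adapted = llm(f"ADAPT the selected modules so they are specific to the task.\n"
                  f"Modules: {selected}\nTask: {task}")
    structure = llm(f"IMPLEMENT the adapted modules as a step-by-step reasoning "
                    f"structure in JSON.\nModules: {adapted}")
    # Solve the task by following the self-discovered structure.
    return llm(f"Follow this reasoning structure to solve the task.\n"
               f"Structure: {structure}\nTask: {task}")
```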
Selective Generation for Controllable Language Models
·2256 words·11 mins·
Large Language Models
🏢 POSTECH
Certified selective generation controls language model hallucinations by leveraging textual entailment and a novel semi-supervised algorithm, guaranteeing a controlled false discovery rate.
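A minimal sketch of the gating idea, not the paper's certified semi-supervised algorithm: calibrate an entailment-score threshold on labeled data so the empirical false discovery rate among accepted answers stays below a target, then abstain below it. `generate` and `entail_score` are hypothetical stand-ins for a language model and an NLI-based scorer.

```python
# Sketch: entailment-gated selective generation with an empirically
# calibrated threshold (the paper adds a certified, semi-supervised
# procedure on top of this basic idea).

def calibrate_threshold(scores, is_correct, alpha=0.1):
    """Smallest threshold whose empirical FDR among accepted answers <= alpha."""
    for t in sorted(set(scores)):
        accepted = [ok for s, ok in zip(scores, is_correct) if s >= t]
        if accepted and sum(not ok for ok in accepted) / len(accepted) <= alpha:
            return t
    return float("inf")  # abstain on everything if no threshold qualifies

def selective_generate(prompt, generate, entail_score, threshold):
    answer = generate(prompt)
    # Abstain (return None) when the answer is not entailed strongly enough.
    return answer if entail_score(prompt, answer) >= threshold else None

# Toy calibration: accept only answers scoring >= 0.7 (empirical FDR = 1/3).
print(calibrate_threshold([0.9, 0.8, 0.7, 0.4, 0.2],
                          [True, True, False, False, False], alpha=0.34))
```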
Selective Attention: Enhancing Transformer through Principled Context Control
·2002 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Enhance Transformer models via Selective Self-Attention (SSA), a principled context control method that boosts accuracy and efficiency.
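A rough sketch of per-token temperature control in attention; the paper's actual SSA parameterization differs, and `tau` here is just an illustrative positive temperature per query.

```python
import torch
import torch.nn.functional as F

# Sketch: a learned positive temperature per query sharpens or flattens each
# token's softmax, letting tokens control how selectively they attend.

def selective_attention(q, k, v, tau):
    # q, k, v: (seq, dim); tau: (seq, 1) per-query temperatures
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    return F.softmax(scores / tau, dim=-1) @ v

seq, dim = 16, 32
q, k, v = (torch.randn(seq, dim) for _ in range(3))
tau = F.softplus(torch.randn(seq, 1)) + 1e-3  # e.g., predicted from the query
out = selective_attention(q, k, v, tau)       # (seq, dim)
```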
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
·3120 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
SelectIT leverages LLMs’ intrinsic uncertainty to efficiently select high-quality instruction tuning data, enhancing model performance without extra resources.
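A simplified sketch of uncertainty-aware selection: rate each candidate example with several rating prompts and prefer examples whose ratings are high and consistent. The paper's token-, sentence-, and model-level scoring is richer; `rate` is a hypothetical LLM-based rater.

```python
import statistics

# Sketch: favor examples that are rated highly *and* consistently across
# rating prompts; rating disagreement is treated as uncertainty.

def score_example(example, rate, prompts):
    ratings = [rate(p, example) for p in prompts]  # e.g., 1-5 quality scores
    mean, spread = statistics.mean(ratings), statistics.pstdev(ratings)
    return mean * (1.0 - spread / 5.0)  # penalize uncertain ratings

def select_top(examples, rate, prompts, keep_frac=0.2):
    ranked = sorted(examples, key=lambda ex: score_example(ex, rate, prompts),
                    reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_frac))]

# Demo with a dummy rater standing in for an LLM.
prompts = ["Rate 1-5:", "Score 1-5:", "Quality 1-5?"]
rate = lambda p, ex: 3 + hash((p, ex)) % 3
print(select_top(["ex1", "ex2", "ex3", "ex4"], rate, prompts, keep_frac=0.5))
```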
Segmenting Watermarked Texts From Language Models
·2577 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Texas A&M University
This paper presents novel statistical methods to reliably watermark and segment LLM-generated text, ensuring source traceability even after user modifications.
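An illustrative sketch only, not the paper's statistical procedure: given per-token indicators of "green-list" hits from a Kirchenbauer-style watermark, slide a window and flag spans whose hit rate is implausible under the no-watermark null.

```python
import math

# Sketch: windows with a green-list hit rate far above the null rate `gamma`
# are flagged as watermarked; boundaries between flagged regions give a crude
# segmentation of watermarked vs. human-edited spans.

def segment(green_hits, window=50, gamma=0.25, z_thresh=4.0):
    assert len(green_hits) >= window
    flags = []
    for i in range(len(green_hits) - window + 1):
        hits = sum(green_hits[i : i + window])
        z = (hits - gamma * window) / math.sqrt(gamma * (1 - gamma) * window)
        flags.append(z > z_thresh)  # True: window looks watermarked
    return flags
```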
Search for Efficient Large Language Models
·2477 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Northeastern University
Training-free architecture search finds optimal subnets in LLMs, boosting inference speed and slashing memory needs without retraining.
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
·2596 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Indiana University
SDP4Bit achieves up to 4.08x speedup in LLM training by quantizing weight differences and gradients to ~4 bits while maintaining accuracy.
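A minimal sketch of the weight-difference half of the idea: quantize the *delta* between successive weight syncs to an int4 range with group-wise scales. The group size, rounding, and lack of real bit-packing are illustrative simplifications; the paper additionally applies two-level gradient smoothing.

```python
import numpy as np

# Sketch: communicate 4-bit weight *differences* plus per-group scales
# instead of full-precision weights. Real implementations pack two int4
# values per byte; int8 storage is kept here for clarity.

def quantize_4bit(delta, group=128):
    d = delta.reshape(-1, group)
    scale = np.abs(d).max(axis=1, keepdims=True) / 7.0 + 1e-12  # int4 range [-7, 7]
    q = np.clip(np.round(d / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w_prev = np.random.randn(1024).astype(np.float32)
w_new = w_prev + 0.01 * np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w_new - w_prev)                   # send q + scales
w_rec = w_prev + dequantize_4bit(q, s, w_prev.shape)   # receiver reconstructs
print(np.abs(w_rec - w_new).max())                     # small quantization error
```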
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
·4019 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Washington
Retrieval-based language models improve as their inference-time datastore grows: with MASSIVEDS, a 1.4 trillion-token datastore, retrieval-based LMs outperform larger LM-only models on knowledge-intensive tasks.
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
·2496 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
Boosting LLM performance: this research shows that larger language models need correspondingly larger vocabularies for compute-optimal efficiency and performance.
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
·2949 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Stanford University
Direct Alignment Algorithms (DAAs) for LLM alignment suffer from over-optimization even without explicit reward models; this paper demonstrates the effect empirically and proposes scaling laws to understand it.
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations
·4063 words·20 mins·
Large Language Models
🏢 EPFL
Revolutionizing LLM training: a constant learning rate with cooldown replaces the cosine schedule, enabling cost-effective scaling experiments.
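A minimal sketch of the constant-rate-plus-cooldown ("warmup-stable-decay") schedule; the warmup length, cooldown fraction, and linear decay shape are illustrative choices among those the paper studies.

```python
# Sketch: warm up, hold a constant learning rate, then cool down to zero over
# the final fraction of steps. Checkpoints taken before the cooldown can be
# reused to continue training longer, which is what makes scaling-law
# experiments cheap compared to committing to a cosine schedule up front.

def wsd_lr(step, total_steps, peak_lr=3e-4, warmup=1000, cooldown_frac=0.2):
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup:                 # linear warmup
        return peak_lr * step / warmup
    if step < cooldown_start:         # long constant phase
        return peak_lr
    remaining = total_steps - step    # linear cooldown to zero
    return peak_lr * remaining / (total_steps - cooldown_start)
```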
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain
·2626 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 CINES
SaulLM-54B & SaulLM-141B achieve state-of-the-art performance on legal tasks by scaling up model size, employing a specialized instruction-following protocol, and aligning model outputs with human preferences.
SafeWorld: Geo-Diverse Safety Alignment
·3977 words·19 mins·
Natural Language Processing
Large Language Models
🏢 UC Los Angeles
SAFEWORLD: a new benchmark that reveals LLMs’ struggles with geo-diverse safety standards and guides alignment to fix them.
S²FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity
·1908 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
S2FT: Structured Sparse Fine-Tuning achieves state-of-the-art LLM fine-tuning performance, training efficiency, and inference scalability by selecting sparsely and computing densely.
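A rough sketch of "select sparsely, compute densely" under illustrative assumptions: freeze the pretrained matrix, pick a few output channels, and train only a small dense block for them. S2FT's actual selection operates on structured units such as attention heads and FFN channels.

```python
import torch

# Sketch: keep the pretrained weights frozen, choose a handful of output
# channels, and gather their updates into a small dense trainable block so
# every kernel in the update path stays dense.

W = torch.randn(512, 512)                      # frozen pretrained weights
rows = torch.tensor([3, 17, 99, 200])          # sparsely selected channels
delta = torch.zeros(len(rows), 512, requires_grad=True)  # dense trainable block

def forward(x):
    out = x @ W.T                              # frozen dense computation
    out[:, rows] = out[:, rows] + x @ delta.T  # dense update, selected channels only
    return out

x = torch.randn(8, 512)
forward(x).sum().backward()                    # gradients flow only into `delta`
print(delta.grad.shape)                        # torch.Size([4, 512])
```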
S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training
·2718 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Tsinghua University
S-STE achieves efficient 2:4 sparse pre-training by introducing a novel continuous pruning function, overcoming the limitations of previous methods and leading to improved accuracy and speed.
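A minimal sketch of the underlying 2:4 projection (keep the 2 largest magnitudes in every group of 4); S-STE's contribution is replacing this hard, discontinuous projection with a continuous pruning function during pre-training, which is not reproduced here.

```python
import numpy as np

# Sketch: hard 2:4 sparsification, the baseline operation that S-STE's
# continuous pruning function smooths out.

def prune_2_of_4(w):
    g = w.reshape(-1, 4)
    drop = np.argsort(np.abs(g), axis=1)[:, :2]  # 2 smallest magnitudes per group
    mask = np.ones_like(g)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (g * mask).reshape(w.shape)

w = np.random.randn(8)
print(prune_2_of_4(w))  # exactly 2 nonzeros in each consecutive group of 4
```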
Rule Extrapolation in Language Modeling: A Study of Compositional Generalization on OOD Prompts
·2787 words·14 mins·
Large Language Models
🏢 University of Cambridge
LLMs struggle with out-of-distribution (OOD) generalization. This research introduces ‘rule extrapolation’, using formal languages to rigorously evaluate OOD behavior across LLM architectures.
RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models
·3132 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
RouterDC: A query-based router trained via dual contrastive learning assembles multiple LLMs, significantly outperforming individual LLMs and existing routing methods on both in- and out-of-distribution tasks.
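A simplified sketch of the query-to-LLM contrastive term: pull a query embedding toward the embedding of the LLM that answers it best, pushing it away from the others. The paper pairs this with a sample-to-sample contrastive loss omitted here; dimensions and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

# Sketch: learnable per-LLM embeddings; the best-scoring LLM on each query is
# the positive, all other LLMs act as negatives (an InfoNCE-style objective).

num_llms, dim = 4, 64
llm_emb = torch.nn.Parameter(torch.randn(num_llms, dim))

def router_loss(query_vec, best_llm_idx, temperature=0.07):
    q = F.normalize(query_vec, dim=-1)
    k = F.normalize(llm_emb, dim=-1)
    logits = q @ k.T / temperature           # similarity of query to each LLM
    return F.cross_entropy(logits, best_llm_idx)

q = torch.randn(8, dim)                      # batch of encoded queries
best = torch.randint(0, num_llms, (8,))      # best LLM per query (from eval scores)
loss = router_loss(q, best)
loss.backward()                              # updates the LLM embeddings
```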
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
·1538 words·8 mins·
Large Language Models
🏢 University of Illinois Urbana-Champaign
Robust Prompt Optimization (RPO) creates robust LLM defenses against jailbreaking attacks by optimizing a transferable suffix, achieving state-of-the-art robustness.
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
·2612 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Leveraging model-generated synthetic data for LLM finetuning, with both positive and strategically constructed negative examples, yields an eight-fold increase in math-reasoning efficiency.
Risk-Averse Fine-tuning of Large Language Models
·3716 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Amazon
Risk-Averse RLHF fine-tunes LLMs to minimize toxic outputs while maintaining performance.