Large Language Models
SnapKV: LLM Knows What You are Looking for Before Generation
·2730 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SnapKV: Slashing LLM memory usage & boosting speed via smart KV cache compression!
Smoothie: Label Free Language Model Routing
·3245 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SMOOTHIE: Label-free LLM routing achieves up to 10% accuracy gains by using a latent variable model to estimate LLM quality without labeled data.
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
·2599 words·13 mins·
Natural Language Processing
Large Language Models
🏢 UC Los Angeles
SMALLTOLARGE (S2L) revolutionizes large language model (LLM) fine-tuning by summarizing the training loss trajectories of a small model, enabling efficient data selection for larger models.
SLTrain: a sparse plus low rank approach for parameter and memory efficient pretraining
·4422 words·21 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 RIKEN AIP
SLTrain: sparse-plus-low-rank pretraining cuts LLM memory usage by up to 73% without performance loss!
SlimGPT: Layer-wise Structured Pruning for Large Language Models
·2966 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Alibaba Group
SlimGPT: Achieve near-optimal LLM structured pruning via Batched Greedy Pruning and Incremental Pruning Ratio, improving efficiency without sacrificing accuracy.
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models
·2353 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google Research
Self Logits Evolution Decoding (SLED) boosts LLM factuality by up to 20% without extra data or fine-tuning!
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs
·5392 words·26 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
SIRIUS: A novel correction mechanism boosts the efficiency of contextually sparse LLMs for complex reasoning tasks, achieving significant latency reduction.
SimPO: Simple Preference Optimization with a Reference-Free Reward
·3091 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
SimPO: a simpler, reference-free reward algorithm that significantly outperforms existing offline preference optimization methods, achieving higher accuracy and efficiency in aligning LLMs with human preferences.
Simple and Effective Masked Diffusion Language Models
·2145 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Cornell Tech
Simple masked discrete diffusion models achieve state-of-the-art language modeling results, closing the performance gap with autoregressive methods through a novel training recipe and a Rao-Blackwellized objective.
Should We Really Edit Language Models? On the Evaluation of Edited Language Models
·3638 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Language model editing’s limitations exposed: Scaling current methods leads to knowledge loss and compromised safety, urging research into more robust techniques.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
·3020 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
ShiftAddLLM accelerates pretrained LLMs via post-training, multiplication-less reparameterization, achieving significant memory and energy reductions with accuracy comparable to or better than existing methods.
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
·2560 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Maryland
SHED, a Shapley value-based framework, efficiently refines instruction-tuning datasets for LLMs, producing high-performing subsets, only 10% of the original size, that transfer well across different models.
SGLang: Efficient Execution of Structured Language Model Programs
·1898 words·9 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SGLang: A new system boosts LLM program execution speed by up to 6.4x, simplifying complex LLM application programming.
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins·
Large Language Models
🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
·1587 words·8 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Oxford
Transformers and RNNs show contrasting representational capabilities: Transformers excel at tasks requiring associative recall, while RNNs are better suited for hierarchical language processing.
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning
·1972 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
SEMCODER: A novel 6.7B-parameter code LLM surpasses GPT-3.5-turbo on code generation and execution reasoning by employing ‘monologue reasoning’: training the model to verbally explain code execution.
SelfCodeAlign: Self-Alignment for Code Generation
·1983 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SelfCodeAlign is a novel self-alignment method for code generation LLMs that surpasses existing methods by avoiding reliance on expensive human annotation or proprietary LLMs. It achieves this by having the base model generate its own instruction-response pairs and validate them against self-generated test cases.
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
·5609 words·27 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SAMI (Self-Supervised Alignment with Mutual Information) teaches language models to follow principles without human preference labels by maximizing the mutual information between principles and model responses.
Self-playing Adversarial Language Game Enhances LLM Reasoning
·2197 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Self-play adversarial language game boosts LLM reasoning!
Self-Guiding Exploration for Combinatorial Problems
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 MBZUAI
LLMs excel at reasoning tasks, but their application to combinatorial problems (CPs) is underexplored. This paper introduces Self-Guiding Exploration (SGE), a novel prompting strategy that significantly improves LLM performance on combinatorial problems.