Large Language Models
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
·3127 words·15 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 ETH Zurich
SWT-Bench, a new benchmark, reveals that LLMs excel at generating tests for real-world bug fixes, surpassing dedicated test generation systems and significantly improving code-fix precision.
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
·3239 words·16 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SwitchHead: A novel MoE attention mechanism accelerates Transformers by significantly reducing computation and memory, matching baseline performance.
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
·2772 words·14 mins
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
SVFT: a novel parameter-efficient fine-tuning method achieves near full fine-tuning accuracy using only 0.006% to 0.25% of parameters, significantly outperforming existing techniques.
Stress-Testing Capability Elicitation With Password-Locked Models
·2650 words·13 mins
Natural Language Processing
Large Language Models
🏢 Redwood Research
Fine-tuning, even on a single demonstration, effectively uncovers hidden LLM capabilities, surpassing simple prompting methods.
Streaming Long Video Understanding with Large Language Models
·2706 words·13 mins
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
VideoStreaming, a novel vision-language model, enables efficient and accurate understanding of arbitrarily long videos using a constant number of tokens via streaming encoding and adaptive memory sele…
Stratified Prediction-Powered Inference for Effective Hybrid Evaluation of Language Models
·1611 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Stratified Prediction-Powered Inference (StratPPI) significantly improves language model evaluation by combining human and automated ratings, using stratified sampling for enhanced accuracy and tighte…
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
·4275 words·21 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
StrategyLLM uses four LLM agents to generate consistent, generalizable few-shot prompts, significantly improving LLM problem-solving performance across various tasks.
Stealth edits to large language models
·3221 words·16 mins
Natural Language Processing
Large Language Models
🏢 King's College London
Researchers unveil stealth edits for large language models, introducing a new metric that assesses editability and reveals vulnerability to malicious attacks.
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
·1847 words·9 mins
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
Star-Agents automates data optimization for instruction-tuned LLMs via multi-agent collaboration, achieving a 12% average performance boost.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
·5133 words·25 mins
Large Language Models
🏢 University of Hong Kong
Stacking Your Transformers accelerates LLM pre-training by leveraging smaller, pre-trained models to efficiently train larger ones, yielding significant speedups and improved performance.
SSDM: Scalable Speech Dysfluency Modeling
·2807 words·14 mins
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SSDM: Scalable Speech Dysfluency Modeling tackles challenges in speech dysfluency analysis by using articulatory gestures for scalable alignment, a connectionist subsequence aligner for efficient dysf…
SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform
·2142 words·11 mins
Natural Language Processing
Large Language Models
🏢 Rice University
SS1: A novel GPU-friendly operator accelerates deep learning inference by leveraging structured parameter sharing, achieving superior quality-efficiency tradeoffs compared to existing methods.
SpeedLoader: An I/O efficient scheme for heterogeneous and distributed LLM operation
·1914 words·9 mins
Natural Language Processing
Large Language Models
🏢 National University of Singapore
SpeedLoader: A groundbreaking I/O-efficient scheme dramatically boosts LLM training and inference speed on diverse hardware, even with limited resources!
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
·1644 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
A CTC-based draft model significantly improves speculative decoding's acceptance rate, accelerating LLM inference.
Spectral Editing of Activations for Large Language Model Alignment
·2511 words·12 mins
Natural Language Processing
Large Language Models
🏢 Institute for Language, Cognition and Computation, University of Edinburgh
Spectral Editing of Activations (SEA) improves large language model truthfulness and fairness by projecting input representations to maximize covariance with positive demonstrations while minimizing c…
Spectral Adapter: Fine-Tuning in Spectral Space
·3909 words·19 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
Spectral Adapter boosts parameter-efficient fine-tuning by incorporating pretrained weight matrices’ spectral information, enhancing efficiency and multi-adapter fusion.
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
·2263 words·11 mins
Natural Language Processing
Large Language Models
🏢 Yandex & HSE University
SpecExec achieves massively parallel speculative decoding, enabling interactive 50B+ parameter LLM inference on consumer devices at 4-6 tokens/second.
SparseLLM: Towards Global Pruning of Pre-trained Language Models
·2184 words·11 mins
Natural Language Processing
Large Language Models
🏢 Emory University
SparseLLM globally prunes large language models efficiently by decomposing the problem into manageable subproblems, achieving significant performance improvements, especially at high sparsity.
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
·1675 words·8 mins
Natural Language Processing
Large Language Models
🏢 Rice University
SpaceByte: A novel byte-level decoder architecture achieving near-tokenized-model performance without tokenization!
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
·2028 words·10 mins
Natural Language Processing
Large Language Models
🏢 Technical University of Munich
Open-source LLMs are vulnerable to embedding space attacks, which efficiently bypass safety mechanisms and enable data extraction, even after unlearning.