Large Language Models
SWT-Bench: Testing and Validating Real-World Bug-Fixes with Code Agents
·3127 words·15 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 ETH Zurich
SWT-Bench, a new benchmark, reveals that LLMs excel at generating tests for real-world bug fixes, surpassing dedicated test generation systems and significantly improving code-fix precision.
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
·3239 words·16 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SwitchHead: A novel MoE attention mechanism accelerates Transformers by significantly reducing computation and memory, matching baseline performance.
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
·2772 words·14 mins
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
SVFT: a novel parameter-efficient fine-tuning method achieves near full fine-tuning accuracy using only 0.006% to 0.25% of parameters, significantly outperforming existing techniques.
Stress-Testing Capability Elicitation With Password-Locked Models
·2650 words·13 mins
Natural Language Processing
Large Language Models
🏢 Redwood Research
Fine-tuning, even on a single demonstration, effectively uncovers hidden LLM capabilities, surpassing simple prompting methods.
Streaming Long Video Understanding with Large Language Models
·2706 words·13 mins
Natural Language Processing
Large Language Models
🏢 Chinese University of Hong Kong
VideoStreaming, a novel vision-language model, enables efficient and accurate understanding of arbitrarily long videos using a constant number of tokens via streaming encoding and adaptive memory sele…
Stratified Prediction-Powered Inference for Effective Hybrid Evaluation of Language Models
·1611 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Google DeepMind
Stratified Prediction-Powered Inference (StratPPI) significantly improves language model evaluation by combining human and automated ratings, using stratified sampling for enhanced accuracy and tighte…
StrategyLLM: Large Language Models as Strategy Generators, Executors, Optimizers, and Evaluators for Problem Solving
·4275 words·21 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
StrategyLLM uses four LLM agents to generate consistent, generalizable few-shot prompts, significantly improving LLM problem-solving performance across various tasks.
Stealth edits to large language models
·3221 words·16 mins
Natural Language Processing
Large Language Models
🏢 King's College London
Researchers unveil stealth edits for large language models, introducing a new metric that assesses editability and reveals vulnerability to malicious attacks.
Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
·1847 words·9 mins
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
Star-Agents automates data optimization for instruction-tuned LLMs via multi-agent collaboration, achieving a 12% average performance boost.
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
·5133 words·25 mins
Large Language Models
🏢 University of Hong Kong
Stacking Your Transformers accelerates LLM pre-training by leveraging smaller, pre-trained models to efficiently train larger ones, yielding significant speedups and improved performance.
SSDM: Scalable Speech Dysfluency Modeling
·2807 words·14 mins
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SSDM: Scalable Speech Dysfluency Modeling tackles challenges in speech dysfluency analysis by using articulatory gestures for scalable alignment, a connectionist subsequence aligner for efficient dysf…
SS1: Accelerating Inference with Fast and Expressive Sketch Structured Transform
·2142 words·11 mins
Natural Language Processing
Large Language Models
🏢 Rice University
SS1: A novel GPU-friendly operator accelerates deep learning inference by leveraging structured parameter sharing, achieving superior quality-efficiency tradeoffs compared to existing methods.
SpeedLoader: An I/O efficient scheme for heterogeneous and distributed LLM operation
·1914 words·9 mins
Natural Language Processing
Large Language Models
🏢 National University of Singapore
SpeedLoader: A groundbreaking I/O-efficient scheme dramatically boosts LLM training and inference speed on diverse hardware, even with limited resources!
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
·1644 words·8 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
A CTC-based draft model significantly improves speculative decoding's acceptance rate, accelerating LLM inference.
Spectral Editing of Activations for Large Language Model Alignment
·2511 words·12 mins
Natural Language Processing
Large Language Models
🏢 Institute for Language, Cognition and Computation, University of Edinburgh
Spectral Editing of Activations (SEA) improves large language model truthfulness and fairness by projecting input representations to maximize covariance with positive demonstrations while minimizing c…
Spectral Adapter: Fine-Tuning in Spectral Space
·3909 words·19 mins
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
Spectral Adapter boosts parameter-efficient fine-tuning by incorporating pretrained weight matrices’ spectral information, enhancing efficiency and multi-adapter fusion.
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
·2263 words·11 mins
Natural Language Processing
Large Language Models
🏢 Yandex & HSE University
SpecExec achieves massively parallel speculative decoding, enabling interactive 50B+ parameter LLM inference on consumer devices at 4-6 tokens/second.
SparseLLM: Towards Global Pruning of Pre-trained Language Models
·2184 words·11 mins
Natural Language Processing
Large Language Models
🏢 Emory University
SparseLLM globally prunes large language models efficiently by decomposing the problem into manageable subproblems, achieving significant performance improvements, especially at high sparsity.
SpaceByte: Towards Deleting Tokenization from Large Language Modeling
·1675 words·8 mins
Natural Language Processing
Large Language Models
🏢 Rice University
SpaceByte: A novel byte-level decoder architecture achieving near-tokenized-model performance without tokenization!
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
·2028 words·10 mins
Natural Language Processing
Large Language Models
🏢 Technical University of Munich
Open-source LLMs are vulnerable to embedding space attacks, which efficiently bypass safety mechanisms and enable data extraction, even after unlearning.