Large Language Models
SnapKV: LLM Knows What You are Looking for Before Generation
·2730 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SnapKV: Slashing LLM memory usage & boosting speed via smart KV cache compression!
Smoothie: Label Free Language Model Routing
·3245 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SMOOTHIE: Label-free LLM routing achieves up to 10% accuracy gains by using a latent variable model to estimate LLM quality without labeled data.
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
·2599 words·13 mins·
Natural Language Processing
Large Language Models
🏢 UC Los Angeles
SMALLTOLARGE (S2L) revolutionizes large language model (LLM) fine-tuning by summarizing the training loss trajectories of a small model, enabling efficient data selection for larger models.
SLTrain: a sparse plus low rank approach for parameter and memory efficient pretraining
·4422 words·21 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 RIKEN AIP
SLTrain: sparse-plus-low-rank pretraining cuts LLM memory usage by up to 73% without performance loss!
SlimGPT: Layer-wise Structured Pruning for Large Language Models
·2966 words·14 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Alibaba Group
SlimGPT: Achieve near-optimal LLM structured pruning via Batched Greedy Pruning and Incremental Pruning Ratio, improving efficiency without sacrificing accuracy.
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models
·2353 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google Research
Self Logits Evolution Decoding (SLED) boosts LLM factuality by up to 20% without extra data or fine-tuning!
SIRIUS: Contextual Sparsity with Correction for Efficient LLMs
·5392 words·26 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
SIRIUS: A novel correction mechanism boosts the efficiency of contextually sparse LLMs for complex reasoning tasks, achieving significant latency reduction.
SimPO: Simple Preference Optimization with a Reference-Free Reward
·3091 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Princeton University
SimPO: a simpler, reference-free reward algorithm that significantly outperforms existing offline preference optimization methods, achieving higher accuracy and efficiency in aligning LLMs with human preferences.
Simple and Effective Masked Diffusion Language Models
·2145 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Cornell Tech
Simple masked discrete diffusion models achieve state-of-the-art language modeling results, closing the performance gap with autoregressive methods through a novel training recipe and a Rao-Blackwellized objective.
Should We Really Edit Language Models? On the Evaluation of Edited Language Models
·3638 words·18 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
Language model editing’s limitations exposed: Scaling current methods leads to knowledge loss and compromised safety, urging research into more robust techniques.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
·3020 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
ShiftAddLLM accelerates pretrained LLMs via post-training, multiplication-less reparameterization, achieving significant memory and energy reductions with accuracy comparable to or better than existing methods.
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
·2560 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Maryland
SHED, a Shapley value-based framework, efficiently refines instruction-tuning datasets for LLMs, producing high-performing subsets, only 10% of the original size, that transfer well across different models.
SGLang: Efficient Execution of Structured Language Model Programs
·1898 words·9 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SGLang: A new system boosts LLM program execution speed by up to 6.4x, simplifying complex LLM application programming.
Sequoia: Scalable and Robust Speculative Decoding
·2372 words·12 mins·
Large Language Models
🏢 Carnegie Mellon University
SEQUOIA: A novel algorithm boosts Large Language Model (LLM) inference speed by up to 9.5x using a scalable and robust speculative decoding approach!
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
·1587 words·8 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Oxford
Transformers and RNNs show contrasting representational capabilities: Transformers excel at tasks requiring associative recall, while RNNs are better suited for hierarchical language processing.
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning
·1972 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
SEMCODER: A novel 6.7B-parameter code LLM surpasses GPT-3.5-turbo on code generation and execution reasoning by employing ‘monologue reasoning’: training the model to verbally explain code execution.
SelfCodeAlign: Self-Alignment for Code Generation
·1983 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SelfCodeAlign is a novel self-alignment method for code generation LLMs that surpasses existing methods by avoiding reliance on expensive human annotation or proprietary LLMs. It achieves this by having the base model generate its own instruction-response pairs and validate them against self-generated test cases.
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
·5609 words·27 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SAMI (Self-Supervised Alignment with Mutual Information) teaches language models to follow principles without human preference labels by maximizing the mutual information between principles and model responses.
Self-playing Adversarial Language Game Enhances LLM Reasoning
·2197 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Self-play adversarial language game boosts LLM reasoning!
Self-Guiding Exploration for Combinatorial Problems
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 MBZUAI
LLMs excel at reasoning tasks, but their application to combinatorial problems (CPs) is underexplored. This paper introduces Self-Guiding Exploration (SGE), a novel prompting strategy that significantly improves LLM performance on combinatorial problems.