Natural Language Processing
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
·2560 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Maryland
SHED, a Shapley value-based framework, efficiently refines instruction-tuning datasets for LLMs, producing high-performing subsets (only 10% of the original size) that transfer well across different model…
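For readers curious how Shapley-based data selection works in principle, here is a minimal Monte Carlo sketch in Python. This is not SHED's accelerated pipeline (the paper adds approximations to make Shapley estimation tractable at LLM scale); `utility` is a hypothetical callback that scores a subset, e.g. by fine-tuning a proxy model and measuring validation accuracy.

```python
import random

def shapley_values(points, utility, rounds=200, seed=0):
    """Monte Carlo estimate of each training example's Shapley value.

    `utility(subset)` is a hypothetical callback returning a scalar score,
    e.g. validation accuracy after fine-tuning on `subset`.
    """
    rng = random.Random(seed)
    values = {i: 0.0 for i in range(len(points))}
    for _ in range(rounds):
        order = list(range(len(points)))
        rng.shuffle(order)
        subset, prev = [], utility([])
        for i in order:
            subset.append(points[i])
            cur = utility(subset)
            values[i] += cur - prev  # marginal contribution of example i
            prev = cur
    return [values[i] / rounds for i in range(len(points))]

def refine(points, utility, keep=0.10):
    """Keep the top `keep` fraction of examples by estimated Shapley value."""
    vals = shapley_values(points, utility)
    ranked = sorted(range(len(points)), key=lambda i: vals[i], reverse=True)
    return [points[i] for i in ranked[: max(1, int(len(points) * keep))]]
```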
SGLang: Efficient Execution of Structured Language Model Programs
·1898 words·9 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SGLang: A new system boosts LLM program execution speed by up to 6.4x, simplifying complex LLM application programming.
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
·1587 words·8 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Oxford
Transformers and RNNs show contrasting representational capabilities: Transformers excel at tasks requiring associative recall, while RNNs are better suited for hierarchical language processing. This …
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning
·1972 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
SEMCODER: A novel 6.7B parameter code LLM surpasses GPT-3.5-turbo’s performance on code generation and execution reasoning by employing ‘monologue reasoning’: training the model to verbally explain cod…
Semantics and Spatiality of Emergent Communication
·2868 words·14 mins·
AI Generated
Natural Language Processing
Emergent Communication
🏢 Technion - Israel Institute of Technology
Emergent communication protocols are surprisingly inconsistent; this paper proves reconstruction-based objectives yield semantically consistent protocols, unlike discrimination-based ones, highlightin…
Semantic Routing via Autoregressive Modeling
·2894 words·14 mins·
Natural Language Processing
AI Applications
🏢 Google Research
Learning-based semantic routing, a scalable approach to route planning using rich user queries, is introduced, accompanied by a large-scale public benchmark and a proof-of-concept model demonstrating …
SelfCodeAlign: Self-Alignment for Code Generation
·1983 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SelfCodeAlign is a novel self-alignment method for code generation LLMs that surpasses existing methods by avoiding reliance on expensive human annotation or proprietary LLMs. The method achieves thi…
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
·2366 words·12 mins·
Natural Language Processing
Speech Recognition
🏢 NVIDIA Research
STAR, a novel unsupervised adaptation framework, drastically improves automatic speech recognition (ASR) robustness across diverse domains using only unlabeled data and outperforms existing self-train…
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
·5609 words·27 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SAMI (Self-Supervised Alignment with Mutual Information) teaches language models to follow principles without human preference labels by maximizing the mutual information between principle…
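The core objective is a mutual-information lower bound between principles and responses. Below is a minimal InfoNCE-style sketch, assuming a `scores` matrix of principle/response compatibility scores (e.g. the model's log-likelihood of response j given principle i, with matched pairs on the diagonal); this illustrates the bound, not the paper's exact estimator.

```python
import math
import torch
import torch.nn.functional as F

def infonce_bound(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style lower bound on I(principle; response).

    `scores[i, j]` is an assumed compatibility score; row i's matched
    response sits on the diagonal.
    """
    n = scores.size(0)
    targets = torch.arange(n, device=scores.device)
    # Classifying each principle's true response among n candidates
    # gives the bound: I >= log n - cross-entropy.
    return math.log(n) - F.cross_entropy(scores, targets)

# Training maximizes the bound, i.e. minimizes -infonce_bound(scores).
scores = torch.randn(8, 8, requires_grad=True)
loss = -infonce_bound(scores)
loss.backward()
```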
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model
·2148 words·11 mins·
AI Generated
Natural Language Processing
Information Retrieval
🏢 Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Self-Retrieval revolutionizes information retrieval by unifying indexing, retrieval, and reranking within a single large language model, achieving significantly improved performance.
Self-playing Adversarial Language Game Enhances LLM Reasoning
·2197 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Self-play adversarial language game boosts LLM reasoning!
Self-Guiding Exploration for Combinatorial Problems
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 MBZUAI
LLMs excel at reasoning tasks, but their application to combinatorial problems (CPs) is underexplored. This paper introduces Self-Guiding Exploration (SGE), a novel prompting strategy that significan…
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs self-discover optimal reasoning structures for complex problems, boosting performance by up to 32% compared to existing methods.
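SELF-DISCOVER composes a task-specific reasoning structure in three meta-prompting stages (SELECT, ADAPT, IMPLEMENT) before solving. A minimal sketch follows, where `llm` is an assumed text-completion function and the prompt wording is illustrative rather than the paper's.

```python
# Sketch of SELF-DISCOVER's three meta-prompting stages; `llm` is an
# assumed callable mapping a prompt string to a completion string.
def self_discover(llm, task, seed_modules):
    selected = llm(
        f"Task: {task}\nFrom these reasoning modules, select the useful ones:\n"
        f"{seed_modules}")
    adapted = llm(
        f"Task: {task}\nRephrase the selected modules so they fit this task:\n"
        f"{selected}")
    structure = llm(
        f"Task: {task}\nCombine the adapted modules into a step-by-step "
        f"reasoning plan in JSON:\n{adapted}")
    # Finally, solve the task by following the self-composed structure.
    return llm(
        f"Follow this reasoning structure step by step to solve the task.\n"
        f"Structure: {structure}\nTask: {task}")
```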
Selective Attention: Enhancing Transformer through Principled Context Control
·2002 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Enhance Transformer models via Selective Self-Attention (SSA), a principled context control method that boosts accuracy and efficiency.
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
·3120 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
SelectIT leverages LLMs’ intrinsic uncertainty to efficiently select high-quality instruction tuning data, enhancing model performance without extra resources.
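One plausible reading of uncertainty-aware selection: have the model rate each sample under several rating prompts, and treat disagreement across prompts as uncertainty that discounts the score. The sketch below is hedged accordingly (it is not the paper's exact statistics); `rate` is a hypothetical callback returning the LLM's numeric quality rating, and `alpha` is an assumed penalty weight.

```python
import statistics

def select_by_uncertainty(samples, rate, prompts, keep=0.2, alpha=0.2):
    """Keep instruction samples the model rates highly *and* consistently.

    `rate(prompt, sample)` is a hypothetical callback returning a numeric
    rating; the spread of ratings across prompts acts as an uncertainty
    penalty on the mean score.
    """
    scored = []
    for s in samples:
        ratings = [rate(p, s) for p in prompts]
        score = statistics.mean(ratings) - alpha * statistics.pstdev(ratings)
        scored.append((score, s))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [s for _, s in scored[: max(1, int(len(samples) * keep))]]
```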
Segmenting Watermarked Texts From Language Models
·2577 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Texas A&M University
This paper presents novel statistical methods to reliably watermark and segment LLM-generated text, ensuring source traceability even after user modifications.
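As background, watermark detection is commonly framed as a z-test on "green-list" token counts (in the style of Kirchenbauer et al.), and segmentation can slide such a test over the text. The sketch below illustrates that framing, not this paper's exact statistics; `is_green` is an assumed predicate implementing the seeded vocabulary partition, and `gamma` is the green-list fraction.

```python
import math

def greenlist_zscore(tokens, is_green, gamma=0.5):
    """z-score for the count of green-list tokens in a span.

    `is_green(prev_token, token)` is an assumed predicate; under no
    watermark, roughly a `gamma` fraction of tokens should be green.
    """
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

def flag_watermarked_spans(tokens, is_green, window=64, threshold=4.0):
    """Slide a fixed window and flag spans whose z-score exceeds the
    threshold, a crude stand-in for the paper's segmentation statistics."""
    return [
        greenlist_zscore(tokens[i:i + window], is_green) > threshold
        for i in range(len(tokens) - window + 1)
    ]
```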
Search for Efficient Large Language Models
·2477 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Northeastern University
Training-free architecture search finds optimal subnets in LLMs, boosting inference speed and slashing memory needs without retraining.
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
·2596 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Indiana University
SDP4Bit achieves up to 4.08x speedup in LLM training by quantizing weight differences and gradients to ~4 bits, maintaining accuracy.
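A simplified sketch of the kind of group-wise 4-bit quantization such a scheme relies on, applied here to the delta between weight snapshots. This is illustrative only, not SDP4Bit's exact algorithm; the group size and the symmetric [-8, 7] code range are assumptions.

```python
import math
import torch

def quantize_groupwise_4bit(x: torch.Tensor, group: int = 128):
    """Symmetric group-wise 4-bit quantization: int8 codes in [-8, 7]
    plus one float scale per group of `group` elements."""
    flat = x.flatten()
    pad = (-flat.numel()) % group  # zero-pad so groups divide evenly
    padded = torch.cat([flat, flat.new_zeros(pad)]).view(-1, group)
    scale = padded.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    codes = torch.clamp(torch.round(padded / scale), -8, 7).to(torch.int8)
    return codes, scale

def dequantize_groupwise_4bit(codes, scale, shape):
    flat = (codes.float() * scale).flatten()[: math.prod(shape)]
    return flat.view(shape)

# Example: communicate only the quantized delta between weight snapshots.
w_prev, w_now = torch.randn(1000), torch.randn(1000)
codes, scale = quantize_groupwise_4bit(w_now - w_prev)
w_hat = w_prev + dequantize_groupwise_4bit(codes, scale, w_now.shape)
```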
Scaling Sign Language Translation
·4741 words·23 mins·
AI Generated
Natural Language Processing
Machine Translation
🏢 Google DeepMind
Researchers dramatically improved sign language translation by scaling up data, model size, and the number of languages, achieving state-of-the-art results.
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
·4019 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Washington
Retrieval-based language models improve as their inference-time datastore grows. A 1.4 trillion-token datastore, MASSIVEDS, shows that retrieval-based LMs outperform larger models trained without retrieval on knowled…