Natural Language Processing
SHED: Shapley-Based Automated Dataset Refinement for Instruction Fine-Tuning
·2560 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Maryland
SHED, a Shapley value-based framework, efficiently refines instruction-tuning datasets for LLMs, producing high-performing subsets (only 10% of the original size) that transfer well across different model…
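For readers curious how Shapley-based data selection works in principle, here is a minimal Monte Carlo sketch in Python. This is not SHED's accelerated pipeline (the paper adds approximations to make Shapley estimation tractable at LLM scale); `utility` is a hypothetical callback that scores a subset, e.g. by fine-tuning a proxy model and measuring validation accuracy.

```python
import random

def shapley_values(points, utility, rounds=200, seed=0):
    """Monte Carlo estimate of each training example's Shapley value.

    `utility(subset)` is a hypothetical callback returning a scalar score,
    e.g. validation accuracy after fine-tuning on `subset`.
    """
    rng = random.Random(seed)
    values = {i: 0.0 for i in range(len(points))}
    for _ in range(rounds):
        order = list(range(len(points)))
        rng.shuffle(order)
        subset, prev = [], utility([])
        for i in order:
            subset.append(points[i])
            cur = utility(subset)
            values[i] += cur - prev  # marginal contribution of example i
            prev = cur
    return [values[i] / rounds for i in range(len(points))]

def refine(points, utility, keep=0.10):
    """Keep the top `keep` fraction of examples by estimated Shapley value."""
    vals = shapley_values(points, utility)
    ranked = sorted(range(len(points)), key=lambda i: vals[i], reverse=True)
    return [points[i] for i in ranked[: max(1, int(len(points) * keep))]]
```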
SGLang: Efficient Execution of Structured Language Model Programs
·1898 words·9 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
SGLang: A new system boosts LLM program execution speed by up to 6.4x, simplifying complex LLM application programming.
Separations in the Representational Capabilities of Transformers and Recurrent Architectures
·1587 words·8 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Oxford
Transformers and RNNs show contrasting representational capabilities: Transformers excel at tasks requiring associative recall, while RNNs are better suited for hierarchical language processing. This …
SemCoder: Training Code Language Models with Comprehensive Semantics Reasoning
·1972 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
SEMCODER: A novel 6.7B parameter code LLM surpasses GPT-3.5-turbo’s performance on code generation and execution reasoning by employing ‘monologue reasoning’: training the model to verbally explain cod…
Semantics and Spatiality of Emergent Communication
·2868 words·14 mins·
AI Generated
Natural Language Processing
Emergent Communication
🏢 Technion - Israel Institute of Technology
Emergent communication protocols are surprisingly inconsistent; this paper proves reconstruction-based objectives yield semantically consistent protocols, unlike discrimination-based ones, highlightin…
Semantic Routing via Autoregressive Modeling
·2894 words·14 mins·
Natural Language Processing
AI Applications
🏢 Google Research
Learning-based semantic routing, a scalable approach to route planning using rich user queries, is introduced, accompanied by a large-scale public benchmark and a proof-of-concept model demonstrating …
SelfCodeAlign: Self-Alignment for Code Generation
·1983 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Illinois Urbana-Champaign
SelfCodeAlign is a novel self-alignment method for code generation LLMs that surpasses existing methods by avoiding reliance on expensive human annotation or proprietary LLMs. The method achieves thi…
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
·2366 words·12 mins·
Natural Language Processing
Speech Recognition
🏢 NVIDIA Research
STAR, a novel unsupervised adaptation framework, drastically improves automatic speech recognition (ASR) robustness across diverse domains using only unlabeled data and outperforms existing self-train…
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels
·5609 words·27 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Stanford University
SAMI (Self-Supervised Alignment with Mutual Information) teaches language models to follow principles without human preference labels by maximizing the mutual information between principle…
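The core objective is a mutual-information lower bound between principles and responses. Below is a minimal InfoNCE-style sketch, assuming a `scores` matrix of principle/response compatibility scores (e.g. the model's log-likelihood of response j given principle i, with matched pairs on the diagonal); this illustrates the bound, not the paper's exact estimator.

```python
import math
import torch
import torch.nn.functional as F

def infonce_bound(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style lower bound on I(principle; response).

    `scores[i, j]` is an assumed compatibility score; row i's matched
    response sits on the diagonal.
    """
    n = scores.size(0)
    targets = torch.arange(n, device=scores.device)
    # Classifying each principle's true response among n candidates
    # gives the bound: I >= log n - cross-entropy.
    return math.log(n) - F.cross_entropy(scores, targets)

# Training maximizes the bound, i.e. minimizes -infonce_bound(scores).
scores = torch.randn(8, 8, requires_grad=True)
loss = -infonce_bound(scores)
loss.backward()
```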
Self-Retrieval: End-to-End Information Retrieval with One Large Language Model
·2148 words·11 mins·
AI Generated
Natural Language Processing
Information Retrieval
🏢 Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Self-Retrieval revolutionizes information retrieval by unifying indexing, retrieval, and reranking within a single large language model, achieving significantly improved performance.
Self-playing Adversarial Language Game Enhances LLM Reasoning
·2197 words·11 mins·
Natural Language Processing
Large Language Models
🏢 Tencent AI Lab
Self-play adversarial language game boosts LLM reasoning!
Self-Guiding Exploration for Combinatorial Problems
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 MBZUAI
LLMs excel at reasoning tasks, but their application to combinatorial problems (CPs) is underexplored. This paper introduces Self-Guiding Exploration (SGE), a novel prompting strategy that significan…
SELF-DISCOVER: Large Language Models Self-Compose Reasoning Structures
·2441 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
LLMs self-discover optimal reasoning structures for complex problems, boosting performance by up to 32% compared to existing methods.
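SELF-DISCOVER composes a task-specific reasoning structure in three meta-prompting stages (SELECT, ADAPT, IMPLEMENT) before solving. A minimal sketch follows, where `llm` is an assumed text-completion function and the prompt wording is illustrative rather than the paper's.

```python
# Sketch of SELF-DISCOVER's three meta-prompting stages; `llm` is an
# assumed callable mapping a prompt string to a completion string.
def self_discover(llm, task, seed_modules):
    selected = llm(
        f"Task: {task}\nFrom these reasoning modules, select the useful ones:\n"
        f"{seed_modules}")
    adapted = llm(
        f"Task: {task}\nRephrase the selected modules so they fit this task:\n"
        f"{selected}")
    structure = llm(
        f"Task: {task}\nCombine the adapted modules into a step-by-step "
        f"reasoning plan in JSON:\n{adapted}")
    # Finally, solve the task by following the self-composed structure.
    return llm(
        f"Follow this reasoning structure step by step to solve the task.\n"
        f"Structure: {structure}\nTask: {task}")
```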
Selective Attention: Enhancing Transformer through Principled Context Control
·2002 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Michigan
Enhance Transformer models via Selective Self-Attention (SSA), a principled context control method that boosts accuracy and efficiency.
SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection
·3120 words·15 mins·
Natural Language Processing
Large Language Models
🏢 Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China
SelectIT leverages LLMs’ intrinsic uncertainty to efficiently select high-quality instruction tuning data, enhancing model performance without extra resources.
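One plausible reading of uncertainty-aware selection: have the model rate each sample under several rating prompts, and treat disagreement across prompts as uncertainty that discounts the score. The sketch below is hedged accordingly (it is not the paper's exact statistics); `rate` is a hypothetical callback returning the LLM's numeric quality rating, and `alpha` is an assumed penalty weight.

```python
import statistics

def select_by_uncertainty(samples, rate, prompts, keep=0.2, alpha=0.2):
    """Keep instruction samples the model rates highly *and* consistently.

    `rate(prompt, sample)` is a hypothetical callback returning a numeric
    rating; the spread of ratings across prompts acts as an uncertainty
    penalty on the mean score.
    """
    scored = []
    for s in samples:
        ratings = [rate(p, s) for p in prompts]
        score = statistics.mean(ratings) - alpha * statistics.pstdev(ratings)
        scored.append((score, s))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [s for _, s in scored[: max(1, int(len(samples) * keep))]]
```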
Segmenting Watermarked Texts From Language Models
·2577 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 Texas A&M University
This paper presents novel statistical methods to reliably watermark and segment LLM-generated text, ensuring source traceability even after user modifications.
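As background, watermark detection is commonly framed as a z-test on "green-list" token counts (in the style of Kirchenbauer et al.), and segmentation can slide such a test over the text. The sketch below illustrates that framing, not this paper's exact statistics; `is_green` is an assumed predicate implementing the seeded vocabulary partition, and `gamma` is the green-list fraction.

```python
import math

def greenlist_zscore(tokens, is_green, gamma=0.5):
    """z-score for the count of green-list tokens in a span.

    `is_green(prev_token, token)` is an assumed predicate; under no
    watermark, roughly a `gamma` fraction of tokens should be green.
    """
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

def flag_watermarked_spans(tokens, is_green, window=64, threshold=4.0):
    """Slide a fixed window and flag spans whose z-score exceeds the
    threshold, a crude stand-in for the paper's segmentation statistics."""
    return [
        greenlist_zscore(tokens[i:i + window], is_green) > threshold
        for i in range(len(tokens) - window + 1)
    ]
```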
Search for Efficient Large Language Models
·2477 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Northeastern University
Training-free architecture search finds optimal subnets in LLMs, boosting inference speed and slashing memory needs without retraining.
SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training
·2596 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Indiana University
SDP4Bit achieves up to 4.08x speedup in LLM training by quantizing weight differences and gradients to ~4 bits, maintaining accuracy.
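A simplified sketch of the kind of group-wise 4-bit quantization such a scheme relies on, applied here to the delta between weight snapshots. This is illustrative only, not SDP4Bit's exact algorithm; the group size and the symmetric [-8, 7] code range are assumptions.

```python
import math
import torch

def quantize_groupwise_4bit(x: torch.Tensor, group: int = 128):
    """Symmetric group-wise 4-bit quantization: int8 codes in [-8, 7]
    plus one float scale per group of `group` elements."""
    flat = x.flatten()
    pad = (-flat.numel()) % group  # zero-pad so groups divide evenly
    padded = torch.cat([flat, flat.new_zeros(pad)]).view(-1, group)
    scale = padded.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    codes = torch.clamp(torch.round(padded / scale), -8, 7).to(torch.int8)
    return codes, scale

def dequantize_groupwise_4bit(codes, scale, shape):
    flat = (codes.float() * scale).flatten()[: math.prod(shape)]
    return flat.view(shape)

# Example: communicate only the quantized delta between weight snapshots.
w_prev, w_now = torch.randn(1000), torch.randn(1000)
codes, scale = quantize_groupwise_4bit(w_now - w_prev)
w_hat = w_prev + dequantize_groupwise_4bit(codes, scale, w_now.shape)
```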
Scaling Sign Language Translation
·4741 words·23 mins·
AI Generated
Natural Language Processing
Machine Translation
🏢 Google DeepMind
Researchers dramatically improved sign language translation by scaling up data, model size, and the number of languages, achieving state-of-the-art results.
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore
·4019 words·19 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 University of Washington
Retrieval-based language models improve as their inference-time datastore grows. A 1.4 trillion-token datastore, MASSIVEDS, shows that retrieval-based LMs outperform larger models trained without retrieval on knowled…