Natural Language Processing
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
·4828 words·23 mins·
Natural Language Processing
Large Language Models
🏢 University of Massachusetts Amherst
SPRY: A memory-efficient federated learning algorithm for finetuning LLMs on resource-constrained devices, achieving high accuracy and speed.
The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models
·3617 words·17 mins·
Natural Language Processing
Large Language Models
🏢 Area Science Park
LLMs use different internal representations for few-shot learning and fine-tuning, with a transition in the middle network layers that shapes how information is encoded and how tasks are solved.
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
·2037 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
This research dramatically accelerates and improves hybrid language models by distilling large Transformers into linear RNNs, achieving performance comparable to the original Transformer with signific…
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) leverages both offline and online data to fine-tune LLMs, outperforming existing offline-only methods in both performance and efficiency.
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LoRA’s initialization significantly impacts finetuning dynamics: initializing matrix A randomly and B to zero yields better performance than the reverse, because it allows larger learning rates.
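For context, a minimal PyTorch sketch (hypothetical, not the paper's code) of the initialization scheme described above: A is drawn from a small random distribution and B is zeroed, so the low-rank update BA starts at zero while both factors receive gradients from the first step.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank LoRA update (alpha/r) * B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # Initialization discussed above: A random, B zero (rather than the reverse),
        # so the adapter output is zero at step 0 yet tolerates larger learning rates.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T

# Usage: only A and B are trainable.
layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))
```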
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
New research precisely characterizes the computational limits of training large language models, revealing a sharp threshold determined by the magnitude of the parameter matrix entries and paving the way for faster algorithms.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity, revealing unique strengths and limitations, i…
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions …
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
·2475 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Shanghai Jiao Tong University
A softmax regression analysis reveals that in-context learning in self-attention Transformers is surprisingly close to the weight shifts produced by gradient descent, shedding light on how these models learn.
Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction
·2178 words·11 mins·
AI Generated
Natural Language Processing
Information Extraction
🏢 School of Computer Science, Beijing University of Posts and Telecommunications, China
Text2NKG: a novel framework for building N-ary relational knowledge graphs by performing fine-grained n-ary relation extraction, supporting multiple schemas, and achieving state-of-the-art accuracy.
Temporal Sentence Grounding with Relevance Feedback in Videos
·2432 words·12 mins·
Natural Language Processing
Vision-Language Models
🏢 Peking University
The RaTSG network tackles Temporal Sentence Grounding with Relevance Feedback (TSG-RF) by discerning query relevance at multiple granularities before selectively grounding segments.
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
·4856 words·23 mins·
Natural Language Processing
Large Language Models
🏢 Google Cloud AI
Smart prompt engineering is key to unlocking LLMs’ full potential. This paper reveals that cleverly selecting examples (exemplar optimization) can outperform optimizing instructions alone, even with S…
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
·3768 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Brown University
Transformer Language Models’ (LMs) sensitivity to seemingly arbitrary prompt changes is explained by identifying low-rank communication channels between layers. By decomposing attention heads, resear…
TAIA: Large Language Models are Out-of-Distribution Data Learners
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Fudan University
LLMs struggle on downstream tasks when the fine-tuning data is mismatched. TAIA, a novel inference-time method, solves this by selectively using only attention parameters during inference after training all parameter…
TableRAG: Million-Token Table Understanding with Language Models
·2446 words·12 mins·
Natural Language Processing
Question Answering
🏢 National Taiwan University
TableRAG, a novel Retrieval-Augmented Generation framework, achieves state-of-the-art performance in large-scale table understanding by efficiently integrating schema and cell retrieval with language …
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
·1817 words·9 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LLMs struggle with very low-resource programming languages. SPEAC, a novel synthetic programming elicitation and compilation approach, uses an intermediate language to enable LLMs to generate syntact…
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
·1594 words·8 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
The Synthesize-Partition-Adapt (SPA) framework leverages synthetic data to generate diverse, high-quality responses from foundation models, enriching user experience.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of huma…
Symbolic Regression with a Learned Concept Library
·2112 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
LASR, a novel symbolic regression method, uses zero-shot LLM queries to discover and evolve abstract concepts, substantially outperforming state-of-the-art approaches and discovering a new LLM scaling…