Large Language Models
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
·1940 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Google DeepMind
This paper introduces an innovative iterative prompting method for estimating epistemic uncertainty in LLMs, enabling reliable detection of hallucinations.
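As a rough illustration of the iterative-prompting idea (not the paper's exact information-theoretic estimator), the sketch below assumes a hypothetical `generate(prompt)` sampling call and measures how much the answer distribution shifts when previously sampled answers are appended to the prompt; a large shift suggests the uncertainty is mostly epistemic rather than aleatoric.

```python
from collections import Counter

def iterative_prompting_uncertainty(generate, question, num_rounds=4, samples_per_round=8):
    """Toy sketch of iterative prompting for epistemic uncertainty.

    `generate(prompt) -> str` is a hypothetical sampling call to an LLM; swap in
    whatever client you actually use. The loop repeatedly appends a previously
    sampled answer to the prompt and tracks how the answer distribution moves:
    if the model can be "talked into" other answers, the spread is largely
    epistemic; if the distribution stays put, it is more likely aleatoric.
    """
    prompt = question
    round_distributions = []
    for _ in range(num_rounds):
        answers = [generate(prompt) for _ in range(samples_per_round)]
        counts = Counter(answers)
        total = sum(counts.values())
        round_distributions.append({a: c / total for a, c in counts.items()})
        # Feed the most frequent answer so far back in as purported context.
        top_answer = counts.most_common(1)[0][0]
        prompt = f"{prompt}\nOne possible answer is: {top_answer}."
    # Simple instability score: total variation between first and last rounds.
    first, last = round_distributions[0], round_distributions[-1]
    support = set(first) | set(last)
    instability = 0.5 * sum(abs(first.get(a, 0.0) - last.get(a, 0.0)) for a in support)
    return instability, round_distributions
```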
Time-Reversal Provides Unsupervised Feedback to LLMs
·2584 words·13 mins·
Large Language Models
🏢 Google DeepMind
Time-reversed language models provide unsupervised feedback for improving LLMs, offering a cost-effective alternative to human feedback and enhancing LLM safety.
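A minimal sketch of how such feedback could be used, assuming a hypothetical `reverse_loglikelihood(target, condition)` scoring call for a model trained on reversed text; the paper's actual training and scoring setup may differ.

```python
def rerank_by_time_reversal(reverse_loglikelihood, question, candidate_answers):
    """Rank candidate answers by how well a time-reversed LM 'predicts back'
    the question from each answer. `reverse_loglikelihood(target, condition)`
    is a hypothetical scoring call for a model trained on reversed text, used
    here purely as an unsupervised reranking signal (no human labels)."""
    return max(candidate_answers,
               key=lambda ans: reverse_loglikelihood(question, ans))
```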
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins·
Natural Language Processing
Large Language Models
🏢 IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components,…
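The sketch below shows how LLM-generated search components could slot into an ordinary sound-and-complete search procedure; the `successors` and `is_goal` functions here are hand-written stand-ins for what the LLM would be prompted to produce.

```python
from collections import deque

def bfs(start, successors, is_goal):
    """Generic BFS that consumes the search components.

    In the Thought-of-Search setting as summarized above, the LLM writes
    `successors(state)` and `is_goal(state)` as Python once, and a classical
    search (which is sound and complete) does the rest.
    """
    frontier = deque([(start, [start])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

# Stand-ins for what the LLM would generate, here for a toy counting puzzle:
successors = lambda n: [n + 1, n * 2]   # hypothetical successor function
is_goal = lambda n: n == 10             # hypothetical goal test
print(bfs(1, successors, is_goal))      # -> [1, 2, 4, 5, 10]
```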
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
·4828 words·23 mins·
Natural Language Processing
Large Language Models
🏢 University of Massachusetts Amherst
SPRY: A memory-efficient federated learning algorithm for finetuning LLMs on resource-constrained devices, achieving high accuracy and speed.
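The title suggests forward-mode differentiation; assuming that reading, the sketch below shows a forward-gradient update (a JVP along a random tangent direction), which avoids storing a backward graph and is one way to save memory on constrained devices. The federated aggregation step is omitted, and treating this as SPRY's exact update rule is an assumption.

```python
import torch
from torch.func import jvp

def forward_gradient_step(params, loss_fn, lr=1e-2):
    """One forward-gradient update: estimate the gradient as (∇L·v) v for a
    random direction v, computed with a forward-mode JVP (no backward graph).
    This is a generic memory-saving mechanism, not necessarily SPRY's recipe.
    """
    v = [torch.randn_like(p) for p in params]  # random tangent direction
    loss, directional = jvp(lambda *ps: loss_fn(ps), tuple(params), tuple(v))
    with torch.no_grad():
        new_params = [p - lr * directional * vi for p, vi in zip(params, v)]
    return loss, new_params

# Toy usage: fit w to reduce ||Xw - y||^2 using forward gradients only.
torch.manual_seed(0)
X, y = torch.randn(32, 4), torch.randn(32)
w = torch.zeros(4)
loss_fn = lambda ps: ((X @ ps[0] - y) ** 2).mean()
for _ in range(200):
    loss, (w,) = forward_gradient_step([w], loss_fn)
```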
The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models
·3617 words·17 mins·
Natural Language Processing
Large Language Models
🏢 Area Science Park
LLMs use different internal structures for few-shot learning and fine-tuning, showing a transition in the middle network layers that impacts information encoding and task solving strategies.
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
·2037 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
This research dramatically accelerates and improves hybrid language models by distilling large Transformers into linear RNNs, achieving performance comparable to the original Transformer with signific…
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) outperforms existing offline methods for fine-tuning LLMs by leveraging both offline and online data, achieving better performance and efficiency.
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LoRA’s initialization significantly impacts finetuning: initializing matrix A randomly and B to zero yields better performance than the reverse because it permits larger learning rates.
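A minimal LoRA layer sketch illustrating the two initialization choices discussed: A random with B zero (the better-performing default per the summary) versus the reverse. The exact initialization scale below is illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a low-rank update (alpha/r) * B @ A.

    `init` selects which factor starts at zero. With init="A_random_B_zero"
    the update starts at zero but A already spans a random subspace, which
    the summary above says tolerates larger learning rates than the reverse.
    """
    def __init__(self, in_features, out_features, r=8, alpha=16,
                 init="A_random_B_zero"):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)      # frozen pretrained weight
        self.A = nn.Parameter(torch.empty(r, in_features))
        self.B = nn.Parameter(torch.empty(out_features, r))
        self.scaling = alpha / r
        if init == "A_random_B_zero":
            nn.init.normal_(self.A, std=1.0 / r)    # illustrative scale
            nn.init.zeros_(self.B)
        else:                                       # "A_zero_B_random"
            nn.init.zeros_(self.A)
            nn.init.normal_(self.B, std=1.0 / r)

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T
```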
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
New research precisely defines the computational limits of training large language models, revealing a sharp threshold based on parameter matrix entries, paving the way for faster algorithms.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity, revealing unique strengths and limitations, i…
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions …
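The two strategies can be made concrete with a toy in-context predictor on Markov-chain data: a unigram rule that uses overall token frequencies versus a bigram (induction-head-like) rule that conditions on the last token. This is an illustrative sketch, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_markov_chain(P, length):
    """Sample a token sequence from transition matrix P (rows sum to 1)."""
    seq = [rng.integers(len(P))]
    for _ in range(length - 1):
        seq.append(rng.choice(len(P), p=P[seq[-1]]))
    return np.array(seq)

def unigram_predictor(context, vocab_size):
    """Simpler strategy: next-token distribution = in-context token frequencies."""
    counts = np.bincount(context, minlength=vocab_size) + 1e-6
    return counts / counts.sum()

def bigram_predictor(context, vocab_size):
    """Induction-head-like strategy: condition on the last token, using counts
    of what followed that token earlier in the context."""
    last = context[-1]
    followers = context[1:][context[:-1] == last]
    counts = np.bincount(followers, minlength=vocab_size) + 1e-6
    return counts / counts.sum()

# The bigram strategy should, on average, put more mass on the true next token.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
seq = sample_markov_chain(P, 200)
context, target = seq[:-1], seq[-1]
print(unigram_predictor(context, 2)[target], bigram_predictor(context, 2)[target])
```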
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
·2475 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Shanghai Jiao Tong University
Analysis through softmax regression reveals a surprising similarity between in-context learning in self-attention Transformers and gradient descent, highlighting these models’ learning capabilities.
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
·4856 words·23 mins·
Natural Language Processing
Large Language Models
🏢 Google Cloud AI
Smart prompt engineering is key to unlocking LLMs’ full potential. This paper reveals that cleverly selecting examples (exemplar optimization) can outperform optimizing instructions alone, even with S…
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
·3768 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Brown University
The sensitivity of Transformer language models (LMs) to seemingly arbitrary prompt changes is explained by identifying low-rank communication channels between layers. By decomposing attention heads, resear…
TAIA: Large Language Models are Out-of-Distribution Data Learners
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Fudan University
LLMs struggle on downstream tasks when fine-tuned on mismatched data. TAIA, a novel inference-time method, solves this by selectively using only attention parameters during inference after training all parameter…
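A rough sketch of the inference-time selection: after fine-tuning all parameters, keep the fine-tuned attention weights and revert the rest to pretrained values. Matching attention modules by name is an assumption here; the paper's exact procedure may differ.

```python
def keep_only_attention_updates(pretrained_state, finetuned_state,
                                attn_keyword="attn"):
    """Build an inference-time state dict that keeps fine-tuned attention
    weights but reverts everything else (e.g. FFN blocks) to pretrained
    values. Matching attention modules by the substring `attn_keyword` is a
    naming assumption; real checkpoints need the right module names."""
    merged = {}
    for name, pretrained_param in pretrained_state.items():
        finetuned_param = finetuned_state[name]
        merged[name] = finetuned_param if attn_keyword in name else pretrained_param
    return merged

# Usage sketch (model variables are placeholders):
# model.load_state_dict(keep_only_attention_updates(base.state_dict(),
#                                                   tuned.state_dict()))
```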
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
·1817 words·9 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LLMs struggle with very low-resource programming languages. SPEAC, a novel synthetic programming elicitation and compilation approach, uses an intermediate language to enable LLMs to generate syntact…
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
·1594 words·8 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
The Synthesize-Partition-Adapt (SPA) framework leverages synthetic data to generate diverse, high-quality responses from foundation models, enriching user experience.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of huma…
Symbolic Regression with a Learned Concept Library
·2112 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
LASR, a novel symbolic regression method, uses zero-shot LLM queries to discover and evolve abstract concepts, substantially outperforming state-of-the-art approaches and discovering a new LLM scaling…