Large Language Models

To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
·1940 words·10 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
This paper introduces an innovative iterative prompting method for estimating epistemic uncertainty in LLMs, enabling reliable detection of hallucinations.
Time-Reversal Provides Unsupervised Feedback to LLMs
·2584 words·13 mins
Large Language Models 🏢 Google DeepMind
Time-reversed language models provide unsupervised feedback for improving LLMs, offering a cost-effective alternative to human feedback and enhancing LLM safety.
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins
Natural Language Processing Large Language Models 🏢 IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components,…
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
·4828 words·23 mins
Natural Language Processing Large Language Models 🏢 University of Massachusetts Amherst
SPRY: A memory-efficient federated learning algorithm for finetuning LLMs on resource-constrained devices, achieving high accuracy and speed.
The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models
·3617 words·17 mins
Natural Language Processing Large Language Models 🏢 Area Science Park
LLMs use different internal structures for few-shot learning and fine-tuning, showing a transition in the middle network layers that impacts information encoding and task solving strategies.
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
·2037 words·10 mins
Natural Language Processing Large Language Models 🏢 Cornell University
This research dramatically accelerates and improves hybrid language models by distilling large Transformers into linear RNNs, achieving performance comparable to the original Transformer with signific…
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) outperforms existing offline methods for fine-tuning LLMs by leveraging both offline and online data, achieving better performance and efficiency.
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LoRA’s initialization significantly impacts finetuning: initializing matrix A randomly and B to zero yields better performance than the reverse, because it permits larger learning rates.
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins
Natural Language Processing Large Language Models 🏢 Columbia University
New research characterizes the computational limits of training large language models, revealing a sharp threshold that depends on the magnitude of the parameter matrix entries and paving the way for faster algorithms.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins
Natural Language Processing Large Language Models 🏢 Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity, revealing unique strengths and limitations, i…
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins
Natural Language Processing Large Language Models 🏢 Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions …
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
·2475 words·12 mins
Natural Language Processing Large Language Models 🏢 Shanghai Jiao Tong University
Softmax regression reveals in-context learning’s surprising similarity to gradient descent in self-attention Transformers, showing the models’ remarkable learning capabilities.
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
·4856 words·23 mins
Natural Language Processing Large Language Models 🏢 Google Cloud AI
Smart prompt engineering is key to unlocking LLMs’ full potential. This paper reveals that cleverly selecting examples (exemplar optimization) can outperform optimizing instructions alone, even with S…
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
·3768 words·18 mins
Natural Language Processing Large Language Models 🏢 Brown University
The sensitivity of Transformer language models (LMs) to seemingly arbitrary prompt changes is explained by identifying low-rank communication channels between layers. By decomposing attention heads, resear…
TAIA: Large Language Models are Out-of-Distribution Data Learners
·2712 words·13 mins
Natural Language Processing Large Language Models 🏢 Fudan University
LLMs struggle with downstream tasks using mismatched data. TAIA, a novel inference-time method, solves this by selectively using only attention parameters during inference after training all parameter…
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
·1817 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs struggle with very low-resource programming languages. SPEAC, a novel synthetic programming elicitation and compilation approach, uses an intermediate language to enable LLMs to generate syntact…
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
·1594 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
The Synthesize-Partition-Adapt (SPA) framework leverages synthetic data to generate diverse, high-quality responses from foundation models, enriching user experience.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins
Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of huma…
Symbolic Regression with a Learned Concept Library
·2112 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
LASR, a novel symbolic regression method, uses zero-shot LLM queries to discover and evolve abstract concepts, substantially outperforming state-of-the-art approaches and discovering a new LLM scaling…