Natural Language Processing
Thinking Forward: Memory-Efficient Federated Finetuning of Language Models
·4828 words·23 mins·
Natural Language Processing
Large Language Models
🏢 University of Massachusetts Amherst
SPRY: A memory-efficient federated learning algorithm for finetuning LLMs on resource-constrained devices, achieving high accuracy and speed.
The Representation Landscape of Few-Shot Learning and Fine-Tuning in Large Language Models
·3617 words·17 mins·
Natural Language Processing
Large Language Models
🏢 Area Science Park
LLMs use different internal representations for few-shot learning and fine-tuning, with a transition in the middle network layers that shapes how information is encoded and how tasks are solved.
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
·2037 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Cornell University
This research dramatically accelerates and improves hybrid language models by distilling large Transformers into linear RNNs, achieving performance comparable to the original Transformer with signific…
The Importance of Online Data: Understanding Preference Fine-tuning via Coverage
·1878 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Hybrid Preference Optimization (HyPO) leverages both offline and online data to fine-tune LLMs, outperforming existing offline-only methods in both performance and efficiency.
The Impact of Initialization on LoRA Finetuning Dynamics
·2220 words·11 mins·
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LoRA’s initialization significantly impacts finetuning dynamics: initializing matrix A randomly and B to zero yields better performance than the reverse, because it allows larger learning rates.
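For context, a minimal PyTorch sketch (hypothetical, not the paper's code) of the initialization scheme described above: A is drawn from a small random distribution and B is zeroed, so the low-rank update BA starts at zero while both factors receive gradients from the first step.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank LoRA update (alpha/r) * B @ A."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        # Initialization discussed above: A random, B zero (rather than the reverse),
        # so the adapter output is zero at step 0 yet tolerates larger learning rates.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T

# Usage: only A and B are trainable.
layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))
```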
The Fine-Grained Complexity of Gradient Computation for Training Large Language Models
·336 words·2 mins·
Natural Language Processing
Large Language Models
🏢 Columbia University
New research precisely characterizes the computational limits of training large language models, revealing a sharp threshold determined by the magnitude of the parameter matrix entries and paving the way for faster algorithms.
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
·3501 words·17 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 MIT
Large language models (LLMs) struggle with factual inconsistencies (‘hallucinations’) and the ‘reversal curse,’ where information recall depends heavily on the input order. This work reframes the cur…
The Expressive Capacity of State Space Models: A Formal Language Perspective
·1723 words·9 mins·
Natural Language Processing
Large Language Models
🏢 Saarland University
State-space models (SSMs) rival transformers in language modeling, but their capabilities remain unclear; this paper rigorously analyzes SSM expressivity, revealing unique strengths and limitations, i…
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
·2128 words·10 mins·
Natural Language Processing
Large Language Models
🏢 Harvard University
Transformers learn to perform in-context learning of Markov chains hierarchically, progressing from simpler unigram strategies to more complex bigram solutions, with the presence of simpler solutions …
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
·2475 words·12 mins·
Natural Language Processing
Large Language Models
🏢 Shanghai Jiao Tong University
A softmax regression analysis reveals that in-context learning in self-attention Transformers is surprisingly close to the weight shifts produced by gradient descent, shedding light on how these models learn.
Text2NKG: Fine-Grained N-ary Relation Extraction for N-ary relational Knowledge Graph Construction
·2178 words·11 mins·
AI Generated
Natural Language Processing
Information Extraction
🏢 School of Computer Science, Beijing University of Posts and Telecommunications, China
Text2NKG: a novel framework for building N-ary relational knowledge graphs by performing fine-grained n-ary relation extraction, supporting multiple schemas, and achieving state-of-the-art accuracy.
Temporal Sentence Grounding with Relevance Feedback in Videos
·2432 words·12 mins·
Natural Language Processing
Vision-Language Models
🏢 Peking University
The RaTSG network tackles Temporal Sentence Grounding with Relevance Feedback (TSG-RF) by discerning query relevance at multiple granularities before selectively grounding segments.
Teach Better or Show Smarter? On Instructions and Exemplars in Automatic Prompt Optimization
·4856 words·23 mins·
Natural Language Processing
Large Language Models
🏢 Google Cloud AI
Smart prompt engineering is key to unlocking LLMs’ full potential. This paper reveals that cleverly selecting examples (exemplar optimization) can outperform optimizing instructions alone, even with S…
Talking Heads: Understanding Inter-Layer Communication in Transformer Language Models
·3768 words·18 mins·
Natural Language Processing
Large Language Models
🏢 Brown University
Transformer Language Models’ (LMs) sensitivity to seemingly arbitrary prompt changes is explained by identifying low-rank communication channels between layers. By decomposing attention heads, resear…
TAIA: Large Language Models are Out-of-Distribution Data Learners
·2712 words·13 mins·
Natural Language Processing
Large Language Models
🏢 Fudan University
LLMs struggle on downstream tasks when the fine-tuning data is mismatched. TAIA, a novel inference-time method, solves this by selectively using only attention parameters during inference after training all parameter…
TableRAG: Million-Token Table Understanding with Language Models
·2446 words·12 mins·
Natural Language Processing
Question Answering
🏢 National Taiwan University
TableRAG, a novel Retrieval-Augmented Generation framework, achieves state-of-the-art performance in large-scale table understanding by efficiently integrating schema and cell retrieval with language …
Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages
·1817 words·9 mins·
AI Generated
Natural Language Processing
Large Language Models
🏢 UC Berkeley
LLMs struggle with very low-resource programming languages. SPEAC, a novel synthetic programming elicitation and compilation approach, uses an intermediate language to enable LLMs to generate syntact…
Synthesize, Partition, then Adapt: Eliciting Diverse Samples from Foundation Models
·1594 words·8 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
The Synthesize-Partition-Adapt (SPA) framework leverages synthetic data to generate diverse, high-quality responses from foundation models, enriching user experience.
Synatra: Turning Indirect Knowledge into Direct Demonstrations for Digital Agents at Scale
·2845 words·14 mins·
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
Synatra synthesizes high-quality digital agent training data from online tutorials and web pages, significantly improving agent performance on complex web-based tasks at a fraction of the cost of huma…
Symbolic Regression with a Learned Concept Library
·2112 words·10 mins·
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
LASR, a novel symbolic regression method, uses zero-shot LLM queries to discover and evolve abstract concepts, substantially outperforming state-of-the-art approaches and discovering a new LLM scaling…