Natural Language Processing
TSDS: Data Selection for Task-Specific Model Finetuning
·2005 words·10 mins·
Natural Language Processing
Large Language Models
University of Wisconsin-Madison
TSDS: a novel framework that selects optimal training data for efficient large language model finetuning using only a few examples, boosting performance.
Truth is Universal: Robust Detection of Lies in LLMs
·4200 words·20 mins·
Natural Language Processing
Large Language Models
Heidelberg University
LLM lie detectors fail to generalize; this paper presents a robust method achieving 94% accuracy by identifying a universal two-dimensional truth subspace, separating true/false statements across vari…
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
·1948 words·10 mins·
Natural Language Processing
Large Language Models
Yale University
TAP: automated jailbreaking of black-box LLMs with high success rates, using fewer queries than previous methods.
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
·1866 words·9 mins·
Natural Language Processing
Machine Translation
Microsoft
TransVIP: groundbreaking speech-to-speech translation system preserving voice & isochrony, outperforming current state-of-the-art models!
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins·
Natural Language Processing
Large Language Models
Simplex
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even with complex fractal structures.
Transformers need glasses! Information over-squashing in language tasks
·3003 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
University of Oxford
Large language models (LLMs) suffer from information loss due to representational collapse and over-squashing, causing failures in simple tasks; this paper provides theoretical analysis and practical …
Transformers Can Do Arithmetic with the Right Embeddings
·3154 words·15 mins·
Natural Language Processing
Large Language Models
University of Maryland
Researchers enhanced transformer performance on arithmetic tasks by introducing Abacus Embeddings, which encode each digit’s position, enabling improved generalization and unlocking multi-step reasoni…
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins·
Natural Language Processing
Large Language Models
Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performanc…
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
·330 words·2 mins·
AI Generated
Natural Language Processing
Large Language Models
Yonsei University
Train-Attention (TAALM) tackles catastrophic forgetting in LLMs by dynamically weighting tokens during training, boosting learning efficiency and knowledge retention, outperforming existing methods on…
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
·2618 words·13 mins·
Natural Language Processing
Large Language Models
Renmin University of China
Transformers’ in-context learning (ICL) is explained using representation learning, revealing its ICL process as gradient descent on a dual model and offering modifiable attention layers for enhanced …
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
·3583 words·17 mins·
AI Generated
Natural Language Processing
Sentiment Analysis
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Robust Multimodal Sentiment Analysis (MSA) model, Language-dominated Noise-resistant Learning Network (LNLN), handles incomplete data by correcting dominant modality (language) and using a multimodal …
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins·
Natural Language Processing
Large Language Models
University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards a theory of how the structure of language is acquired by deep neural networks
·3238 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
École Polytechnique Fédérale de Lausanne
Deep learning models learn language structure through next-token prediction, but the data requirements remain unclear. This paper reveals that the effective context window, determining learning capaci…
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
·2572 words·13 mins·
Natural Language Processing
Large Language Models
UC Berkeley
LLMs struggle with simple logical reasoning due to the ‘reversal curse.’ This paper reveals that weight asymmetry during training is the culprit, offering a new theoretical perspective and potential s…
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
·2046 words·10 mins·
Natural Language Processing
Large Language Models
Tencent AI Lab
ALPHALLM boosts LLM performance in complex reasoning tasks by using imagination, search, and criticism to create a self-improving loop, eliminating the need for extra training data.
Toward Efficient Inference for Mixture of Experts
·2411 words·12 mins·
Natural Language Processing
Machine Translation
Duke University
Unlocking the speed and efficiency of Mixture-of-Experts models, this research unveils novel optimization techniques, achieving dramatic improvements in inference throughput and resource usage.
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
·2366 words·12 mins·
Natural Language Processing
Text Generation
Nankai University
ToMe: a novel training-free method that dramatically improves semantic binding in text-to-image synthesis by intelligently merging related tokens, ensuring accurate alignment between generated images and t…
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
·1940 words·10 mins·
Natural Language Processing
Large Language Models
Google DeepMind
This paper introduces an innovative iterative prompting method for estimating epistemic uncertainty in LLMs, enabling reliable detection of hallucinations.
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins·
Natural Language Processing
Large Language Models
IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components,…