
Natural Language Processing

TSDS: Data Selection for Task-Specific Model Finetuning
·2005 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
TSDS: A novel framework selects optimal training data for efficient large language model finetuning using only a few examples, boosting performance.
Truth is Universal: Robust Detection of Lies in LLMs
·4200 words·20 mins
Natural Language Processing Large Language Models 🏢 Heidelberg University
LLM lie detectors fail to generalize; this paper presents a robust method achieving 94% accuracy by identifying a universal two-dimensional truth subspace, separating true/false statements across vari…
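As a rough illustration (not the paper's pipeline), separation of true and false statements in a low-dimensional activation subspace is typically tested with a linear probe; in the sketch below the activations, labels, and dimensions are placeholders for what would be extracted from an LLM's hidden states.

```python
# Minimal sketch: project hidden-state activations to a small subspace and fit a
# linear classifier in it, loosely mirroring the idea of a low-dimensional
# "truth subspace". All data below are random placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 4096))   # placeholder hidden-state activations
labels = rng.integers(0, 2, size=1000)        # placeholder true/false labels

X_train, X_test, y_train, y_test = train_test_split(activations, labels, random_state=0)
pca = PCA(n_components=2).fit(X_train)        # 2-dimensional candidate subspace
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)
print("held-out accuracy:", clf.score(pca.transform(X_test), y_test))
```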
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
·1948 words·10 mins
Natural Language Processing Large Language Models 🏢 Yale University
TAP: automated jailbreaking of black-box LLMs with high success rates, using fewer queries than previous methods.
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
·1866 words·9 mins
Natural Language Processing Machine Translation 🏢 Microsoft
TransVIP: groundbreaking speech-to-speech translation system preserving voice & isochrony, outperforming current state-of-the-art models!
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins
Natural Language Processing Large Language Models 🏢 Simplex
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even with complex fractal structures.
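As a rough illustration of what "linearly representing belief state geometry" means in practice, one can fit a least-squares linear probe from residual-stream activations to the belief states of the data-generating process; the arrays below are placeholders, not the authors' data.

```python
# Minimal sketch: linear probe from residual-stream activations to belief states
# (posterior over hidden states of the generating process), with a global R^2 check.
import numpy as np

rng = np.random.default_rng(0)
resid = rng.normal(size=(5000, 512))       # placeholder activations (tokens x d_model)
beliefs = rng.dirichlet(np.ones(3), 5000)  # placeholder belief states (tokens x n_states)

# Least-squares fit: beliefs ≈ [resid, 1] @ W
X = np.hstack([resid, np.ones((len(resid), 1))])
W, *_ = np.linalg.lstsq(X, beliefs, rcond=None)
pred = X @ W
r2 = 1 - ((beliefs - pred) ** 2).sum() / ((beliefs - beliefs.mean(0)) ** 2).sum()
print("linear-probe R^2:", round(float(r2), 3))
```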
Transformers need glasses! Information over-squashing in language tasks
·3003 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Large language models (LLMs) suffer from information loss due to representational collapse and over-squashing, causing failures in simple tasks; this paper provides theoretical analysis and practical …
Transformers Can Do Arithmetic with the Right Embeddings
·3154 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
Researchers enhanced transformer performance on arithmetic tasks by introducing Abacus Embeddings, which encode each digit’s position, enabling improved generalization and unlocking multi-step reasoni…
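A minimal sketch of the core idea behind digit-position embeddings: each digit token receives an extra embedding indexed by its position within the current number, resetting at non-digit tokens. Token ids, names, and dimensions below are illustrative, not the authors' implementation.

```python
# Minimal sketch of Abacus-style digit-position embeddings (illustrative placeholders).
import torch
import torch.nn as nn

DIGIT_IDS = set(range(10))              # assume tokens 0-9 are the digit tokens
d_model, max_digits, vocab = 64, 32, 100

tok_emb = nn.Embedding(vocab, d_model)
digit_pos_emb = nn.Embedding(max_digits + 1, d_model)   # index 0 = "not a digit"

def digit_positions(token_ids):
    """Position of each token within its digit run (0 for non-digit tokens)."""
    pos, run = [], 0
    for t in token_ids:
        run = run + 1 if t in DIGIT_IDS else 0
        pos.append(min(run, max_digits))
    return torch.tensor(pos)

tokens = torch.tensor([12, 3, 4, 5, 11, 7, 8])  # e.g. "x 3 4 5 + 7 8"
x = tok_emb(tokens) + digit_pos_emb(digit_positions(tokens.tolist()))
print(x.shape)  # (7, 64): embeddings ready for the transformer stack
```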
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins
Natural Language Processing Large Language Models 🏢 Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performanc…
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
·330 words·2 mins
AI Generated Natural Language Processing Large Language Models 🏢 Yonsei University
Train-Attention (TAALM) tackles catastrophic forgetting in LLMs by dynamically weighting tokens during training, boosting learning efficiency and knowledge retention, outperforming existing methods on…
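The plumbing for per-token loss weighting can be sketched in a few lines; in the paper the weights come from a meta-learned "where to focus" model, while here they are placeholders.

```python
# Minimal sketch: weight the per-token cross-entropy loss by token importance.
import torch
import torch.nn.functional as F

vocab = 1000
logits = torch.randn(8, vocab)            # (seq_len, vocab): model predictions
targets = torch.randint(0, vocab, (8,))   # next-token targets
token_weights = torch.rand(8)             # placeholder per-token importance weights

per_token_loss = F.cross_entropy(logits, targets, reduction="none")   # (seq_len,)
weighted_loss = (token_weights * per_token_loss).sum() / token_weights.sum()
print(float(weighted_loss))
```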
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
·2618 words·13 mins
Natural Language Processing Large Language Models 🏢 Renmin University of China
Transformers’ in-context learning (ICL) is explained using representation learning, revealing its ICL process as gradient descent on a dual model and offering modifiable attention layers for enhanced …
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
·3583 words·17 mins
AI Generated Natural Language Processing Sentiment Analysis 🏢 School of Data Science, The Chinese University of Hong Kong, Shenzhen
Robust Multimodal Sentiment Analysis (MSA) model, Language-dominated Noise-resistant Learning Network (LNLN), handles incomplete data by correcting dominant modality (language) and using a multimodal …
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards a theory of how the structure of language is acquired by deep neural networks
·3238 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 École Polytechnique Fédérale de Lausanne
Deep learning models learn language structure through next-token prediction, but the data requirements remain unclear. This paper reveals that the effective context window, determining learning capaci…
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
·2572 words·13 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs struggle with simple logical reasoning due to the ‘reversal curse.’ This paper reveals that weight asymmetry during training is the culprit, offering a new theoretical perspective and potential s…
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
·2046 words·10 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
ALPHALLM boosts LLM performance in complex reasoning tasks by using imagination, search, and criticism to create a self-improving loop, eliminating the need for extra training data.
Toward Efficient Inference for Mixture of Experts
·2411 words·12 mins
Natural Language Processing Machine Translation 🏢 Duke University
Unlocking the speed and efficiency of Mixture-of-Expert models, this research unveils novel optimization techniques, achieving dramatic improvements in inference throughput and resource usage.
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
·2366 words·12 mins
Natural Language Processing Text Generation 🏢 Nankai University
ToMe: a novel training-free method dramatically improves semantic binding in text-to-image synthesis by intelligently merging related tokens, ensuring accurate alignment between generated images and t…
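A minimal sketch of the merging step under simple assumptions: embeddings of semantically related prompt tokens (e.g. an object and its attribute) are aggregated into one composite token embedding before conditioning generation. The prompt, grouping, and dimensions are placeholders, not the authors' implementation.

```python
# Minimal sketch: merge related prompt-token embeddings into one composite token.
import torch

d = 768
prompt_tokens = ["a", "dog", "wearing", "a", "red", "hat"]
emb = torch.randn(len(prompt_tokens), d)        # placeholder text-encoder embeddings

groups = [[4, 5]]                               # merge "red" + "hat" into one token
keep = [i for i in range(len(prompt_tokens)) if not any(i in g for g in groups)]
merged = [emb[g].mean(dim=0) for g in groups]   # one composite embedding per group

cond = torch.stack([emb[i] for i in keep] + merged)
print(cond.shape)  # (5, 768): conditioning sequence with the merged composite token
```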
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
·1940 words·10 mins
Natural Language Processing Large Language Models 🏢 Google DeepMind
This paper introduces an innovative iterative prompting method for estimating epistemic uncertainty in LLMs, enabling reliable detection of hallucinations.
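A rough sketch of the iterative-prompting loop: the model is re-asked the question with its own earlier answers appended, and the spread (and susceptibility to being swayed) of the resulting answers serves as a proxy for epistemic uncertainty. The generate() function below is a hypothetical stand-in for an LLM call.

```python
# Minimal sketch of iterative prompting for uncertainty estimation.
from collections import Counter

def generate(prompt: str) -> str:
    # Placeholder: in practice this would call an actual LLM.
    return "Paris"

def iterative_answers(question: str, rounds: int = 5) -> Counter:
    context, answers = "", []
    for _ in range(rounds):
        prompt = f"{context}Q: {question}\nA:"
        ans = generate(prompt).strip()
        answers.append(ans)
        context += f"Q: {question}\nA: {ans}\n"   # feed earlier answers back in
    return Counter(answers)

counts = iterative_answers("What is the capital of France?")
# Low spread across rounds suggests low epistemic uncertainty;
# high spread or easily-swayed answers suggest the opposite.
print(counts)
```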
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins
Natural Language Processing Large Language Models 🏢 IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components,…
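A minimal sketch of that division of labour: the LLM would be asked once to write the successor and goal-test functions for a planning domain, and a standard, sound-and-complete search then runs that code. The toy hand-written functions below stand in for LLM-generated components.

```python
# Minimal sketch: breadth-first search driven by (stand-in) LLM-generated components.
from collections import deque

def successors(state):           # stand-in for an LLM-written successor function
    x, y = state
    return [(x + 1, y), (x, y + 1)]

def is_goal(state):              # stand-in for an LLM-written goal test
    return state == (2, 2)

def bfs(start):
    frontier, seen = deque([(start, [start])]), {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

print(bfs((0, 0)))  # e.g. [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```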