Natural Language Processing
TSDS: Data Selection for Task-Specific Model Finetuning
·2005 words·10 mins·
Natural Language Processing
Large Language Models
University of Wisconsin-Madison
TSDS: a novel framework that selects optimal training data for efficient large language model finetuning using only a few examples, boosting performance.
Truth is Universal: Robust Detection of Lies in LLMs
·4200 words·20 mins·
Natural Language Processing
Large Language Models
Heidelberg University
LLM lie detectors fail to generalize; this paper presents a robust method achieving 94% accuracy by identifying a universal two-dimensional truth subspace, separating true/false statements across vari…
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
·1948 words·10 mins·
Natural Language Processing
Large Language Models
Yale University
TAP: automated jailbreaking of black-box LLMs with high success rates, using fewer queries than previous methods.
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
·1866 words·9 mins·
Natural Language Processing
Machine Translation
Microsoft
TransVIP: groundbreaking speech-to-speech translation system preserving voice & isochrony, outperforming current state-of-the-art models!
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins·
AI Generated
Natural Language Processing
Large Language Models
Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins·
Natural Language Processing
Large Language Models
Simplex
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even with complex fractal structures.
Transformers need glasses! Information over-squashing in language tasks
·3003 words·15 mins·
AI Generated
Natural Language Processing
Large Language Models
University of Oxford
Large language models (LLMs) suffer from information loss due to representational collapse and over-squashing, causing failures in simple tasks; this paper provides theoretical analysis and practical …
Transformers Can Do Arithmetic with the Right Embeddings
·3154 words·15 mins·
Natural Language Processing
Large Language Models
University of Maryland
Researchers enhanced transformer performance on arithmetic tasks by introducing Abacus Embeddings, which encode each digit’s position, enabling improved generalization and unlocking multi-step reasoni…
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins·
Natural Language Processing
Large Language Models
Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performanc…
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
·330 words·2 mins·
AI Generated
Natural Language Processing
Large Language Models
Yonsei University
Train-Attention (TAALM) tackles catastrophic forgetting in LLMs by dynamically weighting tokens during training, boosting learning efficiency and knowledge retention, outperforming existing methods on…
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
·2618 words·13 mins·
Natural Language Processing
Large Language Models
Renmin University of China
Transformers’ in-context learning (ICL) is explained using representation learning, revealing its ICL process as gradient descent on a dual model and offering modifiable attention layers for enhanced …
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
·3583 words·17 mins·
AI Generated
Natural Language Processing
Sentiment Analysis
School of Data Science, The Chinese University of Hong Kong, Shenzhen
Robust Multimodal Sentiment Analysis (MSA) model, Language-dominated Noise-resistant Learning Network (LNLN), handles incomplete data by correcting dominant modality (language) and using a multimodal …
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins·
Natural Language Processing
Large Language Models
University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
Towards a theory of how the structure of language is acquired by deep neural networks
·3238 words·16 mins·
AI Generated
Natural Language Processing
Large Language Models
École Polytechnique Fédérale de Lausanne
Deep learning models learn language structure through next-token prediction, but the data requirements remain unclear. This paper reveals that the effective context window, determining learning capaci…
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
·2572 words·13 mins·
Natural Language Processing
Large Language Models
UC Berkeley
LLMs struggle with simple logical reasoning due to the ‘reversal curse.’ This paper reveals that weight asymmetry during training is the culprit, offering a new theoretical perspective and potential s…
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
·2046 words·10 mins·
Natural Language Processing
Large Language Models
Tencent AI Lab
ALPHALLM boosts LLM performance in complex reasoning tasks by using imagination, search, and criticism to create a self-improving loop, eliminating the need for extra training data.
Toward Efficient Inference for Mixture of Experts
·2411 words·12 mins·
Natural Language Processing
Machine Translation
Duke University
Unlocking the speed and efficiency of Mixture-of-Experts models, this research unveils novel optimization techniques, achieving dramatic improvements in inference throughput and resource usage.
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
·2366 words·12 mins·
Natural Language Processing
Text Generation
Nankai University
ToMe: a novel training-free method that dramatically improves semantic binding in text-to-image synthesis by intelligently merging related tokens, ensuring accurate alignment between generated images and t…
To Believe or Not to Believe Your LLM: Iterative Prompting for Estimating Epistemic Uncertainty
·1940 words·10 mins·
Natural Language Processing
Large Language Models
Google DeepMind
This paper introduces an innovative iterative prompting method for estimating epistemic uncertainty in LLMs, enabling reliable detection of hallucinations.
Thought of Search: Planning with Language Models Through The Lens of Efficiency
·282 words·2 mins·
Natural Language Processing
Large Language Models
IBM Research
This paper introduces ‘Thought of Search,’ a novel, efficient planning approach using LLMs that prioritizes soundness and completeness. It leverages LLMs to generate Python code for search components,…