Large Language Models

Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Self-Contrast Mixture-of-Experts (SCMoE) boosts MoE model reasoning by cleverly using ‘unchosen’ experts during inference. This training-free method contrasts outputs from strong and weak expert activations.
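A minimal sketch of the self-contrast idea, assuming two forward passes of the same MoE model: one with its default strong routing and one with a deliberately weakened routing. The combination rule and hyperparameters below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def self_contrast_logits(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Contrast next-token logits from a strong routing pass against a weak one.

    strong_logits, weak_logits: (vocab,) logits from the same MoE model run with
    its default top-k routing vs. a weakened routing (e.g., fewer experts).
    beta scales the contrast term; alpha sets a plausibility cutoff relative
    to the strong pass so implausible tokens cannot win by contrast alone.
    """
    strong_logp = strong_logits - np.logaddexp.reduce(strong_logits)
    weak_logp = weak_logits - np.logaddexp.reduce(weak_logits)

    # Reward tokens the strong routing prefers over the weak one.
    scores = strong_logp + beta * (strong_logp - weak_logp)

    # Mask out tokens the strong pass already considers implausible.
    cutoff = np.log(alpha) + strong_logp.max()
    scores[strong_logp < cutoff] = -np.inf
    return scores

# Toy usage with random logits standing in for two MoE forward passes.
rng = np.random.default_rng(0)
strong, weak = rng.normal(size=1000), rng.normal(size=1000)
next_token = int(np.argmax(self_contrast_logits(strong, weak)))
```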
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
·3889 words·19 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
The Uncertainty of Thoughts (UoT) algorithm significantly boosts LLMs’ information-seeking abilities, leading to substantial performance gains across diverse tasks.
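The core quantity UoT plans with is the expected reduction in uncertainty from asking a question. A toy sketch of that single-step information-gain computation, with hand-written probabilities standing in for the LLM-estimated ones (the full method propagates these values through a tree of simulated future turns):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, yes_set):
    """Expected entropy reduction from one yes/no question.

    prior: dict hypothesis -> probability (sums to 1).
    yes_set: hypotheses for which the answer would be 'yes'.
    """
    p_yes = sum(p for h, p in prior.items() if h in yes_set)
    p_no = 1.0 - p_yes
    gain = entropy(prior.values())
    if p_yes > 0:
        gain -= p_yes * entropy([p / p_yes for h, p in prior.items() if h in yes_set])
    if p_no > 0:
        gain -= p_no * entropy([p / p_no for h, p in prior.items() if h not in yes_set])
    return gain

# Toy usage: pick the question with the highest expected information gain.
prior = {"flu": 0.25, "cold": 0.25, "allergy": 0.25, "covid": 0.25}
questions = {"Do you have a fever?": {"flu", "covid"},
             "Do you sneeze a lot?": {"cold", "allergy", "flu"}}
best = max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
print(best)   # the even 50/50 split wins
```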
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
·2935 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huazhong University of Science and Technology
Twin-Merging dynamically merges modular model expertise, significantly improving multitask performance without retraining, and adapting to diverse data.
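A schematic of the shared/exclusive decomposition behind Twin-Merging: average the fine-tuned experts into shared knowledge, treat each expert's residual as exclusive knowledge, and recombine per input with router weights. The paper additionally compresses the exclusive parts and learns the router; both are omitted here.

```python
import numpy as np

def twin_merge(task_weights, router_scores):
    """Compose a parameter vector from shared and task-exclusive parts.

    task_weights: list of flattened parameter vectors, one per fine-tuned expert.
    router_scores: per-input scores (softmax-normalized here) deciding how much
    of each expert's exclusive knowledge to inject for this input.
    """
    stacked = np.stack(task_weights)        # (num_tasks, num_params)
    shared = stacked.mean(axis=0)           # shared knowledge: simple average
    exclusive = stacked - shared            # what each expert adds beyond it

    scores = np.exp(router_scores - np.max(router_scores))
    scores /= scores.sum()
    return shared + scores @ exclusive      # input-dependent re-composition

# Toy usage: three "experts" with 10 parameters each, router favoring expert 0.
rng = np.random.default_rng(0)
experts = [rng.normal(size=10) for _ in range(3)]
merged = twin_merge(experts, router_scores=np.array([2.0, 0.1, 0.1]))
```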
TSDS: Data Selection for Task-Specific Model Finetuning
·2005 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
TSDS: A novel framework selects optimal training data for efficient large language model finetuning using only a few examples, boosting performance.
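A generic embedding-similarity stand-in for the selection step: score each candidate training example by how close it sits to the few task examples, then keep the top k. TSDS itself casts this as optimized distribution matching with explicit diversity control, which this sketch does not reproduce.

```python
import numpy as np

def select_finetuning_data(candidate_embs, query_embs, k):
    """Pick k candidates whose embeddings best match a few task examples."""
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    query = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    # For each candidate, its best cosine similarity to any task example.
    scores = (cand @ query.T).max(axis=1)
    return np.argsort(-scores)[:k]

# Toy usage: 1,000 candidates, 5 task examples, 64-dim embeddings.
rng = np.random.default_rng(0)
pool, queries = rng.normal(size=(1000, 64)), rng.normal(size=(5, 64))
chosen_indices = select_finetuning_data(pool, queries, k=100)
```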
Truth is Universal: Robust Detection of Lies in LLMs
·4200 words·20 mins
Natural Language Processing Large Language Models 🏢 Heidelberg University
LLM lie detectors fail to generalize; this paper presents a robust method achieving 94% accuracy by identifying a universal two-dimensional truth subspace, separating true/false statements across varied topics and statement types.
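A simplified, one-dimensional version of the linear-probe idea: fit a "truth direction" as the difference of class means over hidden activations and classify by projection. The paper's contribution is showing that a two-dimensional subspace (a general truth direction plus a polarity direction) is what actually generalizes; that construction is not reproduced here.

```python
import numpy as np

def fit_truth_direction(activations, labels):
    """Fit a single truth direction as the difference of class means.

    activations: (n_statements, hidden_dim) residual-stream activations.
    labels: 1 for true statements, 0 for false ones.
    """
    mu_true = activations[labels == 1].mean(axis=0)
    mu_false = activations[labels == 0].mean(axis=0)
    direction = mu_true - mu_false
    bias = -0.5 * (mu_true + mu_false) @ direction
    return direction, bias

def predict_truth(activations, direction, bias):
    return (activations @ direction + bias > 0).astype(int)

# Toy usage with synthetic activations standing in for a real LLM layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 512))
labels = rng.integers(0, 2, size=200)
d, b = fit_truth_direction(acts, labels)
preds = predict_truth(acts, d, b)
```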
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
·1948 words·10 mins
Natural Language Processing Large Language Models 🏢 Yale University
TAP: automated jailbreaking of black-box LLMs with high success rates, using fewer queries than previous methods.
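A skeleton of the tree-search loop TAP describes: branch candidate prompts with an attacker model, prune off-topic branches with an evaluator before spending target queries, and stop once the evaluator judges a response a success. All four helper functions below are hypothetical placeholders for LLM calls, not a real API.

```python
def refine_prompt(parent_prompt, feedback):      # attacker LLM (placeholder)
    return parent_prompt + " " + feedback

def on_topic_and_promising(prompt, goal):        # evaluator LLM (placeholder)
    return goal.split()[0] in prompt

def score_response(response, goal):              # evaluator LLM (placeholder)
    return 10 if goal in response else 1

def query_target(prompt):                        # black-box target LLM (placeholder)
    return "refused: " + prompt

def tree_of_attacks(goal, depth=5, branches=3, keep=4, success=10):
    frontier = [(goal, "initial attempt")]
    for _ in range(depth):
        # Branch: each kept prompt spawns several refined variants.
        children = [refine_prompt(p, f) for p, f in frontier for _ in range(branches)]
        # Prune off-topic or unpromising branches before querying the target.
        children = [c for c in children if on_topic_and_promising(c, goal)]
        scored = []
        for prompt in children:
            response = query_target(prompt)
            scored.append((score_response(response, goal), prompt, response))
        scored.sort(reverse=True)
        if scored and scored[0][0] >= success:
            return scored[0][1]                  # jailbreak found
        frontier = [(p, r) for _, p, r in scored[:keep]]
    return None
```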
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
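MOHAWK's actual recipe distills in three stages (matrix orientation, hidden-state alignment, end-to-end training); the sketch below shows only generic alignment and logit-distillation terms of that flavor, with random tensors standing in for teacher and student outputs.

```python
import torch
import torch.nn.functional as F

def hidden_alignment_loss(teacher_hidden, student_hidden):
    """Match per-layer hidden states of the subquadratic student to the teacher's."""
    return sum(F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden))

def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Standard KL distillation on output logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage: batch 2, length 16, width 64, vocab 100, four aligned layers.
B, L, D, V = 2, 16, 64, 100
teacher_h = [torch.randn(B, L, D) for _ in range(4)]
student_h = [h + 0.1 * torch.randn_like(h) for h in teacher_h]
loss = hidden_alignment_loss(teacher_h, student_h) + logit_distillation_loss(
    torch.randn(B, L, V), torch.randn(B, L, V))
```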
Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins
Natural Language Processing Large Language Models 🏢 Simplex
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even with complex fractal structures.
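The claim is tested with a linear probe: regress the ground-truth belief states of the data-generating process onto residual-stream activations and check how much variance a purely linear map explains. A least-squares sketch with synthetic data:

```python
import numpy as np

def fit_belief_probe(residual_acts, belief_states):
    """Least-squares linear map from residual-stream activations to belief states.

    residual_acts: (n_tokens, d_model) activations; belief_states: (n_tokens,
    n_states) ground-truth belief distributions from the data-generating process.
    """
    X = np.hstack([residual_acts, np.ones((len(residual_acts), 1))])  # add bias
    W, *_ = np.linalg.lstsq(X, belief_states, rcond=None)
    predictions = X @ W
    r2 = 1 - ((belief_states - predictions) ** 2).sum() / (
        (belief_states - belief_states.mean(axis=0)) ** 2).sum()
    return W, r2

# Toy usage with synthetic activations and 3-state belief vectors.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128))
beliefs = rng.dirichlet(np.ones(3), size=500)
W, r2 = fit_belief_probe(acts, beliefs)
```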
Transformers need glasses! Information over-squashing in language tasks
·3003 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Large language models (LLMs) suffer from information loss due to representational collapse and over-squashing, causing failures in simple tasks; this paper provides theoretical analysis and practical solutions.
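A toy calculation of the representational-collapse effect: if the final representation is close to a uniform average over the sequence, two prompts differing only in one early token become numerically indistinguishable as the sequence grows. This illustrates the phenomenon, not the paper's formal construction.

```python
import numpy as np

def mean_pooled_repr(token_ids, vocab=4, dim=8, seed=0):
    """Stand-in for near-uniform attention: average the token embeddings."""
    rng = np.random.default_rng(seed)
    embed = rng.normal(size=(vocab, dim))
    return embed[token_ids].mean(axis=0)

for n in [8, 64, 512, 4096, 32768]:
    a = np.zeros(n, dtype=int)
    b = a.copy()
    b[0] = 1                                   # sequences differ in the first token only
    gap = np.linalg.norm(mean_pooled_repr(a) - mean_pooled_repr(b))
    print(n, gap)                              # the gap decays roughly like 1/n
```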
Transformers Can Do Arithmetic with the Right Embeddings
·3154 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
Researchers enhanced transformer performance on arithmetic tasks by introducing Abacus Embeddings, which encode each digit’s position, enabling improved generalization and unlocking multi-step reasoning.
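The embedding itself is simple: index each digit by its position inside its own number, so digits of equal significance share a positional signal. A sketch of that indexing (the paper also randomizes the starting offset during training for length generalization, omitted here):

```python
def abacus_positions(tokens):
    """Index each digit by its position inside its number (1-based); 0 elsewhere.

    These indices would select a learned positional embedding that is added to
    the token embedding, so digits of equal significance line up.
    """
    positions, run = [], 0
    for tok in tokens:
        run = run + 1 if tok.isdigit() else 0
        positions.append(run)
    return positions

# Toy usage on a reversed-digit addition prompt ("123 + 45" with digits reversed).
tokens = list("321+54=")
print(abacus_positions(tokens))   # [1, 2, 3, 0, 1, 2, 0]
```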
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins
Natural Language Processing Large Language Models 🏢 Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performance.
Training Compute-Optimal Protein Language Models
·3023 words·15 mins
Large Language Models 🏢 Tsinghua University
Compute-optimal protein language models are trained efficiently using scaling laws derived from a massive dataset, improving performance while optimizing compute budgets.
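The generic recipe behind compute-optimal training: fix a FLOP budget C ≈ 6·N·D and split it between parameters N and tokens D according to fitted scaling-law exponents. The exponent below is a Chinchilla-style placeholder and omits the fitted proportionality constants; the paper fits its own laws for protein sequence data, which may differ.

```python
def compute_optimal_allocation(compute_flops, a_exp=0.5):
    """Split a FLOP budget between parameters N and tokens D under C = 6*N*D.

    a_exp is the exponent in N* proportional to C**a_exp; 0.5 is the
    'scale both equally' placeholder value, not a fitted constant.
    """
    n_opt = (compute_flops / 6) ** a_exp
    d_opt = compute_flops / (6 * n_opt)
    return n_opt, d_opt

# Toy usage: a 1e21 FLOP budget.
n, d = compute_optimal_allocation(1e21)
print(f"params = {n:.2e}, tokens = {d:.2e}")
```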
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
·330 words·2 mins
AI Generated Natural Language Processing Large Language Models 🏢 Yonsei University
Train-Attention (TAALM) tackles catastrophic forgetting in LLMs by dynamically weighting tokens during training, boosting learning efficiency and knowledge retention, and outperforming existing methods on continual knowledge learning benchmarks.
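The training-side mechanism is a per-token weighted loss; how the weights are predicted by a meta-learned weighting model is the paper's contribution and is not reproduced in this sketch.

```python
import torch
import torch.nn.functional as F

def weighted_token_loss(logits, targets, token_weights):
    """Cross-entropy where each target token carries its own importance weight.

    logits: (batch, seq, vocab); targets: (batch, seq); token_weights: (batch,
    seq), here just an input (Train-Attention predicts them with a meta-learned
    weighting model).
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")   # (batch, seq)
    return (per_token * token_weights).sum() / token_weights.sum()

# Toy usage with random tensors.
B, L, V = 2, 8, 50
loss = weighted_token_loss(
    torch.randn(B, L, V), torch.randint(0, V, (B, L)), torch.rand(B, L))
```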
Toxicity Detection for Free
·2767 words·13 mins
Large Language Models 🏢 University of California, Berkeley
Moderation Using LLM Introspection (MULI) leverages the first response token’s logits from LLMs to create a highly accurate toxicity detector, surpassing state-of-the-art methods with minimal overhead.
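The underlying observation: an aligned chat model is more likely to open its answer with refusal tokens when the prompt is toxic, so the first response token's distribution already carries a toxicity signal. A sketch of the raw refusal-probability variant (MULI additionally trains a sparse linear model on the full logit vector); the refusal token ids below are purely illustrative.

```python
import numpy as np

def refusal_based_toxicity_score(first_token_logits, refusal_token_ids):
    """Score prompt toxicity from the LLM's first response-token distribution."""
    probs = np.exp(first_token_logits - first_token_logits.max())
    probs /= probs.sum()
    # Probability mass on tokens that typically start a refusal (tokenizer-specific).
    return probs[refusal_token_ids].sum()

# Toy usage with a fake 1,000-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
score = refusal_based_toxicity_score(logits, refusal_token_ids=[17, 42, 256])
flag_as_toxic = score > 0.5   # threshold tuned on a validation set in practice
```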
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
·2618 words·13 mins
Natural Language Processing Large Language Models 🏢 Renmin University of China
Transformers’ in-context learning (ICL) is explained using representation learning, revealing the ICL process as gradient descent on a dual model and offering modifiable attention layers for enhanced performance.
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
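For orientation, a generic activation-times-gradient neuron attribution on a toy MLP; this is a standard attribution baseline, not NAM's specific scoring rule for multi-modal LLMs.

```python
import torch
import torch.nn as nn

# Generic neuron attribution: score each hidden neuron by activation * gradient
# of a chosen output logit, then rank neurons by that contribution.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

activations = {}
def save_activation(_, __, output):
    output.retain_grad()                 # keep the gradient of this intermediate tensor
    activations["hidden"] = output
model[1].register_forward_hook(save_activation)

x = torch.randn(1, 16)
logits = model(x)
logits[0, 3].backward()                  # attribute the score of class 3

hidden = activations["hidden"]
scores = (hidden * hidden.grad).squeeze(0)       # per-neuron contribution
top_neurons = torch.topk(scores, k=5).indices
```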
Towards a theory of how the structure of language is acquired by deep neural networks
·3238 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 École Polytechnique Fédérale de Lausanne
Deep learning models learn language structure through next-token prediction, but the data requirements remain unclear. This paper reveals that the effective context window determines learning capacity.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
·2572 words·13 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs struggle with simple logical reasoning due to the ‘reversal curse.’ This paper reveals that weight asymmetry during training is the culprit, offering a new theoretical perspective and potential solutions.
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
·2046 words·10 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
ALPHALLM boosts LLM performance in complex reasoning tasks by using imagination, search, and criticism to create a self-improving loop, eliminating the need for extra training data.
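A schematic of the imagination/search/criticism loop as Monte Carlo tree search over reasoning steps: a policy proposes candidate steps (imagination), UCT selects where to expand (search), and a critic scores leaves (criticism). The two helper functions stand in for LLM calls and are hypothetical placeholders.

```python
import math
import random

random.seed(0)

def propose_steps(state, n=3):           # "imagination": policy LLM (placeholder)
    return [state + (f"step-{random.randint(0, 9)}",) for _ in range(n)]

def critic_score(state):                 # "criticism": critic LLM (placeholder)
    return random.random()

def mcts(root=(), iterations=200, c=1.4, max_depth=4):
    visits, value, children = {root: 0}, {root: 0.0}, {}
    for _ in range(iterations):
        # Selection: descend by UCT until an unexpanded or maximum-depth node.
        node, path = root, [root]
        while node in children and len(node) < max_depth:
            node = max(children[node],
                       key=lambda ch: value[ch] / (visits[ch] + 1e-9)
                       + c * math.sqrt(math.log(visits[node] + 1) / (visits[ch] + 1e-9)))
            path.append(node)
        # Expansion ("imagination") of new candidate reasoning steps.
        if node not in children and len(node) < max_depth:
            children[node] = propose_steps(node)
            for ch in children[node]:
                visits.setdefault(ch, 0)
                value.setdefault(ch, 0.0)
        # Evaluation ("criticism") and backpropagation along the visited path.
        reward = critic_score(node)
        for n_ in path:
            visits[n_] += 1
            value[n_] += reward
    return max(children[root], key=lambda ch: visits[ch])

best_first_step = mcts()
```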
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
·4872 words·23 mins
Large Language Models 🏢 Zhejiang University
TOPA: Extending LLMs for video understanding using only text data.