Large Language Models

Unchosen Experts Can Contribute Too: Unleashing MoE Models’ Power by Self-Contrast
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Self-Contrast Mixture-of-Experts (SCMoE) boosts MoE model reasoning by cleverly using ‘unchosen’ experts during inference. This training-free method contrasts outputs from strong and weak expert activations.
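A minimal sketch of the self-contrast idea, assuming two forward passes of the same MoE model: one with its default strong routing and one with a deliberately weakened routing. The combination rule and hyperparameters below are illustrative, not the paper's exact formulation.

```python
import numpy as np

def self_contrast_logits(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Contrast next-token logits from a strong routing pass against a weak one.

    strong_logits, weak_logits: (vocab,) logits from the same MoE model run with
    its default top-k routing vs. a weakened routing (e.g., fewer experts).
    beta scales the contrast term; alpha sets a plausibility cutoff relative
    to the strong pass so implausible tokens cannot win by contrast alone.
    """
    strong_logp = strong_logits - np.logaddexp.reduce(strong_logits)
    weak_logp = weak_logits - np.logaddexp.reduce(weak_logits)

    # Reward tokens the strong routing prefers over the weak one.
    scores = strong_logp + beta * (strong_logp - weak_logp)

    # Mask out tokens the strong pass already considers implausible.
    cutoff = np.log(alpha) + strong_logp.max()
    scores[strong_logp < cutoff] = -np.inf
    return scores

# Toy usage with random logits standing in for two MoE forward passes.
rng = np.random.default_rng(0)
strong, weak = rng.normal(size=1000), rng.normal(size=1000)
next_token = int(np.argmax(self_contrast_logits(strong, weak)))
```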
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs
·3889 words·19 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
The Uncertainty of Thoughts (UoT) algorithm significantly boosts LLMs’ information-seeking abilities, leading to substantial performance gains across diverse tasks.
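The core quantity UoT plans with is the expected reduction in uncertainty from asking a question. A toy sketch of that single-step information-gain computation, with hand-written probabilities standing in for the LLM-estimated ones (the full method propagates these values through a tree of simulated future turns):

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_information_gain(prior, yes_set):
    """Expected entropy reduction from one yes/no question.

    prior: dict hypothesis -> probability (sums to 1).
    yes_set: hypotheses for which the answer would be 'yes'.
    """
    p_yes = sum(p for h, p in prior.items() if h in yes_set)
    p_no = 1.0 - p_yes
    gain = entropy(prior.values())
    if p_yes > 0:
        gain -= p_yes * entropy([p / p_yes for h, p in prior.items() if h in yes_set])
    if p_no > 0:
        gain -= p_no * entropy([p / p_no for h, p in prior.items() if h not in yes_set])
    return gain

# Toy usage: pick the question with the highest expected information gain.
prior = {"flu": 0.25, "cold": 0.25, "allergy": 0.25, "covid": 0.25}
questions = {"Do you have a fever?": {"flu", "covid"},
             "Do you sneeze a lot?": {"cold", "allergy", "flu"}}
best = max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
print(best)   # the even 50/50 split wins
```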
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
·2935 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huazhong University of Science and Technology
Twin-Merging dynamically merges modular model expertise, significantly improving multitask performance without retraining, and adapting to diverse data.
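A schematic of the shared/exclusive decomposition behind Twin-Merging: average the fine-tuned experts into shared knowledge, treat each expert's residual as exclusive knowledge, and recombine per input with router weights. The paper additionally compresses the exclusive parts and learns the router; both are omitted here.

```python
import numpy as np

def twin_merge(task_weights, router_scores):
    """Compose a parameter vector from shared and task-exclusive parts.

    task_weights: list of flattened parameter vectors, one per fine-tuned expert.
    router_scores: per-input scores (softmax-normalized here) deciding how much
    of each expert's exclusive knowledge to inject for this input.
    """
    stacked = np.stack(task_weights)        # (num_tasks, num_params)
    shared = stacked.mean(axis=0)           # shared knowledge: simple average
    exclusive = stacked - shared            # what each expert adds beyond it

    scores = np.exp(router_scores - np.max(router_scores))
    scores /= scores.sum()
    return shared + scores @ exclusive      # input-dependent re-composition

# Toy usage: three "experts" with 10 parameters each, router favoring expert 0.
rng = np.random.default_rng(0)
experts = [rng.normal(size=10) for _ in range(3)]
merged = twin_merge(experts, router_scores=np.array([2.0, 0.1, 0.1]))
```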
TSDS: Data Selection for Task-Specific Model Finetuning
·2005 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Wisconsin-Madison
TSDS: A novel framework selects optimal training data for efficient large language model finetuning using only a few examples, boosting performance.
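A generic embedding-similarity stand-in for the selection step: score each candidate training example by how close it sits to the few task examples, then keep the top k. TSDS itself casts this as optimized distribution matching with explicit diversity control, which this sketch does not reproduce.

```python
import numpy as np

def select_finetuning_data(candidate_embs, query_embs, k):
    """Pick k candidates whose embeddings best match a few task examples."""
    cand = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    query = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    # For each candidate, its best cosine similarity to any task example.
    scores = (cand @ query.T).max(axis=1)
    return np.argsort(-scores)[:k]

# Toy usage: 1,000 candidates, 5 task examples, 64-dim embeddings.
rng = np.random.default_rng(0)
pool, queries = rng.normal(size=(1000, 64)), rng.normal(size=(5, 64))
chosen_indices = select_finetuning_data(pool, queries, k=100)
```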
Truth is Universal: Robust Detection of Lies in LLMs
·4200 words·20 mins
Natural Language Processing Large Language Models 🏢 Heidelberg University
LLM lie detectors fail to generalize; this paper presents a robust method achieving 94% accuracy by identifying a universal two-dimensional truth subspace, separating true/false statements across varied topics and statement types.
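A simplified, one-dimensional version of the linear-probe idea: fit a "truth direction" as the difference of class means over hidden activations and classify by projection. The paper's contribution is showing that a two-dimensional subspace (a general truth direction plus a polarity direction) is what actually generalizes; that construction is not reproduced here.

```python
import numpy as np

def fit_truth_direction(activations, labels):
    """Fit a single truth direction as the difference of class means.

    activations: (n_statements, hidden_dim) residual-stream activations.
    labels: 1 for true statements, 0 for false ones.
    """
    mu_true = activations[labels == 1].mean(axis=0)
    mu_false = activations[labels == 0].mean(axis=0)
    direction = mu_true - mu_false
    bias = -0.5 * (mu_true + mu_false) @ direction
    return direction, bias

def predict_truth(activations, direction, bias):
    return (activations @ direction + bias > 0).astype(int)

# Toy usage with synthetic activations standing in for a real LLM layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(200, 512))
labels = rng.integers(0, 2, size=200)
d, b = fit_truth_direction(acts, labels)
preds = predict_truth(acts, d, b)
```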
Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
·1948 words·10 mins
Natural Language Processing Large Language Models 🏢 Yale University
TAP: automated jailbreaking of black-box LLMs with high success rates, using fewer queries than previous methods.
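A skeleton of the tree-search loop TAP describes: branch candidate prompts with an attacker model, prune off-topic branches with an evaluator before spending target queries, and stop once the evaluator judges a response a success. All four helper functions below are hypothetical placeholders for LLM calls, not a real API.

```python
def refine_prompt(parent_prompt, feedback):      # attacker LLM (placeholder)
    return parent_prompt + " " + feedback

def on_topic_and_promising(prompt, goal):        # evaluator LLM (placeholder)
    return goal.split()[0] in prompt

def score_response(response, goal):              # evaluator LLM (placeholder)
    return 10 if goal in response else 1

def query_target(prompt):                        # black-box target LLM (placeholder)
    return "refused: " + prompt

def tree_of_attacks(goal, depth=5, branches=3, keep=4, success=10):
    frontier = [(goal, "initial attempt")]
    for _ in range(depth):
        # Branch: each kept prompt spawns several refined variants.
        children = [refine_prompt(p, f) for p, f in frontier for _ in range(branches)]
        # Prune off-topic or unpromising branches before querying the target.
        children = [c for c in children if on_topic_and_promising(c, goal)]
        scored = []
        for prompt in children:
            response = query_target(prompt)
            scored.append((score_response(response, goal), prompt, response))
        scored.sort(reverse=True)
        if scored and scored[0][0] >= success:
            return scored[0][1]                  # jailbreak found
        frontier = [(p, r) for _, p, r in scored[:keep]]
    return None
```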
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
·2720 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
MOHAWK: Distilling Transformers’ quadratic knowledge into faster subquadratic SSMs, achieving state-of-the-art performance with <1% of training data!
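MOHAWK's actual recipe distills in three stages (matrix orientation, hidden-state alignment, end-to-end training); the sketch below shows only generic alignment and logit-distillation terms of that flavor, with random tensors standing in for teacher and student outputs.

```python
import torch
import torch.nn.functional as F

def hidden_alignment_loss(teacher_hidden, student_hidden):
    """Match per-layer hidden states of the subquadratic student to the teacher's."""
    return sum(F.mse_loss(s, t) for s, t in zip(student_hidden, teacher_hidden))

def logit_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Standard KL distillation on output logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage: batch 2, length 16, width 64, vocab 100, four aligned layers.
B, L, D, V = 2, 16, 64, 100
teacher_h = [torch.randn(B, L, D) for _ in range(4)]
student_h = [h + 0.1 * torch.randn_like(h) for h in teacher_h]
loss = hidden_alignment_loss(teacher_h, student_h) + logit_distillation_loss(
    torch.randn(B, L, V), torch.randn(B, L, V))
```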
Transformers Represent Belief State Geometry in their Residual Stream
·1739 words·9 mins
Natural Language Processing Large Language Models 🏢 Simplex
Transformers encode information beyond next-token prediction by linearly representing belief state geometry in their residual stream, even with complex fractal structures.
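The claim is tested with a linear probe: regress the ground-truth belief states of the data-generating process onto residual-stream activations and check how much variance a purely linear map explains. A least-squares sketch with synthetic data:

```python
import numpy as np

def fit_belief_probe(residual_acts, belief_states):
    """Least-squares linear map from residual-stream activations to belief states.

    residual_acts: (n_tokens, d_model) activations; belief_states: (n_tokens,
    n_states) ground-truth belief distributions from the data-generating process.
    """
    X = np.hstack([residual_acts, np.ones((len(residual_acts), 1))])  # add bias
    W, *_ = np.linalg.lstsq(X, belief_states, rcond=None)
    predictions = X @ W
    r2 = 1 - ((belief_states - predictions) ** 2).sum() / (
        (belief_states - belief_states.mean(axis=0)) ** 2).sum()
    return W, r2

# Toy usage with synthetic activations and 3-state belief vectors.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128))
beliefs = rng.dirichlet(np.ones(3), size=500)
W, r2 = fit_belief_probe(acts, beliefs)
```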
Transformers need glasses! Information over-squashing in language tasks
·3003 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Large language models (LLMs) suffer from information loss due to representational collapse and over-squashing, causing failures in simple tasks; this paper provides theoretical analysis and practical solutions.
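A toy calculation of the representational-collapse effect: if the final representation is close to a uniform average over the sequence, two prompts differing only in one early token become numerically indistinguishable as the sequence grows. This illustrates the phenomenon, not the paper's formal construction.

```python
import numpy as np

def mean_pooled_repr(token_ids, vocab=4, dim=8, seed=0):
    """Stand-in for near-uniform attention: average the token embeddings."""
    rng = np.random.default_rng(seed)
    embed = rng.normal(size=(vocab, dim))
    return embed[token_ids].mean(axis=0)

for n in [8, 64, 512, 4096, 32768]:
    a = np.zeros(n, dtype=int)
    b = a.copy()
    b[0] = 1                                   # sequences differ in the first token only
    gap = np.linalg.norm(mean_pooled_repr(a) - mean_pooled_repr(b))
    print(n, gap)                              # the gap decays roughly like 1/n
```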
Transformers Can Do Arithmetic with the Right Embeddings
·3154 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Maryland
Researchers enhanced transformer performance on arithmetic tasks by introducing Abacus Embeddings, which encode each digit’s position, enabling improved generalization and unlocking multi-step reasoning.
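The embedding itself is simple: index each digit by its position inside its own number, so digits of equal significance share a positional signal. A sketch of that indexing (the paper also randomizes the starting offset during training for length generalization, omitted here):

```python
def abacus_positions(tokens):
    """Index each digit by its position inside its number (1-based); 0 elsewhere.

    These indices would select a learned positional embedding that is added to
    the token embedding, so digits of equal significance line up.
    """
    positions, run = [], 0
    for tok in tokens:
        run = run + 1 if tok.isdigit() else 0
        positions.append(run)
    return positions

# Toy usage on a reversed-digit addition prompt ("123 + 45" with digits reversed).
tokens = list("321+54=")
print(abacus_positions(tokens))   # [1, 2, 3, 0, 1, 2, 0]
```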
Training Dynamics of Transformers to Recognize Word Co-occurrence via Gradient Flow Analysis
·426 words·2 mins
Natural Language Processing Large Language Models 🏢 Princeton University
Researchers reveal how transformers learn word co-occurrence using a novel gradient flow analysis, uncovering a two-phase training process that leads to near-minimum loss and improved model performance.
Training Compute-Optimal Protein Language Models
·3023 words·15 mins
Large Language Models 🏢 Tsinghua University
Compute-optimal protein language models are trained efficiently using scaling laws derived from a massive dataset, improving performance while optimizing compute budgets.
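The generic recipe behind compute-optimal training: fix a FLOP budget C ≈ 6·N·D and split it between parameters N and tokens D according to fitted scaling-law exponents. The exponent below is a Chinchilla-style placeholder and omits the fitted proportionality constants; the paper fits its own laws for protein sequence data, which may differ.

```python
def compute_optimal_allocation(compute_flops, a_exp=0.5):
    """Split a FLOP budget between parameters N and tokens D under C = 6*N*D.

    a_exp is the exponent in N* proportional to C**a_exp; 0.5 is the
    'scale both equally' placeholder value, not a fitted constant.
    """
    n_opt = (compute_flops / 6) ** a_exp
    d_opt = compute_flops / (6 * n_opt)
    return n_opt, d_opt

# Toy usage: a 1e21 FLOP budget.
n, d = compute_optimal_allocation(1e21)
print(f"params = {n:.2e}, tokens = {d:.2e}")
```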
Train-Attention: Meta-Learning Where to Focus in Continual Knowledge Learning
·330 words·2 mins
AI Generated Natural Language Processing Large Language Models 🏢 Yonsei University
Train-Attention (TAALM) tackles catastrophic forgetting in LLMs by dynamically weighting tokens during training, boosting learning efficiency and knowledge retention, and outperforming existing methods on continual knowledge learning benchmarks.
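The training-side mechanism is a per-token weighted loss; how the weights are predicted by a meta-learned weighting model is the paper's contribution and is not reproduced in this sketch.

```python
import torch
import torch.nn.functional as F

def weighted_token_loss(logits, targets, token_weights):
    """Cross-entropy where each target token carries its own importance weight.

    logits: (batch, seq, vocab); targets: (batch, seq); token_weights: (batch,
    seq), here just an input (Train-Attention predicts them with a meta-learned
    weighting model).
    """
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")   # (batch, seq)
    return (per_token * token_weights).sum() / token_weights.sum()

# Toy usage with random tensors.
B, L, V = 2, 8, 50
loss = weighted_token_loss(
    torch.randn(B, L, V), torch.randint(0, V, (B, L)), torch.rand(B, L))
```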
Toxicity Detection for Free
·2767 words·13 mins
Large Language Models 🏢 University of California, Berkeley
Moderation Using LLM Introspection (MULI) leverages the first response token’s logits from LLMs to create a highly accurate toxicity detector, surpassing state-of-the-art methods with minimal overhead.
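The underlying observation: an aligned chat model is more likely to open its answer with refusal tokens when the prompt is toxic, so the first response token's distribution already carries a toxicity signal. A sketch of the raw refusal-probability variant (MULI additionally trains a sparse linear model on the full logit vector); the refusal token ids below are purely illustrative.

```python
import numpy as np

def refusal_based_toxicity_score(first_token_logits, refusal_token_ids):
    """Score prompt toxicity from the LLM's first response-token distribution."""
    probs = np.exp(first_token_logits - first_token_logits.max())
    probs /= probs.sum()
    # Probability mass on tokens that typically start a refusal (tokenizer-specific).
    return probs[refusal_token_ids].sum()

# Toy usage with a fake 1,000-token vocabulary.
rng = np.random.default_rng(0)
logits = rng.normal(size=1000)
score = refusal_based_toxicity_score(logits, refusal_token_ids=[17, 42, 256])
flag_as_toxic = score > 0.5   # threshold tuned on a validation set in practice
```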
Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens
·2618 words·13 mins
Natural Language Processing Large Language Models 🏢 Renmin University of China
Transformers’ in-context learning (ICL) is explained using representation learning, revealing the ICL process as gradient descent on a dual model and offering modifiable attention layers for enhanced performance.
Towards Neuron Attributions in Multi-Modal Large Language Models
·1551 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
NAM: a novel neuron attribution method for MLLMs, revealing modality-specific semantic knowledge and enabling multi-modal knowledge editing.
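For orientation, a generic activation-times-gradient neuron attribution on a toy MLP; this is a standard attribution baseline, not NAM's specific scoring rule for multi-modal LLMs.

```python
import torch
import torch.nn as nn

# Generic neuron attribution: score each hidden neuron by activation * gradient
# of a chosen output logit, then rank neurons by that contribution.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))

activations = {}
def save_activation(_, __, output):
    output.retain_grad()                 # keep the gradient of this intermediate tensor
    activations["hidden"] = output
model[1].register_forward_hook(save_activation)

x = torch.randn(1, 16)
logits = model(x)
logits[0, 3].backward()                  # attribute the score of class 3

hidden = activations["hidden"]
scores = (hidden * hidden.grad).squeeze(0)       # per-neuron contribution
top_neurons = torch.topk(scores, k=5).indices
```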
Towards a theory of how the structure of language is acquired by deep neural networks
·3238 words·16 mins
AI Generated Natural Language Processing Large Language Models 🏢 École Polytechnique Fédérale de Lausanne
Deep learning models learn language structure through next-token prediction, but the data requirements remain unclear. This paper reveals that the effective context window determines learning capacity.
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics
·2572 words·13 mins
Natural Language Processing Large Language Models 🏢 UC Berkeley
LLMs struggle with simple logical reasoning due to the ‘reversal curse.’ This paper reveals that weight asymmetry during training is the culprit, offering a new theoretical perspective and potential solutions.
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
·2046 words·10 mins
Natural Language Processing Large Language Models 🏢 Tencent AI Lab
ALPHALLM boosts LLM performance in complex reasoning tasks by using imagination, search, and criticism to create a self-improving loop, eliminating the need for extra training data.
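A schematic of the imagination/search/criticism loop as Monte Carlo tree search over reasoning steps: a policy proposes candidate steps (imagination), UCT selects where to expand (search), and a critic scores leaves (criticism). The two helper functions stand in for LLM calls and are hypothetical placeholders.

```python
import math
import random

random.seed(0)

def propose_steps(state, n=3):           # "imagination": policy LLM (placeholder)
    return [state + (f"step-{random.randint(0, 9)}",) for _ in range(n)]

def critic_score(state):                 # "criticism": critic LLM (placeholder)
    return random.random()

def mcts(root=(), iterations=200, c=1.4, max_depth=4):
    visits, value, children = {root: 0}, {root: 0.0}, {}
    for _ in range(iterations):
        # Selection: descend by UCT until an unexpanded or maximum-depth node.
        node, path = root, [root]
        while node in children and len(node) < max_depth:
            node = max(children[node],
                       key=lambda ch: value[ch] / (visits[ch] + 1e-9)
                       + c * math.sqrt(math.log(visits[node] + 1) / (visits[ch] + 1e-9)))
            path.append(node)
        # Expansion ("imagination") of new candidate reasoning steps.
        if node not in children and len(node) < max_depth:
            children[node] = propose_steps(node)
            for ch in children[node]:
                visits.setdefault(ch, 0)
                value.setdefault(ch, 0.0)
        # Evaluation ("criticism") and backpropagation along the visited path.
        reward = critic_score(node)
        for n_ in path:
            visits[n_] += 1
            value[n_] += reward
    return max(children[root], key=lambda ch: visits[ch])

best_first_step = mcts()
```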
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
·4872 words·23 mins
Large Language Models 🏢 Zhejiang University
TOPA: Extending LLMs for video understanding using only text data.