
Large Language Models

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
·3222 words·16 mins
Natural Language Processing Large Language Models 🏢 EPFL
DenseFormer enhances transformers by adding a depth-weighted averaging step, improving data efficiency and outperforming baselines in both memory footprint and inference time, without increasing model size.
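The averaging step is simple enough to sketch. Below is a minimal, hedged reading of the idea (the class name and identity initialization are illustrative assumptions, not the paper's code): after block i, the residual stream is replaced by a learned weighted average over all representations computed so far, including the token embeddings.

```python
import torch
import torch.nn as nn

class DepthWeightedAverage(nn.Module):
    """Mixes the current block's output with all earlier block outputs
    (and the token embeddings) using learned scalar weights."""
    def __init__(self, depth_index: int):
        super().__init__()
        # one weight per past representation plus the current one;
        # initialized so the module starts as the identity
        init = torch.zeros(depth_index + 2)
        init[-1] = 1.0
        self.weights = nn.Parameter(init)

    def forward(self, states: list) -> torch.Tensor:
        # states: [embeddings, block_0_out, ..., block_i_out]
        stacked = torch.stack(states)                      # (i+2, B, T, D)
        return torch.einsum("k,kbtd->btd", self.weights, stacked)
```

Because each module only adds a handful of scalars per depth, the parameter overhead is negligible relative to the transformer itself.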
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?
·3631 words·18 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Large language models struggle to generalize knowledge when facing seemingly simple reversals, a phenomenon termed the ‘reversal curse.’ This study reveals that this limitation is strongly linked to t…
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
·2535 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
Delta-CoMe: Training-free mixed-precision delta compression boosts LLM deployment efficiency.
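A hedged sketch of what mixed-precision delta compression can look like (the uniform quantizer, bit-widths, and rank splits below are illustrative assumptions; the paper pairs the idea with more sophisticated quantization): the delta between fine-tuned and base weights is factored by SVD, and singular-vector blocks with larger singular values are kept at higher precision.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to a given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels + 1e-12
    return np.round(x / scale) * scale

def compress_delta(base_w, tuned_w, bit_widths=(8, 3, 2), ranks=(2, 16, 110)):
    """Quantize the delta's top singular directions at high precision
    and the long tail at progressively lower precision."""
    u, s, vt = np.linalg.svd(tuned_w - base_w, full_matrices=False)
    delta_hat, start = np.zeros_like(base_w), 0
    for bits, r in zip(bit_widths, ranks):
        blk = slice(start, start + r)
        delta_hat += quantize(u[:, blk] * s[blk], bits) @ quantize(vt[blk], bits)
        start += r
    return base_w + delta_hat  # remaining directions are dropped entirely
```

Being training-free, the whole procedure runs once per weight matrix at deployment time.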
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
·2339 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
BAL-PM, a novel active learning approach, drastically reduces human feedback in LLM preference modeling by leveraging both model uncertainty and prompt distribution diversity, achieving 33%-68% fewer …
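A hedged sketch of the acquisition rule (function and variable names are hypothetical; the paper estimates the entropy of the acquired-prompt set with a kNN estimator, approximated very loosely here): score each candidate by the preference model's uncertainty plus how much novelty its prompt would add to the already-labeled set.

```python
import numpy as np

def acquisition_scores(pred_entropy, cand_embs, acquired_embs, k=10):
    """pred_entropy: (N,) model uncertainty per candidate;
    cand_embs: (N, D) candidate prompt embeddings;
    acquired_embs: (M, D) embeddings of already-labeled prompts."""
    # mean distance to the k nearest acquired prompts as a novelty proxy
    d = np.linalg.norm(cand_embs[:, None, :] - acquired_embs[None, :, :], axis=-1)
    novelty = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return pred_entropy + novelty  # label the argmax next
```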
Decoding-Time Language Model Alignment with Multiple Objectives
·3392 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Multi-objective decoding (MOD) efficiently aligns language models to diverse user needs by decoding the next token from a weighted combination of predictions from multiple base models trained on indiv…
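A minimal decoding-step sketch under strong assumptions (Hugging-Face-style models sharing a tokenizer; a plain linear mixture of log-probabilities, whereas the paper derives its exact combination rule from the training objectives):

```python
import torch

@torch.no_grad()
def mod_decode_step(models, weights, input_ids):
    """Mix next-token log-probabilities from several base models, each
    aligned to one objective, using user-chosen preference weights."""
    logps = [
        torch.log_softmax(m(input_ids).logits[:, -1, :], dim=-1)
        for m in models
    ]
    mixed = sum(w * lp for w, lp in zip(weights, logps))
    next_id = mixed.argmax(dim=-1, keepdim=True)   # greedy step for brevity
    return torch.cat([input_ids, next_id], dim=-1)
```

Changing the weights at inference time re-balances the objectives without retraining any of the base models.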
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
·2519 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Illinois at Urbana-Champaign
New framework reveals LLMs’ human-like decision-making tendencies but highlights significant variations and biases influenced by demographic factors, underscoring ethical deployment needs.
DDK: Distilling Domain Knowledge for Efficient Large Language Models
·2140 words·11 mins
Natural Language Processing Large Language Models 🏢 Taobao & Tmall Group of Alibaba
DDK: Dynamically Distilling Domain Knowledge for efficient LLMs.
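One way to read "dynamically distilling domain knowledge", sketched under assumptions (the gap-to-softmax mapping below is illustrative; the paper defines its own domain discrepancy measure): domains where the student trails the teacher most receive a larger share of the distillation data.

```python
import torch

def domain_mixture(teacher_loss, student_loss, temperature=1.0):
    """teacher_loss, student_loss: (n_domains,) validation losses.
    Returns sampling weights that favor domains with a large
    student-teacher performance gap."""
    gap = torch.clamp(student_loss - teacher_loss, min=0.0)
    return torch.softmax(gap / temperature, dim=0)

# usage: re-estimate the mixture periodically during distillation
# weights = domain_mixture(t_val_loss, s_val_loss)
# next_domain = torch.multinomial(weights, num_samples=1)
```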
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
·3234 words·16 mins
Natural Language Processing Large Language Models 🏢 Apple
This paper introduces dataset decomposition (DD), a novel approach to accelerate LLM training while enhancing performance. DD significantly reduces training time by decomposing datasets into buckets …
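A hedged sketch of the bucketing idea (names and the power-of-two policy details are assumptions): each tokenized document is split into power-of-two chunks, chunks are grouped into equal-length buckets, and every optimization step draws a whole batch from a single bucket, so there is no padding and no attention across unrelated documents.

```python
import random
from collections import defaultdict

def decompose(documents, max_chunk=8192):
    """Split each tokenized document into power-of-two sized chunks and
    group the chunks into buckets keyed by chunk length."""
    buckets = defaultdict(list)
    for doc in documents:
        i = 0
        while len(doc) - i >= 2:
            size = min(max_chunk, 2 ** ((len(doc) - i).bit_length() - 1))
            buckets[size].append(doc[i:i + size])
            i += size
    return buckets

def sample_batch(buckets, tokens_per_batch, bucket_weights):
    """Draw one fixed-token-budget batch from a single bucket; a length
    curriculum can be expressed through the bucket weights."""
    sizes = sorted(buckets)
    size = random.choices(sizes, weights=[bucket_weights[s] for s in sizes])[0]
    return random.sample(buckets[size], tokens_per_batch // size)
```

Because every batch has a uniform sequence length and a fixed token budget, step cost stays constant while the curriculum shifts toward longer sequences.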
Data-Efficient Learning with Neural Programs
·2234 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Pennsylvania
ISED: a novel, data-efficient algorithm that learns neural programs by sampling from neural predictions to estimate gradients of black-box components, outperforming baselines on various benchmarks.
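A hedged sketch of the gradient-estimation idea (this is a generic score-function estimator in the spirit of the summary; ISED's actual estimator aggregates samples differently): sample discrete symbols from the network's output distribution, run the non-differentiable program on each sample, and reinforce samples whose program output matches the label.

```python
import torch

def surrogate_loss(probs, program, targets, n_samples=64):
    """probs: (B, K) categorical outputs of the neural component;
    program: black-box mapping from a sampled symbol to an output;
    targets: (B,) gold program outputs."""
    dist = torch.distributions.Categorical(probs=probs)
    samples = dist.sample((n_samples,))                    # (S, B)
    rewards = torch.tensor([
        [float(program(s.item()) == t.item()) for s, t in zip(row, targets)]
        for row in samples
    ])                                                     # (S, B)
    # gradients flow through log_prob only; the program stays black-box
    return -(rewards * dist.log_prob(samples)).mean()
```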
Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions
·3904 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Washington
Researchers uncover hidden training data secrets of large language models by analyzing their byte-pair encoding tokenizers, revealing the proportions of different languages and domains.
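The attack's core observation can be sketched (heavily simplified: the paper casts the inference as a linear program, while the toy version below just scores candidate mixtures by greedily replaying the merge list): a BPE tokenizer's ordered merge rules record which symbol pair was most frequent at each training step, so a mixture whose replayed pair frequencies keep agreeing with the merge order is a plausible estimate of the training composition.

```python
from collections import Counter

def pair_counts(corpora):
    """Adjacent-pair frequencies over a list of token sequences."""
    c = Counter()
    for toks in corpora:
        c.update(zip(toks, toks[1:]))
    return c

def apply_merge(corpora, pair):
    """Replace each occurrence of `pair` with its merged symbol."""
    out = []
    for toks in corpora:
        merged, i = [], 0
        while i < len(toks):
            if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
                merged.append(toks[i] + toks[i + 1])
                i += 2
            else:
                merged.append(toks[i])
                i += 1
        out.append(merged)
    return out

def mixture_agreement(alpha, corpora_by_category, merge_list):
    """How many tokenizer merges are also the top pair under candidate
    mixture `alpha`, replaying merges as BPE training would.
    Grid-search alpha and keep the best-scoring mixture."""
    hits = 0
    for pair in merge_list:
        mixed = Counter()
        for a, corpora in zip(alpha, corpora_by_category):
            for p, n in pair_counts(corpora).items():
                mixed[p] += a * n
        hits += pair == max(mixed, key=mixed.get)
        corpora_by_category = [apply_merge(c, pair) for c in corpora_by_category]
    return hits
```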
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
·1939 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
DART-Math tackles LLM limitations in mathematical problem-solving by introducing Difficulty-Aware Rejection Tuning, a novel method that generates high-quality, bias-reduced datasets, resulting in supe…
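A hedged sketch of one variant of difficulty-aware rejection tuning (generate and check are placeholder hooks; the paper also proposes a prop2diff variant that allocates even more samples to harder problems): rather than drawing the same number of samples per problem, which leaves hard problems underrepresented, keep sampling each problem until a fixed number of verified-correct solutions is collected.

```python
def collect_dataset(problems, generate, check, target_correct=4, max_tries=64):
    """generate(problem) -> one sampled solution string;
    check(problem, solution) -> True if the final answer is correct."""
    dataset = []
    for prob in problems:
        kept = 0
        for _ in range(max_tries):          # hard problems use more tries
            sol = generate(prob)
            if check(prob, sol):
                dataset.append((prob, sol))
                kept += 1
                if kept == target_correct:  # same correct count per problem
                    break
    return dataset
```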
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
·3306 words·16 mins
Natural Language Processing Large Language Models 🏢 Dartmouth College
DARG dynamically evaluates LLMs via adaptive reasoning graphs, revealing performance drops with increased complexity and exposing model biases.
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
·3365 words·16 mins
Natural Language Processing Large Language Models 🏢 CUHK
DAPE: A novel data-adaptive positional encoding method dynamically adjusts positional information based on input context, improving transformer performance and length generalization.
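A hedged sketch of the mechanism (layer shapes and the small MLP are assumptions in the spirit of the paper's description): instead of adding a fixed relative-position bias to the attention logits, compute the bias as a learned function of both the static bias and the current attention logits, so it adapts to the input.

```python
import torch
import torch.nn as nn

class DataAdaptivePositionBias(nn.Module):
    """Produces an input-dependent additive position bias from the
    attention logits and a static relative-position bias."""
    def __init__(self, num_heads: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_heads, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, attn_logits, static_bias):
        # attn_logits, static_bias: (B, H, T, T)
        x = torch.cat([attn_logits, static_bias], dim=1)   # (B, 2H, T, T)
        x = x.permute(0, 2, 3, 1)                          # (B, T, T, 2H)
        bias = self.mlp(x).permute(0, 3, 1, 2)             # (B, H, T, T)
        return attn_logits + bias
```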
DAGER: Exact Gradient Inversion for Large Language Models
·2286 words·11 mins
Natural Language Processing Large Language Models 🏢 INSAIT
DAGER: Exact gradient inversion for LLMs; recovers full input text batches precisely.
D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models
·2704 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huawei Technologies Co., Ltd.
D-LLM dynamically allocates computing resources during LLM token processing, reducing computational costs and memory usage by up to 50% without sacrificing accuracy.
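A hedged sketch of token-adaptive layer execution (the router, threshold, and skip path are illustrative; real attention layers also need KV handling for skipped tokens, which the paper addresses and this sketch ignores):

```python
import torch
import torch.nn as nn

class DynamicLayer(nn.Module):
    """Wraps one transformer layer with a per-token router; tokens the
    router scores below threshold take the residual path and skip the
    layer's computation entirely."""
    def __init__(self, layer: nn.Module, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.layer, self.threshold = layer, threshold
        self.router = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D)
        keep = torch.sigmoid(self.router(x)).squeeze(-1) > self.threshold
        out = x.clone()
        if keep.any():
            # simplified: kept tokens are processed as one packed batch
            out[keep] = self.layer(x[keep].unsqueeze(0)).squeeze(0)
        return out
```

Savings come from the fraction of tokens routed around each layer, which can be tuned per deployment through the threshold.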
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
·3930 words·19 mins
Natural Language Processing Large Language Models 🏢 Taobao & Tmall Group of Alibaba
New D-CPT Law optimizes continual pre-training for LLMs by predicting optimal data mixture ratios, drastically cutting training costs.
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
·1854 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
Instance-wise LoRA (iLoRA) boosts LLM sequential recommendation accuracy by customizing model parameters for each user, mitigating negative transfer and improving performance.
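A hedged sketch of instance-wise LoRA (expert count, rank, and the gating input are assumptions; the paper gates on a representation of the user's interaction sequence): several LoRA experts share one frozen base projection, and a softmax gate mixes them per instance, so different users get different effective adapters.

```python
import torch
import torch.nn as nn

class InstanceWiseLoRA(nn.Module):
    """Mixture of LoRA experts gated per instance over a frozen layer."""
    def __init__(self, base: nn.Linear, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x, instance_repr):
        # x: (B, T, d_in); instance_repr: (B, d_in) summary of the instance
        g = torch.softmax(self.gate(instance_repr), dim=-1)   # (B, E)
        A = torch.einsum("be,eir->bir", g, self.A)            # (B, d_in, r)
        B = torch.einsum("be,ero->bro", g, self.B)            # (B, r, d_out)
        return self.base(x) + x @ A @ B
```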
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
·2738 words·13 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
CulturePark, a novel multi-agent communication framework, generates high-quality cross-cultural data to fine-tune LLMs, significantly reducing cultural bias and boosting cross-cultural understanding.
CultureLLM: Incorporating Cultural Differences into Large Language Models
·2507 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
CultureLLM, a new approach, effectively incorporates cultural nuances into LLMs using semantic data augmentation, significantly outperforming existing models.
Cross-model Control: Improving Multiple Large Language Models in One-time Training
·1811 words·9 mins
Natural Language Processing Large Language Models 🏢 East China Normal University
One-time training improves multiple LLMs using a tiny portable model, drastically reducing costs and resource needs for model enhancement.
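A hedged sketch of how one tiny trained model can steer many LLMs at decode time (this resembles logit-offset/proxy-style steering; the paper additionally handles mismatched vocabularies with a token-mapping strategy, omitted here):

```python
import torch

@torch.no_grad()
def steered_next_token(target_lm, tiny_tuned, tiny_base, input_ids):
    """Transfer the behavior learned by fine-tuning one tiny model to a
    larger model by adding the tiny model's logit shift (tuned minus
    untuned) to the target model's next-token logits."""
    shift = (
        tiny_tuned(input_ids).logits[:, -1, :]
        - tiny_base(input_ids).logits[:, -1, :]
    )
    logits = target_lm(input_ids).logits[:, -1, :] + shift
    return logits.argmax(dim=-1)
```

Because only the tiny model is ever trained, the same learned shift can be reused across every compatible target model.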