
Large Language Models

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
·3222 words·16 mins
Natural Language Processing Large Language Models 🏢 EPFL
DenseFormer enhances transformers by adding a depth-weighted averaging step, improving data efficiency and outperforming baselines in both memory footprint and inference time, without increasing model size.
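The averaging step is simple enough to sketch. Below is a minimal, hedged reading of the idea (the class name and identity initialization are illustrative assumptions, not the paper's code): after block i, the residual stream is replaced by a learned weighted average over all representations computed so far, including the token embeddings.

```python
import torch
import torch.nn as nn

class DepthWeightedAverage(nn.Module):
    """Mixes the current block's output with all earlier block outputs
    (and the token embeddings) using learned scalar weights."""
    def __init__(self, depth_index: int):
        super().__init__()
        # one weight per past representation plus the current one;
        # initialized so the module starts as the identity
        init = torch.zeros(depth_index + 2)
        init[-1] = 1.0
        self.weights = nn.Parameter(init)

    def forward(self, states: list) -> torch.Tensor:
        # states: [embeddings, block_0_out, ..., block_i_out]
        stacked = torch.stack(states)                      # (i+2, B, T, D)
        return torch.einsum("k,kbtd->btd", self.weights, stacked)
```

Because each module only adds a handful of scalars per depth, the parameter overhead is negligible relative to the transformer itself.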
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?
·3631 words·18 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Large language models struggle to generalize knowledge when facing seemingly simple reversals, a phenomenon termed the ‘reversal curse.’ This study reveals that this limitation is strongly linked to t…
Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models
·2535 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 Peking University
Delta-CoMe: Training-free mixed-precision delta compression boosts LLM deployment efficiency.
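A hedged sketch of what mixed-precision delta compression can look like (the uniform quantizer, bit-widths, and rank splits below are illustrative assumptions; the paper pairs the idea with more sophisticated quantization): the delta between fine-tuned and base weights is factored by SVD, and singular-vector blocks with larger singular values are kept at higher precision.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to a given bit-width."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels + 1e-12
    return np.round(x / scale) * scale

def compress_delta(base_w, tuned_w, bit_widths=(8, 3, 2), ranks=(2, 16, 110)):
    """Quantize the delta's top singular directions at high precision
    and the long tail at progressively lower precision."""
    u, s, vt = np.linalg.svd(tuned_w - base_w, full_matrices=False)
    delta_hat, start = np.zeros_like(base_w), 0
    for bits, r in zip(bit_widths, ranks):
        blk = slice(start, start + r)
        delta_hat += quantize(u[:, blk] * s[blk], bits) @ quantize(vt[blk], bits)
        start += r
    return base_w + delta_hat  # remaining directions are dropped entirely
```

Being training-free, the whole procedure runs once per weight matrix at deployment time.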
Deep Bayesian Active Learning for Preference Modeling in Large Language Models
·2339 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
BAL-PM, a novel active learning approach, drastically reduces human feedback in LLM preference modeling by leveraging both model uncertainty and prompt distribution diversity, achieving 33%-68% fewer …
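A hedged sketch of the acquisition rule (function and variable names are hypothetical; the paper estimates the entropy of the acquired-prompt set with a kNN estimator, approximated very loosely here): score each candidate by the preference model's uncertainty plus how much novelty its prompt would add to the already-labeled set.

```python
import numpy as np

def acquisition_scores(pred_entropy, cand_embs, acquired_embs, k=10):
    """pred_entropy: (N,) model uncertainty per candidate;
    cand_embs: (N, D) candidate prompt embeddings;
    acquired_embs: (M, D) embeddings of already-labeled prompts."""
    # mean distance to the k nearest acquired prompts as a novelty proxy
    d = np.linalg.norm(cand_embs[:, None, :] - acquired_embs[None, :, :], axis=-1)
    novelty = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return pred_entropy + novelty  # label the argmax next
```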
Decoding-Time Language Model Alignment with Multiple Objectives
·3392 words·16 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
Multi-objective decoding (MOD) efficiently aligns language models to diverse user needs by decoding the next token from a weighted combination of predictions from multiple base models trained on indiv…
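A minimal decoding-step sketch under strong assumptions (Hugging-Face-style models sharing a tokenizer; a plain linear mixture of log-probabilities, whereas the paper derives its exact combination rule from the training objectives):

```python
import torch

@torch.no_grad()
def mod_decode_step(models, weights, input_ids):
    """Mix next-token log-probabilities from several base models, each
    aligned to one objective, using user-chosen preference weights."""
    logps = [
        torch.log_softmax(m(input_ids).logits[:, -1, :], dim=-1)
        for m in models
    ]
    mixed = sum(w * lp for w, lp in zip(weights, logps))
    next_id = mixed.argmax(dim=-1, keepdim=True)   # greedy step for brevity
    return torch.cat([input_ids, next_id], dim=-1)
```

Changing the weights at inference time re-balances the objectives without retraining any of the base models.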
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context
·2519 words·12 mins
Natural Language Processing Large Language Models 🏢 University of Illinois at Urbana-Champaign
New framework reveals LLMs’ human-like decision-making tendencies but highlights significant variations and biases influenced by demographic factors, underscoring ethical deployment needs.
DDK: Distilling Domain Knowledge for Efficient Large Language Models
·2140 words·11 mins
Natural Language Processing Large Language Models 🏢 Taobao & Tmall Group of Alibaba
DDK: Dynamically Distilling Domain Knowledge for efficient LLMs.
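One way to read "dynamically distilling domain knowledge", sketched under assumptions (the gap-to-softmax mapping below is illustrative; the paper defines its own domain discrepancy measure): domains where the student trails the teacher most receive a larger share of the distillation data.

```python
import torch

def domain_mixture(teacher_loss, student_loss, temperature=1.0):
    """teacher_loss, student_loss: (n_domains,) validation losses.
    Returns sampling weights that favor domains with a large
    student-teacher performance gap."""
    gap = torch.clamp(student_loss - teacher_loss, min=0.0)
    return torch.softmax(gap / temperature, dim=0)

# usage: re-estimate the mixture periodically during distillation
# weights = domain_mixture(t_val_loss, s_val_loss)
# next_domain = torch.multinomial(weights, num_samples=1)
```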
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
·3234 words·16 mins
Natural Language Processing Large Language Models 🏢 Apple
This paper introduces dataset decomposition (DD), a novel approach to accelerate LLM training while enhancing performance. DD significantly reduces training time by decomposing datasets into buckets …
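A hedged sketch of the bucketing idea (names and the power-of-two policy details are assumptions): each tokenized document is split into power-of-two chunks, chunks are grouped into equal-length buckets, and every optimization step draws a whole batch from a single bucket, so there is no padding and no attention across unrelated documents.

```python
import random
from collections import defaultdict

def decompose(documents, max_chunk=8192):
    """Split each tokenized document into power-of-two sized chunks and
    group the chunks into buckets keyed by chunk length."""
    buckets = defaultdict(list)
    for doc in documents:
        i = 0
        while len(doc) - i >= 2:
            size = min(max_chunk, 2 ** ((len(doc) - i).bit_length() - 1))
            buckets[size].append(doc[i:i + size])
            i += size
    return buckets

def sample_batch(buckets, tokens_per_batch, bucket_weights):
    """Draw one fixed-token-budget batch from a single bucket; a length
    curriculum can be expressed through the bucket weights."""
    sizes = sorted(buckets)
    size = random.choices(sizes, weights=[bucket_weights[s] for s in sizes])[0]
    return random.sample(buckets[size], tokens_per_batch // size)
```

Because every batch has a uniform sequence length and a fixed token budget, step cost stays constant while the curriculum shifts toward longer sequences.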
Data-Efficient Learning with Neural Programs
·2234 words·11 mins
Natural Language Processing Large Language Models 🏢 University of Pennsylvania
ISED: a novel, data-efficient algorithm that learns neural programs by sampling from neural predictions to estimate gradients of black-box components, outperforming baselines on various benchmarks.
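A hedged sketch of the gradient-estimation idea (this is a generic score-function estimator in the spirit of the summary; ISED's actual estimator aggregates samples differently): sample discrete symbols from the network's output distribution, run the non-differentiable program on each sample, and reinforce samples whose program output matches the label.

```python
import torch

def surrogate_loss(probs, program, targets, n_samples=64):
    """probs: (B, K) categorical outputs of the neural component;
    program: black-box mapping from a sampled symbol to an output;
    targets: (B,) gold program outputs."""
    dist = torch.distributions.Categorical(probs=probs)
    samples = dist.sample((n_samples,))                    # (S, B)
    rewards = torch.tensor([
        [float(program(s.item()) == t.item()) for s, t in zip(row, targets)]
        for row in samples
    ])                                                     # (S, B)
    # gradients flow through log_prob only; the program stays black-box
    return -(rewards * dist.log_prob(samples)).mean()
```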
Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions
·3904 words·19 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Washington
Researchers uncover hidden training data secrets of large language models by analyzing their byte-pair encoding tokenizers, revealing the proportions of different languages and domains.
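The attack's core observation can be sketched (heavily simplified: the paper casts the inference as a linear program, while the toy version below just scores candidate mixtures by greedily replaying the merge list): a BPE tokenizer's ordered merge rules record which symbol pair was most frequent at each training step, so a mixture whose replayed pair frequencies keep agreeing with the merge order is a plausible estimate of the training composition.

```python
from collections import Counter

def pair_counts(corpora):
    """Adjacent-pair frequencies over a list of token sequences."""
    c = Counter()
    for toks in corpora:
        c.update(zip(toks, toks[1:]))
    return c

def apply_merge(corpora, pair):
    """Replace each occurrence of `pair` with its merged symbol."""
    out = []
    for toks in corpora:
        merged, i = [], 0
        while i < len(toks):
            if i + 1 < len(toks) and (toks[i], toks[i + 1]) == pair:
                merged.append(toks[i] + toks[i + 1])
                i += 2
            else:
                merged.append(toks[i])
                i += 1
        out.append(merged)
    return out

def mixture_agreement(alpha, corpora_by_category, merge_list):
    """How many tokenizer merges are also the top pair under candidate
    mixture `alpha`, replaying merges as BPE training would.
    Grid-search alpha and keep the best-scoring mixture."""
    hits = 0
    for pair in merge_list:
        mixed = Counter()
        for a, corpora in zip(alpha, corpora_by_category):
            for p, n in pair_counts(corpora).items():
                mixed[p] += a * n
        hits += pair == max(mixed, key=mixed.get)
        corpora_by_category = [apply_merge(c, pair) for c in corpora_by_category]
    return hits
```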
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving
·1939 words·10 mins
Natural Language Processing Large Language Models 🏢 Tsinghua University
DART-Math tackles LLM limitations in mathematical problem-solving by introducing Difficulty-Aware Rejection Tuning, a novel method that generates high-quality, bias-reduced datasets, resulting in supe…
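A hedged sketch of one variant of difficulty-aware rejection tuning (generate and check are placeholder hooks; the paper also proposes a prop2diff variant that allocates even more samples to harder problems): rather than drawing the same number of samples per problem, which leaves hard problems underrepresented, keep sampling each problem until a fixed number of verified-correct solutions is collected.

```python
def collect_dataset(problems, generate, check, target_correct=4, max_tries=64):
    """generate(problem) -> one sampled solution string;
    check(problem, solution) -> True if the final answer is correct."""
    dataset = []
    for prob in problems:
        kept = 0
        for _ in range(max_tries):          # hard problems use more tries
            sol = generate(prob)
            if check(prob, sol):
                dataset.append((prob, sol))
                kept += 1
                if kept == target_correct:  # same correct count per problem
                    break
    return dataset
```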
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
·3306 words·16 mins
Natural Language Processing Large Language Models 🏢 Dartmouth College
DARG dynamically evaluates LLMs via adaptive reasoning graphs, revealing performance drops with increased complexity and exposing model biases.
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation
·3365 words·16 mins
Natural Language Processing Large Language Models 🏢 CUHK
DAPE: A novel data-adaptive positional encoding method dynamically adjusts positional information based on input context, improving transformer performance and length generalization.
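A hedged sketch of the mechanism (layer shapes and the small MLP are assumptions in the spirit of the paper's description): instead of adding a fixed relative-position bias to the attention logits, compute the bias as a learned function of both the static bias and the current attention logits, so it adapts to the input.

```python
import torch
import torch.nn as nn

class DataAdaptivePositionBias(nn.Module):
    """Produces an input-dependent additive position bias from the
    attention logits and a static relative-position bias."""
    def __init__(self, num_heads: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_heads, hidden),
            nn.GELU(),
            nn.Linear(hidden, num_heads),
        )

    def forward(self, attn_logits, static_bias):
        # attn_logits, static_bias: (B, H, T, T)
        x = torch.cat([attn_logits, static_bias], dim=1)   # (B, 2H, T, T)
        x = x.permute(0, 2, 3, 1)                          # (B, T, T, 2H)
        bias = self.mlp(x).permute(0, 3, 1, 2)             # (B, H, T, T)
        return attn_logits + bias
```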
DAGER: Exact Gradient Inversion for Large Language Models
·2286 words·11 mins
Natural Language Processing Large Language Models 🏢 INSAIT
DAGER: Exact gradient inversion for LLMs; recovers full input text batches precisely.
D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models
·2704 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Huawei Technologies Co., Ltd.
D-LLM dynamically allocates computing resources during LLM token processing, reducing computational costs and memory usage by up to 50% without sacrificing accuracy.
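A hedged sketch of token-adaptive layer execution (the router, threshold, and skip path are illustrative; real attention layers also need KV handling for skipped tokens, which the paper addresses and this sketch ignores):

```python
import torch
import torch.nn as nn

class DynamicLayer(nn.Module):
    """Wraps one transformer layer with a per-token router; tokens the
    router scores below threshold take the residual path and skip the
    layer's computation entirely."""
    def __init__(self, layer: nn.Module, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.layer, self.threshold = layer, threshold
        self.router = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D)
        keep = torch.sigmoid(self.router(x)).squeeze(-1) > self.threshold
        out = x.clone()
        if keep.any():
            # simplified: kept tokens are processed as one packed batch
            out[keep] = self.layer(x[keep].unsqueeze(0)).squeeze(0)
        return out
```

Savings come from the fraction of tokens routed around each layer, which can be tuned per deployment through the threshold.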
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
·3930 words·19 mins
Natural Language Processing Large Language Models 🏢 Taobao & Tmall Group of Alibaba
New D-CPT Law optimizes continual pre-training for LLMs by predicting optimal data mixture ratios, drastically cutting training costs.
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation
·1854 words·9 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Science and Technology of China
Instance-wise LoRA (iLoRA) boosts LLM sequential recommendation accuracy by customizing model parameters for each user, mitigating negative transfer and improving performance.
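A hedged sketch of instance-wise LoRA (expert count, rank, and the gating input are assumptions; the paper gates on a representation of the user's interaction sequence): several LoRA experts share one frozen base projection, and a softmax gate mixes them per instance, so different users get different effective adapters.

```python
import torch
import torch.nn as nn

class InstanceWiseLoRA(nn.Module):
    """Mixture of LoRA experts gated per instance over a frozen layer."""
    def __init__(self, base: nn.Linear, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x, instance_repr):
        # x: (B, T, d_in); instance_repr: (B, d_in) summary of the instance
        g = torch.softmax(self.gate(instance_repr), dim=-1)   # (B, E)
        A = torch.einsum("be,eir->bir", g, self.A)            # (B, d_in, r)
        B = torch.einsum("be,ero->bro", g, self.B)            # (B, r, d_out)
        return self.base(x) + x @ A @ B
```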
CulturePark: Boosting Cross-cultural Understanding in Large Language Models
·2738 words·13 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
CulturePark, a novel multi-agent communication framework, generates high-quality cross-cultural data to fine-tune LLMs, significantly reducing cultural bias and boosting cross-cultural understanding.
CultureLLM: Incorporating Cultural Differences into Large Language Models
·2507 words·12 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
CultureLLM, a new approach, effectively incorporates cultural nuances into LLMs using semantic data augmentation, significantly outperforming existing models.
Cross-model Control: Improving Multiple Large Language Models in One-time Training
·1811 words·9 mins
Natural Language Processing Large Language Models 🏢 East China Normal University
One-time training improves multiple LLMs using a tiny portable model, drastically reducing costs and resource needs for model enhancement.
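A hedged sketch of how one tiny trained model can steer many LLMs at decode time (this resembles logit-offset/proxy-style steering; the paper additionally handles mismatched vocabularies with a token-mapping strategy, omitted here):

```python
import torch

@torch.no_grad()
def steered_next_token(target_lm, tiny_tuned, tiny_base, input_ids):
    """Transfer the behavior learned by fine-tuning one tiny model to a
    larger model by adding the tiny model's logit shift (tuned minus
    untuned) to the target model's next-token logits."""
    shift = (
        tiny_tuned(input_ids).logits[:, -1, :]
        - tiny_base(input_ids).logits[:, -1, :]
    )
    logits = target_lm(input_ids).logits[:, -1, :] + shift
    return logits.argmax(dim=-1)
```

Because only the tiny model is ever trained, the same learned shift can be reused across every compatible target model.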