
Large Language Models

Aligning Large Language Models with Representation Editing: A Control Perspective
·2249 words·11 mins
Natural Language Processing Large Language Models 🏢 Cornell University
RE-Control: Aligning LLMs via dynamic representation editing using optimal control theory, achieving superior alignment with significantly fewer resources than fine-tuning.
Aligner: Efficient Alignment by Learning to Correct
·3091 words·15 mins
Large Language Models 🏢 Peking University
Aligner efficiently aligns LLMs by learning to correct initial responses, achieving significant improvements in helpfulness and harmlessness across various models with resource efficiency.
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
·2033 words·10 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
ALI-Agent uses LLM-powered agents for in-depth, adaptive assessment of LLMs’ alignment with human values, overcoming limitations of existing static benchmarks.
Algorithmic progress in language models
·4934 words·24 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT FutureTech
Language model algorithms have improved drastically, halving compute needs every 8 months since 2012 and surpassing Moore's Law; however, compute scaling, not algorithms, drove most recent performance gains.
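As a back-of-the-envelope illustration of what an 8-month halving time implies (my own arithmetic, not a figure from the paper):

```latex
% Effective-compute multiplier implied by a halving time h over an
% elapsed time t (both in months). Assumes h = 8 months (the paper's
% estimate) and t = 144 months, i.e., 2012--2024.
M(t) = 2^{t/h} = 2^{144/8} = 2^{18} \approx 2.6 \times 10^{5}
% For comparison, Moore's-law doubling every ~24 months over the same span:
2^{144/24} = 2^{6} = 64
```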
Algorithmic Capabilities of Random Transformers
·3079 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT
Randomly initialized transformers, with only embedding layers optimized, surprisingly excel at various algorithmic tasks, revealing inherent capabilities even before training.
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
·2613 words·13 mins
Natural Language Processing Large Language Models 🏢 Tongji University
AlchemistCoder enhances code LLMs by pioneering hindsight tuning on multi-source data, harmonizing conflicting styles via AlchemistPrompts, and achieving state-of-the-art performance.
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
·2659 words·13 mins
Natural Language Processing Large Language Models 🏢 University of Chicago
AgentPoison: A novel backdoor attack compromises LLM agents by poisoning their memory or knowledge bases, achieving high success rates with minimal performance impact.
Agent Planning with World Knowledge Model
·2981 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
This paper introduces a parametric World Knowledge Model (WKM) to improve AI agent planning by integrating both global task knowledge and dynamic state knowledge, thereby overcoming current LLMs' limitations in planning.
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
·1740 words·9 mins
Natural Language Processing Large Language Models 🏢 Peking University
Adversarial Representation Engineering (ARE) offers a unified, interpretable approach for editing large language models (LLMs) by using a representation sensor as an editing oracle, enhancing model safety.
Adversarial Moment-Matching Distillation of Large Language Models
·2972 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 SI-TECH Information Technology
Boosting LLM efficiency, this study introduces adversarial moment-matching distillation, which outperforms existing methods by matching action-value moments for superior knowledge transfer.
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
·1987 words·10 mins
Natural Language Processing Large Language Models 🏢 National Key Laboratory for Novel Software Technology, Nanjing University
TP-LLaMA boosts tool-augmented LLMs by optimizing inference trajectories using preference learning from both successful and failed attempts, achieving superior performance and efficiency.
Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment
·2979 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Birmingham
Adaptive Layer Sparsity (ALS) advances large language model (LLM) compression by intelligently pruning less important layers, achieving significant size reduction without performance loss.
Adaptable Logical Control for Large Language Models
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 UCLA
Ctrl-G: A neuro-symbolic framework enables adaptable control of LLM generation by combining any LLM with a Hidden Markov Model (HMM), ensuring outputs adhere to logical constraints specified as deterministic finite automata.
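To make "constraints specified as deterministic finite automata" concrete, here is a minimal Python sketch (a toy example of my own, not Ctrl-G's code) that compiles the constraint "the output must contain a given substring" into a character-level DFA:

```python
def make_contains_dfa(pattern: str):
    """Compile the constraint "the string contains `pattern`" into a DFA:
    returns (step, start_state, accepting_states). State i means "the
    characters seen so far end with the first i characters of pattern"."""
    n = len(pattern)
    # KMP failure function: longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state: int, ch: str) -> int:
        if state == n:               # already matched: absorbing accept state
            return n
        while state and ch != pattern[state]:
            state = fail[state - 1]  # fall back to the longest viable prefix
        return state + 1 if ch == pattern[state] else 0

    return step, 0, {n}

# Usage: check whether a candidate generation satisfies the constraint.
step, state, accepting = make_contains_dfa("I agree")
for ch in "Sure, I agree with that.":
    state = step(state, ch)
print(state in accepting)  # True
```

As the summary above describes, Ctrl-G's contribution is pairing such an automaton with an HMM approximation of the LLM so that decoding can be steered toward strings the automaton accepts.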
Ad Auctions for LLMs via Retrieval Augmented Generation
·2337 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Maryland
This paper introduces segment auctions, maximizing logarithmic social welfare, for integrating ads into LLM outputs via Retrieval Augmented Generation, balancing ad revenue and output quality.
Accuracy is Not All You Need
·5583 words·27 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
LLM compression accuracy hides crucial behavioral changes; use % flips and KL-divergence for better evaluation.
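A minimal sketch of the two metrics the summary names (my own shorthand, assuming PyTorch; not the paper's code): flip rate counts answers that change between the baseline and compressed model, and KL-divergence compares their next-token distributions.

```python
import torch
import torch.nn.functional as F

def flip_rate(baseline_answers, compressed_answers):
    """Fraction of examples whose answer changes after compression,
    counted regardless of whether the change is correct -> incorrect
    or incorrect -> correct."""
    flips = sum(b != c for b, c in zip(baseline_answers, compressed_answers))
    return flips / len(baseline_answers)

def mean_token_kl(baseline_logits, compressed_logits):
    """Mean per-token KL(baseline || compressed) for next-token logit
    tensors of shape (num_tokens, vocab_size)."""
    log_p = F.log_softmax(baseline_logits, dim=-1)
    log_q = F.log_softmax(compressed_logits, dim=-1)
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # KL per token
    return kl.mean().item()

# Example: one answer in three changed, so the flip rate is ~0.33 even
# though top-line accuracy could be identical.
print(flip_rate(["A", "B", "C"], ["A", "D", "C"]))
print(mean_token_kl(torch.randn(4, 32000), torch.randn(4, 32000)))
```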
Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
·2272 words·11 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
Probe sampling accelerates Greedy Coordinate Gradient (GCG) and other prompt optimization methods by up to 5.6x, achieving equal or better attack success rates and making LLM safety research faster.
Accelerating Blockwise Parallel Language Models with Draft Refinement
·2883 words·14 mins
Natural Language Processing Large Language Models 🏢 KAIST AI
Boost LLM inference speed by 3x: this paper refines the draft predictions of blockwise parallel decoding (BPD), yielding faster text generation for large language models.
Abrupt Learning in Transformers: A Case Study on Matrix Completion
·5285 words·25 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Michigan
Transformers exhibit abrupt learning: training loss plateaus, then suddenly drops. This study uses matrix completion to demonstrate this phenomenon, providing insights into the model's underlying algorithmic shift.
A Theoretical Understanding of Self-Correction through In-context Alignment
·1997 words·10 mins
Natural Language Processing Large Language Models 🏢 MIT CSAIL
LLMs improve through self-correction, but the mechanisms are unclear. This paper provides a theoretical framework and empirical evidence demonstrating that self-correction arises from in-context alignment.
A Theoretical Perspective for Speculative Decoding Algorithm
·1873 words·9 mins
Natural Language Processing Large Language Models 🏢 Princeton University
This paper theoretically analyzes speculative decoding, revealing its optimality and providing formulas for expected rejections, paving the way for more efficient large language model inference.
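For orientation, these are the standard speculative-sampling quantities such an analysis works with (the classical results from the speculative decoding literature, not necessarily this paper's exact formulas), where p is the target model's next-token distribution, q the draft model's, and γ the draft length:

```latex
% Acceptance rule for a drafted token x, and the resulting acceptance rate:
\Pr[\text{accept } x] = \min\!\Bigl(1, \tfrac{p(x)}{q(x)}\Bigr), \qquad
\alpha = \sum_x \min\bigl(p(x), q(x)\bigr) = 1 - \mathrm{TV}(p, q)
% Expected tokens produced per round with draft length \gamma:
\mathbb{E}[\text{tokens per round}] = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}
```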