
Large Language Models

Aligning Large Language Models with Representation Editing: A Control Perspective
·2249 words·11 mins
Natural Language Processing Large Language Models 🏢 Cornell University
RE-Control: Aligning LLMs via dynamic representation editing using optimal control theory, achieving superior alignment with significantly fewer resources than fine-tuning.
Aligner: Efficient Alignment by Learning to Correct
·3091 words·15 mins
Large Language Models 🏢 Peking University
Aligner efficiently aligns LLMs by learning to correct initial responses, achieving significant improvements in helpfulness and harmlessness across various models with resource efficiency.
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
·2033 words·10 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
ALI-Agent uses LLM-powered agents for in-depth, adaptive assessment of LLMs’ alignment with human values, overcoming limitations of existing static benchmarks.
Algorithmic progress in language models
·4934 words·24 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT FutureTech
Language model algorithms have improved drastically, halving compute needs every 8 months since 2012 and surpassing Moore's Law; however, compute scaling, not algorithms, drove most recent performance gains.
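As a back-of-the-envelope illustration of what an 8-month halving time implies (my own arithmetic, not a figure from the paper):

```latex
% Effective-compute multiplier implied by a halving time h over an
% elapsed time t (both in months). Assumes h = 8 months (the paper's
% estimate) and t = 144 months, i.e., 2012--2024.
M(t) = 2^{t/h} = 2^{144/8} = 2^{18} \approx 2.6 \times 10^{5}
% For comparison, Moore's-law doubling every ~24 months over the same span:
2^{144/24} = 2^{6} = 64
```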
Algorithmic Capabilities of Random Transformers
·3079 words·15 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT
Randomly initialized transformers, with only embedding layers optimized, surprisingly excel at various algorithmic tasks, revealing inherent capabilities even before training.
AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data
·2613 words·13 mins
Natural Language Processing Large Language Models 🏢 Tongji University
AlchemistCoder enhances code LLMs by pioneering hindsight tuning on multi-source data, harmonizing conflicting styles via AlchemistPrompts, and achieving state-of-the-art performance.
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases
·2659 words·13 mins
Natural Language Processing Large Language Models 🏢 University of Chicago
AgentPoison: A novel backdoor attack compromises LLM agents by poisoning their memory or knowledge bases, achieving high success rates with minimal performance impact.
Agent Planning with World Knowledge Model
·2981 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 Zhejiang University
This paper introduces a parametric World Knowledge Model (WKM) to improve AI agent planning by integrating both global task knowledge and dynamic state knowledge, thereby overcoming current LLMs' limitations in planning.
Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models
·1740 words·9 mins
Natural Language Processing Large Language Models 🏢 Peking University
Adversarial Representation Engineering (ARE) offers a unified, interpretable approach for editing large language models (LLMs) by using a representation sensor as an editing oracle, enhancing model safety.
Adversarial Moment-Matching Distillation of Large Language Models
·2972 words·14 mins
AI Generated Natural Language Processing Large Language Models 🏢 SI-TECH Information Technology
Boosting LLM efficiency, this study introduces adversarial moment-matching distillation, which outperforms existing methods by matching action-value moments for superior knowledge transfer.
Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
·1987 words·10 mins
Natural Language Processing Large Language Models 🏢 National Key Laboratory for Novel Software Technology, Nanjing University
TP-LLaMA boosts tool-augmented LLMs by optimizing inference trajectories using preference learning from both successful and failed attempts, achieving superior performance and efficiency.
Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment
·2979 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Birmingham
Adaptive Layer Sparsity (ALS) advances large language model (LLM) compression by intelligently pruning less important layers, achieving significant size reduction without performance loss.
Adaptable Logical Control for Large Language Models
·2047 words·10 mins
Natural Language Processing Large Language Models 🏢 UCLA
Ctrl-G: A neuro-symbolic framework enables adaptable control of LLM generation by combining any LLM with a Hidden Markov Model (HMM), ensuring outputs adhere to logical constraints specified as deterministic finite automata.
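To make "constraints specified as deterministic finite automata" concrete, here is a minimal Python sketch (a toy example of my own, not Ctrl-G's code) that compiles the constraint "the output must contain a given substring" into a character-level DFA:

```python
def make_contains_dfa(pattern: str):
    """Compile the constraint "the string contains `pattern`" into a DFA:
    returns (step, start_state, accepting_states). State i means "the
    characters seen so far end with the first i characters of pattern"."""
    n = len(pattern)
    # KMP failure function: longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    fail = [0] * n
    k = 0
    for i in range(1, n):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state: int, ch: str) -> int:
        if state == n:               # already matched: absorbing accept state
            return n
        while state and ch != pattern[state]:
            state = fail[state - 1]  # fall back to the longest viable prefix
        return state + 1 if ch == pattern[state] else 0

    return step, 0, {n}

# Usage: check whether a candidate generation satisfies the constraint.
step, state, accepting = make_contains_dfa("I agree")
for ch in "Sure, I agree with that.":
    state = step(state, ch)
print(state in accepting)  # True
```

As the summary above describes, Ctrl-G's contribution is pairing such an automaton with an HMM approximation of the LLM so that decoding can be steered toward strings the automaton accepts.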
Ad Auctions for LLMs via Retrieval Augmented Generation
·2337 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Maryland
This paper introduces segment auctions, maximizing logarithmic social welfare, for integrating ads into LLM outputs via Retrieval Augmented Generation, balancing ad revenue and output quality.
Accuracy is Not All You Need
·5583 words·27 mins
Natural Language Processing Large Language Models 🏢 Microsoft Research
LLM compression accuracy hides crucial behavioral changes; use % flips and KL-divergence for better evaluation.
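A minimal sketch of the two metrics the summary names (my own shorthand, assuming PyTorch; not the paper's code): flip rate counts answers that change between the baseline and compressed model, and KL-divergence compares their next-token distributions.

```python
import torch
import torch.nn.functional as F

def flip_rate(baseline_answers, compressed_answers):
    """Fraction of examples whose answer changes after compression,
    counted regardless of whether the change is correct -> incorrect
    or incorrect -> correct."""
    flips = sum(b != c for b, c in zip(baseline_answers, compressed_answers))
    return flips / len(baseline_answers)

def mean_token_kl(baseline_logits, compressed_logits):
    """Mean per-token KL(baseline || compressed) for next-token logit
    tensors of shape (num_tokens, vocab_size)."""
    log_p = F.log_softmax(baseline_logits, dim=-1)
    log_q = F.log_softmax(compressed_logits, dim=-1)
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)  # KL per token
    return kl.mean().item()

# Example: one answer in three changed, so the flip rate is ~0.33 even
# though top-line accuracy could be identical.
print(flip_rate(["A", "B", "C"], ["A", "D", "C"]))
print(mean_token_kl(torch.randn(4, 32000), torch.randn(4, 32000)))
```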
Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
·2272 words·11 mins
Natural Language Processing Large Language Models 🏢 National University of Singapore
Probe sampling accelerates Greedy Coordinate Gradient (GCG) and other prompt optimization methods by up to 5.6x, achieving equal or better attack success rates and making LLM safety research faster.
Accelerating Blockwise Parallel Language Models with Draft Refinement
·2883 words·14 mins
Natural Language Processing Large Language Models 🏢 KAIST AI
Boost LLM inference speed by 3x: this paper refines the draft predictions of blockwise parallel decoding (BPD), yielding faster text generation for large language models.
Abrupt Learning in Transformers: A Case Study on Matrix Completion
·5285 words·25 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Michigan
Transformers exhibit abrupt learning: training loss plateaus, then suddenly drops. This study uses matrix completion to demonstrate this phenomenon, providing insights into the model's underlying algorithmic shift.
A Theoretical Understanding of Self-Correction through In-context Alignment
·1997 words·10 mins
Natural Language Processing Large Language Models 🏢 MIT CSAIL
LLMs improve through self-correction, but the mechanisms are unclear. This paper provides a theoretical framework and empirical evidence demonstrating that self-correction arises from in-context alignment.
A Theoretical Perspective for Speculative Decoding Algorithm
·1873 words·9 mins
Natural Language Processing Large Language Models 🏢 Princeton University
This paper theoretically analyzes speculative decoding, revealing its optimality and providing formulas for expected rejections, paving the way for more efficient large language model inference.
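For orientation, these are the standard speculative-sampling quantities such an analysis works with (the classical results from the speculative decoding literature, not necessarily this paper's exact formulas), where p is the target model's next-token distribution, q the draft model's, and γ the draft length:

```latex
% Acceptance rule for a drafted token x, and the resulting acceptance rate:
\Pr[\text{accept } x] = \min\!\Bigl(1, \tfrac{p(x)}{q(x)}\Bigr), \qquad
\alpha = \sum_x \min\bigl(p(x), q(x)\bigr) = 1 - \mathrm{TV}(p, q)
% Expected tokens produced per round with draft length \gamma:
\mathbb{E}[\text{tokens per round}] = \frac{1 - \alpha^{\gamma + 1}}{1 - \alpha}
```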