
Natural Language Processing

A Theoretical Understanding of Self-Correction through In-context Alignment
·1997 words·10 mins
Natural Language Processing Large Language Models 🏢 MIT CSAIL
LLMs improve through self-correction, but the mechanisms are unclear. This paper provides a theoretical framework and empirical evidence demonstrating that self-correction arises from in-context alignment.
A Theoretical Perspective for Speculative Decoding Algorithm
·1873 words·9 mins
Natural Language Processing Large Language Models 🏢 Princeton University
This paper theoretically analyzes speculative decoding, revealing its optimality and providing formulas for expected rejections, paving the way for more efficient large language model inference.
A teacher-teacher framework for clinical language representation learning
·1643 words·8 mins
Natural Language Processing Large Language Models 🏢 Harvard University
A lightweight knowledge alignment module enables two pre-trained LLMs to mutually learn and improve clinical language representation, exceeding individual model performance on various downstream tasks.
A Prompt-Based Knowledge Graph Foundation Model for Universal In-Context Reasoning
·2451 words·12 mins
Natural Language Processing Question Answering 🏢 State Key Laboratory for Novel Software Technology, Nanjing University
KG-ICL, a novel prompt-based knowledge graph foundation model, achieves universal in-context reasoning by leveraging in-context learning and a unified tokenizer, outperforming various baselines on 43 …
A Polar coordinate system represents syntax in large language models
·1633 words·8 mins
Natural Language Processing Large Language Models 🏢 Meta AI
LLMs spontaneously encode syntax using a polar coordinate system, representing syntactic relations via relative direction and distance of word embeddings.
A Gradient Accumulation Method for Dense Retriever under Memory Constraint
·1813 words·9 mins
Natural Language Processing Question Answering 🏢 Seoul National University
CONTACCUM: Stable, efficient memory reduction for dense retrievers using dual memory banks, surpassing high-resource baselines.
A Full-duplex Speech Dialogue Scheme Based On Large Language Model
·2100 words·10 mins
Natural Language Processing Dialogue Systems 🏢 MThreads AI
This paper introduces a novel full-duplex speech dialogue system based on LLMs, achieving significantly reduced response latency and higher interruption precision compared to half-duplex systems.
A distributional simplicity bias in the learning dynamics of transformers
·2474 words·12 mins
AI Generated Natural Language Processing Large Language Models 🏢 International School for Advanced Studies
Transformers learn increasingly complex language patterns sequentially, starting with simpler interactions before mastering higher-order ones.
A Critical Evaluation of AI Feedback for Aligning Large Language Models
·2724 words·13 mins
AI Generated Natural Language Processing Large Language Models 🏢 Stanford University
Contrary to popular belief, simple supervised fine-tuning with strong language models outperforms complex reinforcement learning in aligning large language models, significantly improving efficiency.
3-in-1: 2D Rotary Adaptation for Efficient Finetuning, Efficient Batching and Composability
·2315 words·11 mins
Natural Language Processing Large Language Models 🏢 Language Technology Lab, University of Amsterdam
RoAd, a novel parameter-efficient finetuning method, uses 2D rotation to adapt LLMs, enabling efficient batching, composability, and improved interpretability.
$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
·2106 words·10 mins
Natural Language Processing Large Language Models 🏢 Alibaba Group
β-DPO dynamically adjusts a key parameter in Direct Preference Optimization, significantly improving LLM alignment with human preferences.
$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning
·3529 words·17 mins
AI Generated Natural Language Processing Large Language Models 🏢 MIT-IBM Watson AI Lab
Trans-LoRA enables near data-free transfer of fine-tuned LLMs across models!
$\textit{Read-ME}$: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
·2049 words·10 mins
Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Read-ME refactors pre-trained dense LLMs into efficient, router-decoupled Mixture-of-Experts (MoEs) via activation sparsity, achieving up to 10.1% improvement on MMLU and 6.1% reduction in latency.