Large Language Models

Large Language Model Unlearning
·6002 words·29 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta GenAI
This paper presents a novel method for large language model (LLM) unlearning, enabling LLMs to ‘forget’ undesirable behaviors by using only negative examples. This computationally efficient approach o…
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models
·2064 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 CISPA Helmholtz Center for Information Security
Large language models (LLMs) achieve lossless gradient compression, surpassing existing methods by up to 17.2%, thereby advancing distributed learning efficiency.
Language Models as Hierarchy Encoders
·2232 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Language models struggle with hierarchical information. This work introduces Hierarchy Transformer Encoders (HITs), a novel method to retrain transformer encoders using hyperbolic geometry and special…
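The blurb above credits hyperbolic geometry for encoding hierarchies; as a rough illustration of why that space suits tree-like data, here is a minimal sketch of the Poincaré-ball distance that such training objectives commonly build on. The function name and example points are illustrative, not taken from the paper.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = np.sum(u * u)
    sq_v = np.sum(v * v)
    sq_diff = np.sum((u - v) ** 2)
    # Closed form: arcosh(1 + 2*|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    return float(np.arccosh(x))

# Points near the boundary are exponentially "far" from the origin,
# which gives trees room to branch without crowding.
parent = np.array([0.1, 0.0])
child = np.array([0.8, 0.1])
print(poincare_distance(parent, child))
```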
LACIE: Listener-Aware Finetuning for Calibration in Large Language Models
·2396 words·12 mins
Natural Language Processing Large Language Models 🏢 UNC Chapel Hill
LACIE: Listener-aware finetuning improves LLM confidence calibration, reducing incorrect answers accepted by human listeners by 47% while maintaining correct answer acceptance.
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
·5270 words·25 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC Berkeley
KVQuant achieves <0.1 perplexity degradation with 3-bit quantization in LLMs by using per-channel key quantization, pre-RoPE quantization, and non-uniform quantization, enabling 10M context length inference.
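As a rough illustration of the per-channel key quantization idea, here is a minimal numpy sketch that gives each key channel its own scale and zero point. It omits the paper's pre-RoPE and non-uniform components, and all names are illustrative.

```python
import numpy as np

def quantize_per_channel(keys: np.ndarray, n_bits: int = 3):
    """Uniformly quantize a key cache along the channel axis.

    keys: (seq_len, head_dim); each channel (column) gets its own
    scale/offset, since key outliers tend to concentrate in a few channels.
    """
    qmax = 2 ** n_bits - 1
    lo = keys.min(axis=0, keepdims=True)        # per-channel minimum
    hi = keys.max(axis=0, keepdims=True)        # per-channel maximum
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)    # guard constant channels
    q = np.clip(np.round((keys - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

keys = np.random.randn(128, 64).astype(np.float32)
q, scale, lo = quantize_per_channel(keys, n_bits=3)
print(np.abs(dequantize(q, scale, lo) - keys).max())
```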
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
·3037 words·15 mins
Natural Language Processing Large Language Models 🏢 Dept. of Computer Science, Rice University
Boost LLM inference speed 1.4-3.5x by using Coupled Quantization (CQ) to compress KV cache down to 1 bit per channel, while preserving model accuracy.
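A toy sketch of the coupling idea: channels are quantized jointly in small groups against a shared codebook, so the average budget can fall to roughly 1 bit per channel. The tiny k-means calibration and all names here are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def coupled_quantize(cache: np.ndarray, group: int = 4, bits_per_channel: int = 1,
                     iters: int = 10, seed: int = 0):
    """Jointly quantize groups of `group` channels with one shared codebook.

    At 1 bit/channel and group=4, each 4-dim slice maps to one of
    2**4 = 16 centroids, learned here with a plain k-means loop.
    """
    seq, dim = cache.shape
    assert dim % group == 0
    k = 2 ** (group * bits_per_channel)
    vecs = cache.reshape(-1, group)
    rng = np.random.default_rng(seed)
    centroids = vecs[rng.choice(len(vecs), k, replace=False)]
    for _ in range(iters):
        dists = ((vecs[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(k):
            members = vecs[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    codes = ((vecs[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return codes.reshape(seq, dim // group), centroids

cache = np.random.randn(64, 32).astype(np.float32)
codes, centroids = coupled_quantize(cache)
recon = centroids[codes].reshape(cache.shape)
print(np.abs(recon - cache).mean())
```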
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
·2061 words·10 mins
Natural Language Processing Large Language Models 🏢 Princeton University
Kraken: A new Transformer architecture boosts multi-device inference speed by 35.6% by cleverly overlapping communication with computation.
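To illustrate overlapping communication with computation, the mechanism the blurb credits for the speedup, here is a toy Python sketch that launches a stand-in collective on a worker thread while the device-local branch computes. It is a schematic of the idea only, not Kraken's architecture or a real collective call.

```python
import concurrent.futures as cf
import time
import numpy as np

def all_reduce_stub(x: np.ndarray) -> np.ndarray:
    """Stand-in for a cross-device collective (pretend network latency)."""
    time.sleep(0.05)
    return x * 2  # e.g. sum over two "devices"

def layer_with_overlap(hidden, w_local, w_post):
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the collective for the previous sub-layer's output...
        fut = pool.submit(all_reduce_stub, hidden)
        # ...and run the device-local branch while it is in flight.
        local = np.tanh(hidden @ w_local)
        reduced = fut.result()  # join once both are ready
    return local + reduced @ w_post

h = np.random.randn(8, 16)
out = layer_with_overlap(h, np.random.randn(16, 16), np.random.randn(16, 16))
print(out.shape)
```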
KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
·1673 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Hong Kong
KptLLM: A novel multimodal model leverages LLMs for superior keypoint comprehension, outperforming existing methods in various benchmarks.
Knowledge Circuits in Pretrained Transformers
·3083 words·15 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Researchers unveil ‘knowledge circuits’ within LLMs, revealing how knowledge is collaboratively encoded and utilized, leading to improved LLM design and interpretations of model behavior.
KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge
·3104 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Illinois at Urbana-Champaign
KG-FIT boosts knowledge graph embedding by smartly integrating open-world knowledge from LLMs, achieving significant performance gains.
Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
Kangaroo accelerates LLM inference with lossless self-speculative decoding via double early exiting.
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
·1722 words·9 mins
Natural Language Processing Large Language Models 🏢 School of Information, Renmin University of China
JiuZhang3.0 efficiently enhances LLMs’ mathematical reasoning by training a small model to synthesize high-quality training data, drastically reducing costs.
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
·2559 words·13 mins
Natural Language Processing Large Language Models 🏢 School of Information Sciences, University of Illinois at Urbana-Champaign
A new benchmark and jailbreak method expose vulnerabilities in LLM moderation guardrails, achieving significantly higher success rates than existing methods.
Iterative Reasoning Preference Optimization
·1561 words·8 mins
Natural Language Processing Large Language Models 🏢 Meta FAIR
Iterative Reasoning Preference Optimization boosts large language model reasoning by iteratively refining preferences between generated reasoning steps, achieving significant accuracy gains on benchmarks.
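A minimal sketch of the kind of pairwise objective such iterative schemes optimize between generated reasoning chains: a DPO-style term on winner/loser log-probabilities, plus an optional NLL term on the winner. The coefficients and the exact combination are assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def dpo_with_nll(policy_lp_w, policy_lp_l, ref_lp_w, ref_lp_l,
                 beta: float = 0.1, nll_coef: float = 1.0):
    """DPO-style pairwise loss plus an NLL term on the preferred response.

    Inputs are summed log-probabilities of whole responses under the policy
    and a frozen reference model; `nll_coef` is an illustrative setting.
    """
    margin = beta * ((policy_lp_w - ref_lp_w) - (policy_lp_l - ref_lp_l))
    dpo = -F.logsigmoid(margin)
    nll = -policy_lp_w  # keep the winning reasoning chain likely
    return (dpo + nll_coef * nll).mean()

# Toy batch of 4 preference pairs (log-probs of full responses).
lp_w = torch.tensor([-12.0, -9.5, -20.1, -7.3], requires_grad=True)
lp_l = torch.tensor([-15.2, -14.0, -22.7, -11.0])
ref_w = torch.tensor([-13.0, -10.0, -21.0, -8.0])
ref_l = torch.tensor([-14.5, -13.0, -22.0, -10.5])
loss = dpo_with_nll(lp_w, lp_l, ref_w, ref_l)
loss.backward()
print(float(loss), lp_w.grad)
```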
Iteration Head: A Mechanistic Study of Chain-of-Thought
·2483 words·12 mins
Natural Language Processing Large Language Models 🏢 Meta AI
Researchers reveal how Chain-of-Thought reasoning emerges in transformers via specialized ‘iteration heads’, improving LLM performance and offering insights into mechanistic interpretability.
Is Programming by Example solved by LLMs?
·2523 words·12 mins
Natural Language Processing Large Language Models 🏢 Cornell University
Large language models (LLMs) perform surprisingly well on the challenging task of Programming by Example (PBE) when fine-tuned on problem-specific data, outperforming classic symbolic methods and even surpass…
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
·2251 words·11 mins
Natural Language Processing Large Language Models 🏢 College of Intelligence and Computing, Tianjin University
IRCAN tackles LLM knowledge conflicts by identifying and reweighting context-aware neurons, significantly improving context-sensitive outputs.
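A minimal PyTorch sketch of the reweighting step: once a set of context-aware neurons has been identified (the attribution pass is omitted here), their activations are scaled up with a forward hook. The neuron indices and scaling factor are illustrative only.

```python
import torch
import torch.nn as nn

def reweight_neurons(layer: nn.Linear, neuron_ids, factor: float = 2.0):
    """Amplify selected output neurons of an MLP layer via a forward hook.

    `neuron_ids` would come from an attribution pass scoring how strongly
    each neuron responds to in-context evidence; fixed by hand here.
    """
    ids = torch.tensor(neuron_ids)

    def hook(module, inputs, output):
        output = output.clone()
        output[..., ids] = output[..., ids] * factor
        return output  # returned value replaces the module's output

    return layer.register_forward_hook(hook)

mlp = nn.Linear(16, 32)
handle = reweight_neurons(mlp, neuron_ids=[3, 17, 21], factor=2.0)
x = torch.randn(2, 16)
print(mlp(x)[..., [3, 17, 21]])
handle.remove()
```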
InversionView: A General-Purpose Method for Reading Information from Neural Activations
·10684 words·51 mins
AI Generated Natural Language Processing Large Language Models 🏢 Saarland University
InversionView unveils neural network inner workings by decoding information from activations: it identifies the inputs that produce similar activations, revealing what information an activation encodes. Case studies on v…
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 Texas A&M University
Mat2Seq revolutionizes crystal structure generation using language models by creating unique, invariant 1D sequences from 3D crystal structures, enabling accurate and efficient crystal discovery with …
Interpreting Learned Feedback Patterns in Large Language Models
·2900 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
Researchers developed methods to measure and interpret the divergence between learned feedback patterns (LFPs) in LLMs and human preferences, helping minimize discrepancies between LLM behavior and tr…