Large Language Models

Large Language Model Unlearning
·6002 words·29 mins
AI Generated Natural Language Processing Large Language Models 🏢 Meta GenAI
This paper presents a novel method for large language model (LLM) unlearning, enabling LLMs to ‘forget’ undesirable behaviors by using only negative examples. This computationally efficient approach o…
Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models
·2064 words·10 mins
AI Generated Natural Language Processing Large Language Models 🏢 CISPA Helmholtz Center for Information Security
Large language models (LLMs) achieve lossless gradient compression, surpassing existing methods by up to 17.2%, thereby advancing distributed learning efficiency.
Language Models as Hierarchy Encoders
·2232 words·11 mins
AI Generated Natural Language Processing Large Language Models 🏢 University of Oxford
Language models struggle with hierarchical information. This work introduces Hierarchy Transformer Encoders (HITs), a novel method to retrain transformer encoders using hyperbolic geometry and special…
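The blurb above credits hyperbolic geometry for encoding hierarchies; as a rough illustration of why that space suits tree-like data, here is a minimal sketch of the Poincaré-ball distance that such training objectives commonly build on. The function name and example points are illustrative, not taken from the paper.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = np.sum(u * u)
    sq_v = np.sum(v * v)
    sq_diff = np.sum((u - v) ** 2)
    # Closed form: arcosh(1 + 2*|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    x = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v) + eps)
    return float(np.arccosh(x))

# Points near the boundary are exponentially "far" from the origin,
# which gives trees room to branch without crowding.
parent = np.array([0.1, 0.0])
child = np.array([0.8, 0.1])
print(poincare_distance(parent, child))
```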
LACIE: Listener-Aware Finetuning for Calibration in Large Language Models
·2396 words·12 mins
Natural Language Processing Large Language Models 🏢 UNC Chapel Hill
LACIE: Listener-aware finetuning improves LLM confidence calibration, reducing incorrect answers accepted by human listeners by 47% while maintaining correct answer acceptance.
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
·5270 words·25 mins
AI Generated Natural Language Processing Large Language Models 🏢 UC Berkeley
KVQuant achieves <0.1 perplexity degradation with 3-bit quantization in LLMs by using per-channel key quantization, pre-RoPE quantization, and non-uniform quantization, enabling 10M context length inference.
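As a rough illustration of the per-channel key quantization idea, here is a minimal numpy sketch that gives each key channel its own scale and zero point. It omits the paper's pre-RoPE and non-uniform components, and all names are illustrative.

```python
import numpy as np

def quantize_per_channel(keys: np.ndarray, n_bits: int = 3):
    """Uniformly quantize a key cache along the channel axis.

    keys: (seq_len, head_dim); each channel (column) gets its own
    scale/offset, since key outliers tend to concentrate in a few channels.
    """
    qmax = 2 ** n_bits - 1
    lo = keys.min(axis=0, keepdims=True)        # per-channel minimum
    hi = keys.max(axis=0, keepdims=True)        # per-channel maximum
    scale = (hi - lo) / qmax
    scale = np.where(scale == 0, 1.0, scale)    # guard constant channels
    q = np.clip(np.round((keys - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

keys = np.random.randn(128, 64).astype(np.float32)
q, scale, lo = quantize_per_channel(keys, n_bits=3)
print(np.abs(dequantize(q, scale, lo) - keys).max())
```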
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
·3037 words·15 mins
Natural Language Processing Large Language Models 🏢 Dept. of Computer Science, Rice University
Boost LLM inference speed 1.4-3.5x by using Coupled Quantization (CQ) to compress KV cache down to 1 bit per channel, while preserving model accuracy.
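A toy sketch of the coupling idea: channels are quantized jointly in small groups against a shared codebook, so the average budget can fall to roughly 1 bit per channel. The tiny k-means calibration and all names here are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def coupled_quantize(cache: np.ndarray, group: int = 4, bits_per_channel: int = 1,
                     iters: int = 10, seed: int = 0):
    """Jointly quantize groups of `group` channels with one shared codebook.

    At 1 bit/channel and group=4, each 4-dim slice maps to one of
    2**4 = 16 centroids, learned here with a plain k-means loop.
    """
    seq, dim = cache.shape
    assert dim % group == 0
    k = 2 ** (group * bits_per_channel)
    vecs = cache.reshape(-1, group)
    rng = np.random.default_rng(seed)
    centroids = vecs[rng.choice(len(vecs), k, replace=False)]
    for _ in range(iters):
        dists = ((vecs[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(k):
            members = vecs[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    codes = ((vecs[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return codes.reshape(seq, dim // group), centroids

cache = np.random.randn(64, 32).astype(np.float32)
codes, centroids = coupled_quantize(cache)
recon = centroids[codes].reshape(cache.shape)
print(np.abs(recon - cache).mean())
```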
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
·2061 words·10 mins
Natural Language Processing Large Language Models 🏢 Princeton University
Kraken: A new Transformer architecture boosts multi-device inference speed by 35.6% by cleverly overlapping communication with computation.
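To illustrate overlapping communication with computation, the mechanism the blurb credits for the speedup, here is a toy Python sketch that launches a stand-in collective on a worker thread while the device-local branch computes. It is a schematic of the idea only, not Kraken's architecture or a real collective call.

```python
import concurrent.futures as cf
import time
import numpy as np

def all_reduce_stub(x: np.ndarray) -> np.ndarray:
    """Stand-in for a cross-device collective (pretend network latency)."""
    time.sleep(0.05)
    return x * 2  # e.g. sum over two "devices"

def layer_with_overlap(hidden, w_local, w_post):
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the collective for the previous sub-layer's output...
        fut = pool.submit(all_reduce_stub, hidden)
        # ...and run the device-local branch while it is in flight.
        local = np.tanh(hidden @ w_local)
        reduced = fut.result()  # join once both are ready
    return local + reduced @ w_post

h = np.random.randn(8, 16)
out = layer_with_overlap(h, np.random.randn(16, 16), np.random.randn(16, 16))
print(out.shape)
```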
KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension
·1673 words·8 mins
Natural Language Processing Large Language Models 🏢 University of Hong Kong
KptLLM: A novel multimodal model leverages LLMs for superior keypoint comprehension, outperforming existing methods in various benchmarks.
Knowledge Circuits in Pretrained Transformers
·3083 words·15 mins
Natural Language Processing Large Language Models 🏢 Zhejiang University
Researchers unveil ‘knowledge circuits’ within LLMs, revealing how knowledge is collaboratively encoded and utilized, leading to improved LLM design and interpretations of model behavior.
KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge
·3104 words·15 mins
Natural Language Processing Large Language Models 🏢 University of Illinois at Urbana-Champaign
KG-FIT boosts knowledge graph embedding by smartly integrating open-world knowledge from LLMs, achieving significant performance gains.
Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting
·2148 words·11 mins
Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
Kangaroo accelerates LLM inference with lossless self-speculative decoding via double early exiting.
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
·1722 words·9 mins
Natural Language Processing Large Language Models 🏢 School of Information, Renmin University of China
JiuZhang3.0 efficiently enhances LLMs’ mathematical reasoning by training a small model to synthesize high-quality training data, drastically reducing costs.
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
·2559 words·13 mins
Natural Language Processing Large Language Models 🏢 School of Information Sciences, University of Illinois at Urbana-Champaign
A new benchmark and jailbreak method expose vulnerabilities in LLM moderation guardrails, achieving significantly higher success rates than existing methods.
Iterative Reasoning Preference Optimization
·1561 words·8 mins
Natural Language Processing Large Language Models 🏢 Meta FAIR
Iterative Reasoning Preference Optimization boosts large language model reasoning by iteratively refining preferences between generated reasoning steps, achieving significant accuracy gains on benchmarks.
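A minimal sketch of the kind of pairwise objective such iterative schemes optimize between generated reasoning chains: a DPO-style term on winner/loser log-probabilities, plus an optional NLL term on the winner. The coefficients and the exact combination are assumptions for illustration, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def dpo_with_nll(policy_lp_w, policy_lp_l, ref_lp_w, ref_lp_l,
                 beta: float = 0.1, nll_coef: float = 1.0):
    """DPO-style pairwise loss plus an NLL term on the preferred response.

    Inputs are summed log-probabilities of whole responses under the policy
    and a frozen reference model; `nll_coef` is an illustrative setting.
    """
    margin = beta * ((policy_lp_w - ref_lp_w) - (policy_lp_l - ref_lp_l))
    dpo = -F.logsigmoid(margin)
    nll = -policy_lp_w  # keep the winning reasoning chain likely
    return (dpo + nll_coef * nll).mean()

# Toy batch of 4 preference pairs (log-probs of full responses).
lp_w = torch.tensor([-12.0, -9.5, -20.1, -7.3], requires_grad=True)
lp_l = torch.tensor([-15.2, -14.0, -22.7, -11.0])
ref_w = torch.tensor([-13.0, -10.0, -21.0, -8.0])
ref_l = torch.tensor([-14.5, -13.0, -22.0, -10.5])
loss = dpo_with_nll(lp_w, lp_l, ref_w, ref_l)
loss.backward()
print(float(loss), lp_w.grad)
```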
Iteration Head: A Mechanistic Study of Chain-of-Thought
·2483 words·12 mins
Natural Language Processing Large Language Models 🏢 Meta AI
Researchers reveal how Chain-of-Thought reasoning emerges in transformers via specialized ‘iteration heads’, improving LLM performance and offering insights into mechanistic interpretability.
Is Programming by Example solved by LLMs?
·2523 words·12 mins
Natural Language Processing Large Language Models 🏢 Cornell University
Large language models (LLMs) perform surprisingly well on the challenging task of Programming by Example (PBE) when fine-tuned on problem-specific data, outperforming classic symbolic methods and even surpass…
IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons
·2251 words·11 mins
Natural Language Processing Large Language Models 🏢 College of Intelligence and Computing, Tianjin University
IRCAN tackles LLM knowledge conflicts by identifying and reweighting context-aware neurons, significantly improving context-sensitive outputs.
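A minimal PyTorch sketch of the reweighting step: once a set of context-aware neurons has been identified (the attribution pass is omitted here), their activations are scaled up with a forward hook. The neuron indices and scaling factor are illustrative only.

```python
import torch
import torch.nn as nn

def reweight_neurons(layer: nn.Linear, neuron_ids, factor: float = 2.0):
    """Amplify selected output neurons of an MLP layer via a forward hook.

    `neuron_ids` would come from an attribution pass scoring how strongly
    each neuron responds to in-context evidence; fixed by hand here.
    """
    ids = torch.tensor(neuron_ids)

    def hook(module, inputs, output):
        output = output.clone()
        output[..., ids] = output[..., ids] * factor
        return output  # returned value replaces the module's output

    return layer.register_forward_hook(hook)

mlp = nn.Linear(16, 32)
handle = reweight_neurons(mlp, neuron_ids=[3, 17, 21], factor=2.0)
x = torch.randn(2, 16)
print(mlp(x)[..., [3, 17, 21]])
handle.remove()
```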
InversionView: A General-Purpose Method for Reading Information from Neural Activations
·10684 words·51 mins
AI Generated Natural Language Processing Large Language Models 🏢 Saarland University
InversionView unveils neural network inner workings by decoding information from activations: it identifies the inputs that produce similar activations, revealing what information an activation encodes. Case studies on v…
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation
·2045 words·10 mins
Natural Language Processing Large Language Models 🏢 Texas A&M University
Mat2Seq revolutionizes crystal structure generation using language models by creating unique, invariant 1D sequences from 3D crystal structures, enabling accurate and efficient crystal discovery with …
Interpreting Learned Feedback Patterns in Large Language Models
·2900 words·14 mins
Natural Language Processing Large Language Models 🏢 University of Oxford
Researchers developed methods to measure and interpret the divergence between learned feedback patterns (LFPs) in LLMs and human preferences, helping minimize discrepancies between LLM behavior and tr…