
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
· 3037 words · 15 mins
Natural Language Processing · Large Language Models · 🏢 Dept. of Computer Science, Rice University
Boost LLM inference speed by 1.4-3.5x with Coupled Quantization (CQ), which compresses the KV cache down to 1 bit per channel while preserving model accuracy.
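
To make the headline figure concrete, here is a minimal, hedged sketch of the coupled-quantization idea: rather than quantizing each KV-cache channel independently, contiguous channels are grouped and encoded jointly against a shared codebook, so a 16-entry codebook over 4-channel groups costs log2(16) = 4 bits per group, i.e. 1 bit per channel on average. The grouping, the plain k-means codebook learning, and all function names below are illustrative assumptions, not the paper's actual CQ implementation.

```python
import numpy as np

def learn_codebook(vectors, num_centroids=16, iters=25, seed=0):
    """Plain k-means over channel-group vectors (illustrative, not the paper's method)."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), num_centroids, replace=False)]
    for _ in range(iters):
        # Assign each group vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # Update each centroid as the mean of its assigned vectors.
        for k in range(num_centroids):
            members = vectors[assign == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return centroids

def quantize_kv(kv, channels_per_group=4, num_centroids=16):
    """Encode a (tokens, channels) KV tensor as one codebook index per channel group."""
    tokens, channels = kv.shape
    groups = kv.reshape(tokens * channels // channels_per_group, channels_per_group)
    codebook = learn_codebook(groups, num_centroids)
    dists = np.linalg.norm(groups[:, None, :] - codebook[None, :, :], axis=-1)
    codes = dists.argmin(axis=1).astype(np.uint8)  # 4 bits of information per 4-channel group
    return codes, codebook

def dequantize_kv(codes, codebook, tokens, channels):
    """Reconstruct an approximate KV tensor from group codes and the codebook."""
    return codebook[codes].reshape(tokens, channels)

# Toy key cache: 128 tokens x 64 channels.
kv = np.random.randn(128, 64).astype(np.float32)
codes, book = quantize_kv(kv)
approx = dequantize_kv(codes, book, *kv.shape)
bits_per_channel = np.log2(len(book)) / 4  # 4 bits per 4-channel group -> 1 bit/channel
print(bits_per_channel, np.mean((kv - approx) ** 2))
```

Coupling channels this way exploits their joint statistics: a single codebook index describes a whole group, so the average bit budget per channel can fall below what independent per-channel quantization allows at the same reconstruction error.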