🏢 Department of Electrical and Computer Engineering, Seoul National University

FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
FastKV is a KV cache compression method that speeds up long-context LLM processing by roughly 2x. It selectively propagates only important tokens to later layers and applies GQA-aware compression, while maintaining accuracy.
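The summary mentions token-selective propagation: scoring tokens by importance and forwarding only the top-scoring ones to later layers. The sketch below illustrates the general idea under assumed details — the attention-based scoring rule, function names, and shapes here are hypothetical, not the paper's exact method.

```python
import numpy as np

def select_tokens(attn_scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Pick indices of the most important tokens.

    Hypothetical scoring: average the attention each token receives
    over heads and query positions. attn_scores: (heads, queries, seq_len).
    """
    importance = attn_scores.mean(axis=(0, 1))          # (seq_len,)
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-k:]                  # top-k token indices
    return np.sort(keep)                                # preserve token order

def propagate_kv(kv: np.ndarray, keep_idx: np.ndarray) -> np.ndarray:
    """Forward only the selected tokens' KV entries to subsequent layers.

    kv: (2, seq_len, head_dim) stacking keys and values.
    """
    return kv[:, keep_idx, :]

# Toy example: 4 heads, 8 queries, 16 tokens, head_dim 8.
rng = np.random.default_rng(0)
attn = rng.random((4, 8, 16))
kv = rng.random((2, 16, 8))

idx = select_tokens(attn, keep_ratio=0.5)
compressed = propagate_kv(kv, idx)
print(compressed.shape)  # (2, 8, 8) — half the tokens propagated
```

With a keep ratio of 0.5, the KV cache carried into later layers shrinks to half its sequence length, which is the source of the prefill speedup the summary describes.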