🏢 Department of Electrical and Computer Engineering, Seoul National University

FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
FastKV is a KV cache compression method that speeds up long-context LLM processing by roughly 2x. It selectively propagates only important tokens to later layers and applies GQA-aware compression, while maintaining accuracy.
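The summary mentions token-selective propagation: scoring tokens by importance and forwarding only the top-scoring ones to later layers. The sketch below illustrates the general idea under assumed details — the attention-based scoring rule, function names, and shapes here are hypothetical, not the paper's exact method.

```python
import numpy as np

def select_tokens(attn_scores: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Pick indices of the most important tokens.

    Hypothetical scoring: average the attention each token receives
    over heads and query positions. attn_scores: (heads, queries, seq_len).
    """
    importance = attn_scores.mean(axis=(0, 1))          # (seq_len,)
    k = max(1, int(len(importance) * keep_ratio))
    keep = np.argsort(importance)[-k:]                  # top-k token indices
    return np.sort(keep)                                # preserve token order

def propagate_kv(kv: np.ndarray, keep_idx: np.ndarray) -> np.ndarray:
    """Forward only the selected tokens' KV entries to subsequent layers.

    kv: (2, seq_len, head_dim) stacking keys and values.
    """
    return kv[:, keep_idx, :]

# Toy example: 4 heads, 8 queries, 16 tokens, head_dim 8.
rng = np.random.default_rng(0)
attn = rng.random((4, 8, 16))
kv = rng.random((2, 16, 8))

idx = select_tokens(attn, keep_ratio=0.5)
compressed = propagate_kv(kv, idx)
print(compressed.shape)  # (2, 8, 8) — half the tokens propagated
```

With a keep ratio of 0.5, the KV cache carried into later layers shrinks to half its sequence length, which is the source of the prefill speedup the summary describes.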