FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
Natural Language Processing
Large Language Models
🏢 Department of Electrical and Computer Engineering, Seoul National University
FastKV is a KV cache compression method that speeds up long-context LLM processing by about 2x through token-selective propagation and GQA-aware compression, while maintaining accuracy.
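To make the token-selective idea concrete, here is a minimal sketch (not the authors' implementation): it assumes we have the attention weights at the layer where propagation is decided, scores each KV position by the attention it receives, pools scores over the query heads that share a KV head (the GQA-aware step), and keeps only a fraction of tokens for deeper layers. The function name `select_propagated_tokens`, its arguments, and the 25% keep ratio are illustrative assumptions.

```python
import torch

def select_propagated_tokens(attn_weights, keep_ratio=0.25, num_kv_heads=8):
    """Hypothetical token selection for propagation to deeper layers.

    attn_weights: (num_q_heads, q_len, kv_len) attention probabilities from
                  the layer where propagation is decided (assumed input).
    Returns indices of KV positions to keep, sorted in original order.
    """
    num_q_heads, _, kv_len = attn_weights.shape
    group_size = num_q_heads // num_kv_heads

    # Total attention mass each KV position receives across query positions.
    scores = attn_weights.sum(dim=1)                                    # (num_q_heads, kv_len)

    # GQA-aware aggregation: pool scores over query heads sharing one KV head.
    scores = scores.view(num_kv_heads, group_size, kv_len).sum(dim=1)   # (num_kv_heads, kv_len)

    # Keep a token if any KV-head group finds it important.
    token_scores = scores.max(dim=0).values                             # (kv_len,)

    k = max(1, int(kv_len * keep_ratio))
    keep = torch.topk(token_scores, k).indices
    return torch.sort(keep).values

# Toy usage: 32 query heads, 8 KV heads, 16 query positions, 128 KV positions.
attn = torch.softmax(torch.randn(32, 16, 128), dim=-1)
kept = select_propagated_tokens(attn, keep_ratio=0.25)
print(kept.shape)  # torch.Size([32])
```

The indices returned by such a selection would determine which KV entries are carried forward, so later layers attend over a much shorter sequence; the exact scoring, layer choice, and keep ratio in FastKV follow the paper rather than this sketch.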