
🏢 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration

26 September 2024·1644 words·8 mins

AI Generated · Natural Language Processing · Large Language Models

A CTC-based draft model raises the acceptance rate of speculative decoding, which speeds up LLM inference.