🏢 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences

Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
·1644 words·8 mins
AI Generated · Natural Language Processing · Large Language Models
A CTC-based draft model raises the acceptance rate of speculative decoding, which in turn speeds up LLM inference.