🏢 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
1644 words · 8 mins
AI Generated
Natural Language Processing
Large Language Models
A CTC-based draft model significantly improves the acceptance rate of speculative decoding, which in turn speeds up LLM inference.