🏢 Yandex HSE University
SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
2263 words · 11 mins
Natural Language Processing
Large Language Models
SpecExec is a massively parallel speculative decoding method that enables interactive inference of 50B+ parameter LLMs on consumer devices at 4-6 tokens per second.