SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices
2263 words · 11 mins
Natural Language Processing · Large Language Models · 🏢 Yandex HSE University
SpecExec is a massively parallel speculative decoding method that enables interactive inference with 50B+ parameter LLMs on consumer devices at 4-6 tokens per second.