
DeepSeek Releases DSpark, Accelerating LLM Inference with Speculative Decoding

DeepSeek has launched DSpark, a speculative decoding framework that speeds up per-user generation for its DeepSeek-V4 models. DSpark pairs a parallel draft backbone with a tiny sequential head. It also adds a confidence head and a load-aware scheduler, which together verify more drafted tokens when GPUs are idle and fewer when they are busy. Compared with the models' MTP-1 (multi-token prediction) baseline, this serving optimization makes per-user generation 60-85% faster for DeepSeek-V4-Flash and 57-78% faster for DeepSeek-V4-Pro without degrading output quality. The checkpoints and training code are open-source.
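The draft-then-verify loop behind speculative decoding can be sketched in a few lines. This is a generic greedy-acceptance sketch, not DSpark's released code. `verify_draft` and `choose_draft_len` are hypothetical names. The linear utilization-based budget and the 0.5 confidence threshold are illustrative assumptions standing in for DSpark's confidence head and load-aware scheduler, whose actual rules are not described here.

```python
from typing import List, Sequence


def verify_draft(draft: Sequence[int], target_argmax: Sequence[int]) -> List[int]:
    """Greedy verification of a drafted token sequence.

    `target_argmax[i]` is the target model's most likely token at draft position i,
    computed for all positions in one parallel forward pass. It holds
    len(draft) + 1 entries: one per draft token plus a "bonus" position after
    the last draft token.
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    accepted: List[int] = []
    for i, token in enumerate(draft):
        if token == target_argmax[i]:
            accepted.append(token)  # draft agrees with the target: keep it
        else:
            accepted.append(target_argmax[i])  # first mismatch: take the target's token and stop
            return accepted
    accepted.append(target_argmax[len(draft)])  # every draft accepted: one extra token for free
    return accepted


def choose_draft_len(
    gpu_utilization: float,
    confidences: Sequence[float],
    max_len: int = 8,
    min_len: int = 1,
    threshold: float = 0.5,
) -> int:
    """Hypothetical load- and confidence-aware draft-length policy (illustrative only).

    The budget shrinks linearly as GPU utilization (0.0 idle .. 1.0 saturated) rises,
    and drafting stops early at the first token whose confidence is below `threshold`.
    """
    util = min(max(gpu_utilization, 0.0), 1.0)  # clamp to [0, 1]
    budget = max(min_len, int(max_len * (1.0 - util)))
    n = 0
    for c in confidences[:budget]:
        if c < threshold:
            break  # low-confidence draft: not worth spending verification compute on
        n += 1
    return max(min_len, n)


if __name__ == "__main__":
    print(verify_draft([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]: third draft rejected
    print(verify_draft([5, 7, 9], [5, 7, 9, 4]))  # [5, 7, 9, 4]: all accepted plus bonus
    print(choose_draft_len(0.0, [0.9] * 8))       # 8: idle GPU, confident drafts
    print(choose_draft_len(0.75, [0.9] * 8))      # 2: busy GPU shrinks the budget
```

With greedy verification, every emitted token is the one the target model would have chosen on its own, which is why this style of speculative decoding can speed up generation without changing the output. Sampling-based variants instead use a rejection-sampling acceptance rule to preserve the target model's output distribution.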

Why it matters

DSpark makes per-user token generation substantially faster at unchanged output quality, which makes large language models more responsive and more cost-effective to serve in production.
