DFlash Speculative Decoding Boosts LLM Inference on NVIDIA Blackwell by up to 15x

NVIDIA has released DFlash, an open-source block diffusion model for speculative decoding that accelerates LLM inference on NVIDIA Blackwell GPUs. Instead of generating tokens one by one, DFlash drafts an entire block of tokens in parallel and then has the target model verify that block efficiently. It can achieve up to a 15x throughput increase for gpt-oss-120b and nearly doubles interactivity for Llama 3.1 8B compared to prior methods.
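The draft-then-verify loop described above can be illustrated with a small toy sketch. This is not DFlash's implementation or API: `target_next`, `toy_drafter`, `verify_block`, and `speculative_decode` are hypothetical names, the "models" are simple deterministic functions, and verification uses plain greedy matching. The toy drafter also builds its block sequentially, whereas a block diffusion drafter would produce the whole block in parallel.

```python
from typing import Callable, List, Tuple

Token = int
VOCAB = 50  # toy vocabulary size (assumption for illustration)


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the target model's greedy next token."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB


def toy_drafter(prefix: List[Token], block_size: int) -> List[Token]:
    """Toy stand-in for a drafter: proposes a block of tokens.

    It agrees with the target except at every 4th position, so some
    drafted tokens get rejected during verification.
    """
    ctx, block = list(prefix), []
    for _ in range(block_size):
        tok = target_next(ctx)
        if len(ctx) % 4 == 0:
            tok = (tok + 1) % VOCAB  # deliberate drafting error
        block.append(tok)
        ctx.append(tok)
    return block


def verify_block(prefix: List[Token], block: List[Token],
                 target: Callable[[List[Token]], Token]) -> List[Token]:
    """Accept the longest drafted prefix the target agrees with.

    A real system scores all block positions in one target forward pass;
    this loop only mimics the acceptance rule. The target's own token is
    appended at the first mismatch, or after a fully accepted block.
    """
    ctx, accepted = list(prefix), []
    for tok in block:
        expected = target(ctx)
        if tok != expected:
            accepted.append(expected)  # correction replaces the bad draft
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # bonus token after a clean block
    return accepted


def speculative_decode(prompt: List[Token], max_new_tokens: int,
                       block_size: int) -> Tuple[List[Token], int]:
    """Return the generated sequence and the number of verify rounds."""
    tokens, rounds = list(prompt), 0
    while len(tokens) - len(prompt) < max_new_tokens:
        block = toy_drafter(tokens, block_size)
        tokens.extend(verify_block(tokens, block, target_next))
        rounds += 1
    return tokens[:len(prompt) + max_new_tokens], rounds


def greedy_decode(prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


if __name__ == "__main__":
    out, rounds = speculative_decode([1, 2, 3], max_new_tokens=20, block_size=4)
    print("matches greedy:", out == greedy_decode([1, 2, 3], 20))
    print("verify rounds:", rounds, "vs 20 sequential target steps")
```

In this sketch the output always matches plain greedy decoding with the target, because every kept token is one the target itself would have produced. The saving comes from needing fewer verification rounds than tokens generated, and it grows as the drafter agrees with the target more often.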

Why it matters

Faster inference makes large language models more responsive for each user and lets the same hardware handle more concurrent operations.