
Model Quantization

Model quantization is a technique for shrinking AI models and speeding up inference by storing their numerical values (e.g., weights and activations) at lower precision, typically by converting 32-bit floating-point numbers to 8-bit integers. This reduces memory footprint and compute cost, making models better suited to resource-constrained devices such as mobile phones or embedded systems, often with only a small loss in accuracy.
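
To make the float-to-integer conversion concrete, here is a minimal sketch in plain Python. It assumes one simple scheme (symmetric quantization with a single shared scale for the whole list); real toolkits offer several schemes and this is not any particular library's method. The function names and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric, per-list scaling, chosen here for simplicity.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)         # small integers that fit in 8 bits
print(restored)  # close to the original weights, but not identical
```

Each restored value differs from the original by at most half of one scale step, which is the "slight pixelation" that quantization trades for size. An 8-bit integer takes one byte where a 32-bit float takes four, so this scheme would cut weight storage to roughly a quarter, plus the small cost of storing the scale.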

In plain terms

Model quantization is like replacing a detailed, high-resolution painting with a slightly pixelated but still recognizable version, to make it easier to display on a smaller screen or quicker to send over a slow connection.

Why it matters

It makes powerful AI models practical for real-world edge devices and low-power environments, expanding their accessibility and utility beyond cloud servers.
