
Quantization

Quantization reduces the numerical precision of the values a machine learning model stores and computes with, typically its weights and activations, often from 32-bit floating-point numbers to 8-bit integers. This makes models smaller and faster, so inference needs less memory and compute: going from 32 bits to 8 bits cuts the storage for those values to roughly a quarter. It can cost a small amount of accuracy, but the efficiency gains are often significant.
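
To make the idea concrete, here is a minimal sketch of one common scheme, symmetric 8-bit quantization, in plain Python. It is an illustration under stated assumptions, not how any particular framework does it: the function names `quantize_int8` and `dequantize` and the sample weights are made up for this example, and real toolkits add calibration, per-channel scales, and other refinements.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integer codes in [-127, 127] plus one shared scale."""
    max_abs = max(abs(v) for v in values)
    # The largest magnitude maps to 127; an all-zero input falls back to a scale of 1.0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the int8 range used here.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.82, -1.27, 0.003, 0.5]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Each restored value differs from the original by at most about half a scale step.
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(codes, scale, max(errors))
```

Each value is stored as a small integer, and a single floating-point scale is kept for the whole list. The rounding step is where the "small drop in accuracy" comes from: very small values such as 0.003 can collapse to zero.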

In plain terms

It's like converting a high-resolution photograph to a lower-resolution one; you lose some detail but gain a much smaller file size and faster loading time.

Why it matters

Quantization enables AI models to run efficiently on resource-constrained devices like mobile phones or embedded systems, significantly broadening their practical applications.
