← Library · Definition

Tokenization

Tokenization is the process of breaking down a piece of text into smaller units called tokens. These tokens can be words, subwords, or even characters, and they are the basic building blocks that machine learning models, especially large language models, use to understand and process human language.

Learn one new AI thing every day.

Daily Deck sends you seven plain-English cards like this every morning. Free.

Start free