A neural network architecture, or blueprint, that is especially good at understanding sequences of data such as text or speech. It processes all parts of an input sequence in parallel rather than one piece at a time.

What else is a transformer called?

The transformer is also written with a capital T ("the Transformer") or referred to in the plural as "Transformers."

What is a transformer?

The Transformer is a neural network architecture introduced in 2017 by Google researchers in the paper "Attention Is All You Need." Before the Transformer, the dominant sequence models, recurrent neural networks (RNNs), processed data such as the words in a sentence one after another. This made it difficult for them to connect parts of a sequence that were far apart. The Transformer changed this by building the model entirely around a mechanism called "self-attention" and dropping recurrence altogether.

Self-attention allows the model to weigh how relevant every other part of the input sequence is when processing each element. For example, in the sentence "The animal didn't cross the street because it was too tired," the model can pay more attention to "animal" when working out what "it" refers to. Because this comparison covers the entire sequence at once rather than step by step, Transformers can relate distant words directly, and their computation parallelizes well on modern hardware. Together, these properties made them highly effective for tasks like language translation and text generation.
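To make the mechanism concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention, the form described in "Attention Is All You Need": each position's output is softmax(QKᵀ / √d_k)·V, a weighted average of value vectors. The function names (`softmax`, `matmul`, `self_attention`), the toy token vectors, and the identity projection matrices are illustrative assumptions. A real Transformer learns its projection matrices, runs several attention heads in parallel, and adds positional information, none of which is shown here.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is a sequence of n token vectors (n x d_model); w_q, w_k, w_v are
    hypothetical projection matrices (d_model x d_k).
    """
    q = matmul(x, w_q)  # queries: what each position is looking for
    k = matmul(x, w_k)  # keys: what each position can be matched on
    v = matmul(x, w_v)  # values: the content that gets mixed together
    d_k = len(k[0])
    outputs, weights = [], []
    # Each query row is independent of the others, so real implementations
    # compute all rows at once; the loop here is only for readability.
    for qi in q:
        # Compare this position against every position in the sequence.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))])
    return outputs, weights


# Toy usage: three 2-dimensional "tokens" and identity projections (assumed).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, identity, identity, identity)
for row in attn:
    print([round(p, 3) for p in row])
```

In this toy run each row of `attn` sums to 1, and the third token, whose vector overlaps with both others, places the largest weight on itself.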

The Transformer laid the foundation for many of today's most advanced language models, including large language models (LLMs) such as OpenAI's GPT series, as well as Google's BERT and T5. Its design fundamentally changed how AI approaches sequence-to-sequence problems and led to significant breakthroughs in natural language processing and other fields.

Learn AI in 5 minutes a day.

Daily Deck explains terms like transformer as part of a free seven-card daily brief. No jargon. No fluff.
