The Magic of Prediction

Generative AI is essentially a powerful prediction machine. It learns the patterns in data to predict the next element in a sequence—be it a word in a sentence, a pixel in an image, or a note in a song. This guide, inspired by Andrej Karpathy's first-principles approach, demystifies AI by splitting it into two parts: how a model is trained, and how it's used.

PART 1

How a Generative AI Model Learns (Training)

This is the "schooling" phase for the AI—a massive, complex, and computationally expensive process of creating and training a model from the ground up.

Neural Networks & Backpropagation

A neural network is the AI's "brain," made of interconnected "neurons" (simple mathematical functions). It learns through backpropagation, a process of trial and error: the network makes a prediction, measures how wrong it was, and nudges its parameters to reduce that error on the next attempt.
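The predict-measure-adjust loop above can be sketched as gradient descent on a single "neuron." This is a deliberately tiny, hypothetical example (one parameter, squared error), not a real training run:

```python
# Minimal sketch: one "neuron" y = w * x learns by trial and error
# (gradient descent on squared error).
def train(x, target, w=0.0, lr=0.1, steps=50):
    for _ in range(steps):
        prediction = w * x            # forward pass: make a prediction
        error = prediction - target   # measure the mistake
        gradient = 2 * error * x      # how the error changes as w changes
        w -= lr * gradient            # adjust the parameter to improve
    return w

# The neuron discovers that w = 3 turns x = 2 into target = 6.
learned_w = train(x=2.0, target=6.0)
```

Real networks repeat exactly this adjustment across billions of parameters at once, with backpropagation supplying each parameter's gradient.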

The Transformer & Self-Attention

The revolutionary Transformer architecture allows a model to understand context. Using self-attention, each word weighs the importance of every other word in the sentence when computing its own meaning.

The robot picked up the ball because it was heavy.
In this sentence, "it" pays high attention to "robot" and "ball" in order to resolve which one it refers to.
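Numerically, those attention strengths come from a scaled dot-product between word vectors, followed by a softmax. A minimal sketch with made-up two-dimensional vectors (real models learn query/key/value projections with hundreds of dimensions):

```python
import math

def softmax(xs):
    # Turn raw scores into weights that are positive and sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product attention: score each key against the query,
    # scale by sqrt(dimension), then softmax into attention weights.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Toy vectors chosen so "it" scores higher against "robot" and "ball"
# than against "the" (the numbers are illustrative only).
query_it = [1.0, 0.0]
keys = [[0.0, 1.0],   # "the"
        [2.0, 0.0],   # "robot"
        [1.5, 0.5]]   # "ball"
weights = attention_weights(query_it, keys)
```

The output is one weight per word; "robot" and "ball" receive most of the attention mass, mirroring the hover example above.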

The Curriculum: Pre-training & Fine-tuning

1. Pre-training

The model is trained on a massive dataset from the internet with a simple goal: predict the next word. This is where it learns grammar, facts, and reasoning.

  • Data Size: Massive (Internet-scale)
  • Cost: Very High
  • Goal: General Knowledge
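The "predict the next word" objective can be illustrated with simple word counts. This is a crude n-gram stand-in for the statistics a neural network absorbs from internet-scale text, not how a Transformer is actually implemented:

```python
from collections import Counter

def next_word_counts(corpus, previous):
    # Count which word follows `previous` across the corpus --
    # a toy stand-in for the statistics a network learns.
    words = corpus.split()
    return Counter(nxt for prev, nxt in zip(words, words[1:])
                   if prev == previous)

corpus = "the cat sat on the mat and the cat slept"
counts = next_word_counts(corpus, "the")
# "cat" follows "the" twice and "mat" once, so "cat" is the
# likelier prediction after "the".
```

A real model generalizes far beyond raw counts, but the training signal is the same: which token tends to come next.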

2. Fine-tuning

The pre-trained model is then trained on a smaller, curated dataset to align with human preferences, making it a helpful and harmless assistant.

  • Data Size: Specific & Curated
  • Cost: Lower
  • Goal: Alignment & Safety

PART 2

How We Use a Trained Model (Inference)

Once trained, the model is ready to generate content. This is the inference process: given a prompt, the model produces a response one token at a time, through the six steps below.

1️⃣

Tokenization

Input text is broken into pieces (tokens).
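A toy whitespace tokenizer shows the idea. Note this is a simplification: production models use subword schemes such as byte-pair encoding, so a rare word may be split into several tokens:

```python
def tokenize(text):
    # Toy tokenizer: split on whitespace and lowercase.
    # Real models use subword tokenization (e.g. byte-pair encoding),
    # so "unhappiness" might become two tokens rather than one.
    return text.lower().split()

tokens = tokenize("The robot picked up the ball")
# -> ['the', 'robot', 'picked', 'up', 'the', 'ball']
```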

2️⃣

Embedding

Tokens are converted into numerical vectors (embeddings) the model can compute with.
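An embedding is just a lookup table from token to vector. In a trained model these vectors are learned so that related tokens end up near each other; in this sketch they are random, purely for illustration:

```python
import random

def build_embeddings(vocab, dim=4, seed=0):
    # Assign each token a vector of `dim` numbers. Here the values are
    # random; a trained model learns them so similar tokens get
    # similar vectors.
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)]
            for tok in vocab}

embeddings = build_embeddings(["the", "robot", "ball"])
vector_for_robot = embeddings["robot"]  # a list of 4 floats
```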

3️⃣

Transformer

The Transformer layers process these vectors, using self-attention to build up context.

4️⃣

Prediction

It outputs a probability for every possible next token.
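Concretely, the model emits one raw score (a "logit") per vocabulary token, and a softmax turns those scores into probabilities. A sketch with a three-word vocabulary and hypothetical scores:

```python
import math

def softmax(logits):
    # Convert raw scores (logits) into probabilities that sum to 1.
    # Subtracting the max keeps exp() numerically stable.
    exps = [math.exp(x - max(logits)) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the token after "The robot picked up the":
vocab = ["ball", "sky", "quickly"]
logits = [3.0, 0.5, -1.0]
probs = softmax(logits)  # "ball" gets most of the probability mass
```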

5️⃣

Sampling

A token is selected based on the probabilities.
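Sampling is a weighted random draw from those probabilities. A common control knob is "temperature": low values sharpen the distribution (more deterministic output), high values flatten it (more varied output). A sketch, where raising each probability to the power 1/temperature before renormalizing is equivalent to dividing the logits by the temperature:

```python
import random

def sample(tokens, probs, temperature=1.0, seed=None):
    # Reshape the distribution: temperature < 1 sharpens it,
    # temperature > 1 flattens it, then draw one token at random.
    weights = [p ** (1.0 / temperature) for p in probs]
    total = sum(weights)
    weights = [w / total for w in weights]
    rng = random.Random(seed)
    return rng.choices(tokens, weights=weights, k=1)[0]

tokens = ["ball", "sky", "quickly"]
probs = [0.8, 0.15, 0.05]
chosen = sample(tokens, probs, temperature=0.7, seed=42)
```

At a very low temperature the highest-probability token wins essentially every time; at a high temperature the choice spreads across the vocabulary.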

6️⃣

Repetition

The chosen token is appended to the sequence, which is fed back into the model, and the process repeats until the response is complete.
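Steps 1 through 6 can be stitched into a single loop. In this sketch a hand-written probability table stands in for the trained model (real inference runs the full Transformer at every step):

```python
import random

# Hypothetical next-token table standing in for a trained model.
# Each entry maps a token to (next_token, probability) options.
NEXT = {
    "the":    [("robot", 0.6), ("ball", 0.4)],
    "robot":  [("picked", 1.0)],
    "picked": [("up", 1.0)],
    "up":     [("the", 1.0)],
    "ball":   [("<end>", 1.0)],
}

def generate(prompt, max_tokens=10, seed=1):
    rng = random.Random(seed)
    tokens = prompt.split()               # 1. tokenize
    for _ in range(max_tokens):           # 6. repeat
        options = NEXT.get(tokens[-1])    # 2-4. "model" yields probabilities
        if not options:
            break
        words, probs = zip(*options)
        nxt = rng.choices(words, weights=probs, k=1)[0]  # 5. sample
        if nxt == "<end>":
            break
        tokens.append(nxt)                # append and feed back in
    return " ".join(tokens)

text = generate("the")
```

Every chatbot reply you have ever read was produced by exactly this kind of loop, one token at a time.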