The Magic of Prediction
Generative AI is essentially a powerful prediction machine. It learns the patterns in data to predict the next element in a sequence—be it a word in a sentence, a pixel in an image, or a note in a song. This guide, inspired by Andrej Karpathy's first-principles approach, demystifies AI by splitting it into two parts: how a model is trained, and how it's used.
How a Generative AI Model Learns (Training)
This is the "schooling" phase for the AI—a massive, complex, and computationally expensive process of creating and training a model from the ground up.
Neural Networks & Backpropagation
A neural network is the AI's brain, made of interconnected "neurons" (simple mathematical functions). It learns through backpropagation, a process of guided trial and error: the network makes a prediction, measures its error, and adjusts its parameters in the direction that reduces that error, over and over.
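That predict, measure, adjust loop can be sketched with a single "neuron" trained by gradient descent. The data (targets follow y = 2x), the learning rate, and the step count below are illustrative choices, not details from the text; real backpropagation applies the same idea across millions or billions of parameters at once.

```python
# A minimal sketch of the predict -> measure error -> adjust loop,
# using one neuron (prediction = w * x) and a hand-derived gradient.

def train(steps=100, lr=0.1):
    w = 0.0                                       # the single parameter, initial guess
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # inputs x with targets y = 2x
    for _ in range(steps):
        for x, y in data:
            pred = w * x            # 1. make a prediction
            error = pred - y        # 2. measure the mistake
            grad = 2 * error * x    # 3. gradient of squared error w.r.t. w
            w -= lr * grad          # 4. adjust the parameter to improve
    return w

print(round(train(), 3))  # converges toward 2.0
```

Each pass nudges w toward the value that makes the predictions match the targets, which is exactly the "checks its mistake and adjusts" behavior described above.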
The Transformer & Self-Attention
The revolutionary Transformer architecture allows a model to understand context. Using self-attention, it weighs the importance of every word relative to every other word in the sentence, so each word's representation is informed by the words it "pays attention" to.
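The core of self-attention is small enough to sketch directly. The three 2-d vectors below are made-up stand-ins for word embeddings; a real Transformer also learns separate query, key, and value projections and uses far larger dimensions.

```python
# A minimal sketch of scaled dot-product self-attention over a 3-word sequence.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    d = len(vectors[0])
    out = []
    for q in vectors:  # each word queries...
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]            # ...every word, including itself
        weights = softmax(scores)              # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, vectors))
                    for j in range(d)])        # weighted mix of the other vectors
    return out

# toy embeddings for "the", "cat", "sat"
print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

The weights computed by `softmax` are exactly the "importance" scores described above: a word with a higher weight contributes more to the output representation.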
The Curriculum: Pre-training & Fine-tuning
1. Pre-training
The model is trained on a massive dataset from the internet with a simple goal: predict the next word. This is where it learns grammar, facts, and reasoning.
- Data Size: Massive (Internet-scale)
- Cost: Very High
- Goal: General Knowledge
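The pre-training objective, predict the next word, can be imitated at toy scale with a simple bigram count table: the counts play the role of learned parameters, and the tiny corpus stands in for internet-scale text. Everything here is illustrative.

```python
# A toy version of the pre-training objective: learn to predict the next word
# by counting which word follows which in the training text.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1            # how often nxt follows prev

def predict_next(word):
    # the most frequent follower is the highest-probability next word
    return counts[word].most_common(1)[0][0]

print(predict_next("sat"))  # -> "on"
```

A real model replaces the count table with a neural network, but the training signal is the same: the statistics of what comes next.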
2. Fine-tuning
The pre-trained model is then trained on a smaller, curated dataset to align with human preferences, making it a helpful and harmless assistant.
- Data Size: Specific & Curated
- Cost: Lower
- Goal: Alignment & Safety
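The two-phase curriculum can be sketched on the same kind of toy next-word model: phase 1 fits broad data, phase 2 updates the very same parameters with a small curated set. The sentences and the 10x weighting are invented for illustration; real fine-tuning adjusts network weights with further gradient steps rather than weighted counts.

```python
# A sketch of pre-training followed by fine-tuning on a toy bigram model.
from collections import Counter, defaultdict

def count_bigrams(words, table, weight=1):
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += weight
    return table

model = defaultdict(Counter)
# 1. Pre-training: larger, generic text
count_bigrams("the cat sat . the cat ran . the cat sat .".split(), model)
# 2. Fine-tuning: small curated data, weighted so it can steer the model
count_bigrams("the cat purred .".split(), model, weight=10)

print(model["cat"].most_common(1)[0][0])  # the curated behavior now wins
```

The key point matches the text: fine-tuning is much cheaper because the curated dataset is tiny, yet it changes which behavior the model prefers.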
How We Use a Trained Model (Inference)
Once trained, the model is ready to generate content. This is the inference process: given a prompt, the model produces a response one token at a time, cycling through the steps below.
Tokenization
Input text is broken into pieces (tokens).
Embedding
Tokens are converted into numerical representations.
Transformer
The model processes the numbers to understand context.
Prediction
The model outputs a probability for every possible next token.
Sampling
A token is selected based on the probabilities.
Repetition
The new token is appended to the sequence, and the process repeats until the response is complete.
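The six steps above can be compressed into a toy generation loop. A hand-made bigram probability table stands in for the embedding and Transformer stages, and the vocabulary, probabilities, and seed are all invented for the sketch, not taken from any real model.

```python
# A toy inference loop: tokenize, look up next-token probabilities,
# sample, append, repeat.
import random

probs = {  # P(next token | current token), hand-made for illustration
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(prompt, steps=3, seed=0):
    rng = random.Random(seed)
    tokens = prompt.split()                   # 1. tokenization
    for _ in range(steps):                    # 6. repetition
        dist = probs.get(tokens[-1])          # 2-4. model -> next-token probabilities
        if dist is None:
            break                             # no known continuation
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]  # 5. sampling
        tokens.append(nxt)                    # append the new token and repeat
    return " ".join(tokens)

print(generate("the"))
```

Because sampling is probabilistic, different seeds can yield "the cat sat down" or "the dog ran away" from the same prompt, which is why the same prompt can produce different responses from a real model.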