Technical Encyclopedia
A high-fidelity guide covering the full spectrum of Large Language Model theory, from foundational principles to production architecture.
AI vs ML vs DL
Technical Specification
Understanding the hierarchy of Artificial Intelligence.
Conceptual Analogy
"AI is the planet, Machine Learning is a continent on that planet, and Deep Learning is a specific city on that continent."
Implementation Details
Artificial Intelligence is the broad discipline of creating intelligent machines. Machine Learning is a subset where machines learn from data without explicit programming. Deep Learning is a subset of ML using multi-layered neural networks.
Supervised Learning
Technical Specification
Learning with a teacher (labeled data).
Conceptual Analogy
"Like a teacher showing flashcards: 'This is a Cat', 'This is a Dog'. Eventually, the student learns to identify them alone."
Implementation Details
A training paradigm where the model learns to map inputs (features) to outputs (labels) based on example input-output pairs.
Neural Network
Technical Specification
A computer system inspired by the human brain.
Conceptual Analogy
"A team of people solving a puzzle. Each person focuses on one small piece, passes their finding to the next person, until the full picture is revealed."
Implementation Details
A system of interconnected artificial neurons, organized in layers, that learns underlying relationships in a set of data by adjusting the strengths of its connections.
Training
Technical Specification
Teaching the model by showing it examples and correcting mistakes.
Conceptual Analogy
"Like school. The model takes a test, gets a grade (Loss), and studies to do better next time (Gradient Descent)."
Implementation Details
The process of optimizing the model's parameters (weights and biases) to minimize the loss function on a dataset.
Epoch
Technical Specification
One complete pass through the entire training dataset.
Conceptual Analogy
"One full reading of the textbook."
Implementation Details
A hyperparameter defining the number of times the learning algorithm will work through the entire training dataset.
Inference
Technical Specification
Using the trained model to make predictions.
Conceptual Analogy
"Taking the final exam using what you learned."
Implementation Details
The stage where a trained model is deployed to generate predictions on new, unseen data.
Artificial Neuron
Technical Specification
The atomic unit of a neural network.
Conceptual Analogy
"A tiny decision switch. It takes signals in, weighs them, and if the signal is strong enough, it fires."
Implementation Details
Computes y = f(∑(w * x) + b). It performs a weighted sum of inputs, adds a bias, and passes it through an activation function.
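The formula above can be sketched in a few lines of Python. The weights, bias, and choice of a sigmoid activation here are illustrative, not values from any particular model:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # activation function f

# Two inputs with hand-picked weights and bias (illustrative values only).
out = neuron([1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
```

Here z = 0.4·1.0 + (-0.2)·0.5 + 0.1 = 0.4, and the sigmoid squashes it to roughly 0.6.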
Tokenization
Technical Specification
Breaking text into chunks (tokens) for the model.
Conceptual Analogy
"Chopping a sentence into Lego bricks. 'I love AI' -> ['I', 'love', 'AI']."
Implementation Details
The process of converting raw text into integer IDs from a fixed vocabulary. Byte Pair Encoding (BPE) is commonly used to balance word-level and character-level splitting.
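A toy word-level tokenizer illustrates the idea of mapping text to vocabulary IDs (real LLMs use subword schemes like BPE; the tiny vocabulary here is hypothetical):

```python
# Hypothetical tiny vocabulary; <unk> stands in for out-of-vocabulary words.
vocab = {"I": 0, "love": 1, "AI": 2, "<unk>": 3}

def encode(text):
    """Split on whitespace and look each token up in the vocabulary."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in text.split()]

ids = encode("I love AI")  # [0, 1, 2]
```

An unseen word like "cats" would map to the `<unk>` ID, which is exactly the failure mode subword tokenizers such as BPE are designed to avoid.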
Embedding
Technical Specification
Converting tokens into meaningful number lists (vectors).
Conceptual Analogy
"GPS coordinates for words. 'King' and 'Queen' are close together on the map."
Implementation Details
Dense vector representations where semantically similar words map to nearby points in high-dimensional space.
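The "nearby points" idea is usually measured with cosine similarity. The 3-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings (illustrative values only).
king  = [0.9, 0.8, 0.1]
queen = [0.85, 0.82, 0.15]
apple = [0.1, 0.2, 0.9]

sim_kq = cosine_similarity(king, queen)  # semantically close pair
sim_ka = cosine_similarity(king, apple)  # unrelated pair
```

With these toy vectors, 'King' and 'Queen' score close to 1.0 while 'King' and 'Apple' score much lower.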
Gradient Descent
Technical Specification
The algorithm used to minimize errors.
Conceptual Analogy
"Descending a misty mountain. You feel the slope with your feet and take a small step downhill."
Implementation Details
An iterative optimization algorithm for finding a local minimum of a differentiable function.
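The iteration can be shown on a one-dimensional toy function; the starting point and learning rate below are arbitrary illustrative choices:

```python
# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2 * (x - 3).
x = 0.0    # starting point
lr = 0.1   # learning rate (step size)

for _ in range(100):
    grad = 2 * (x - 3)  # slope at the current position
    x -= lr * grad      # step in the direction of steepest descent

# x converges toward the minimum at x = 3.
```

Each step moves x a fraction of the way toward the minimum; too large a learning rate would overshoot and oscillate instead.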
Bias
Technical Specification
An adjustable threshold for a neuron.
Conceptual Analogy
"Like a personal preference. Even if the incoming signals are weak, a high bias might make you say 'Yes' anyway."
Implementation Details
A learnable parameter added to the weighted sum, allowing the activation function to be shifted left or right.
Weight
Technical Specification
The strength of a connection between neurons.
Conceptual Analogy
"Like the volume knob on a radio channel. High weight means the signal is loud and important; low weight means it's ignored."
Implementation Details
A learnable parameter that scales the input signal. Training involves adjusting these weights to minimize error.
Activation Function
Technical Specification
Decides if a neuron should 'fire' or stay silent.
Conceptual Analogy
"The rule for the switch. 'If the total pressure is above 50, open the floodgate.'"
Implementation Details
A non-linear function (like ReLU or GELU) applied to the neuron's output, enabling the network to learn complex patterns.
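Both functions mentioned above fit in a few lines; the GELU shown uses the common tanh approximation (the exact form uses the Gaussian CDF):

```python
import math

def relu(z):
    """ReLU: passes positive signals through, zeroes out negative ones."""
    return max(0.0, z)

def gelu(z):
    """GELU, tanh approximation (as used in GPT-style models)."""
    return 0.5 * z * (1 + math.tanh(math.sqrt(2 / math.pi) * (z + 0.044715 * z ** 3)))
```

Without such non-linearities, stacking layers would collapse into a single linear transformation, no matter how deep the network.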
Loss Function
Technical Specification
A score of how wrong the model is.
Conceptual Analogy
"The difference between your answer and the correct answer on a test. Lower score is better!"
Implementation Details
A mathematical function (e.g., Cross-Entropy) that quantifies the discrepancy between the predicted output and the actual target.
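Cross-entropy for a single prediction reduces to the negative log probability of the correct class. The probability vectors below are made up to show the effect:

```python
import math

def cross_entropy(predicted_probs, target_index):
    """Negative log probability the model assigned to the correct class."""
    return -math.log(predicted_probs[target_index])

# 70% probability on the correct token: small loss.
confident = cross_entropy([0.7, 0.2, 0.1], target_index=0)
# Only 10% on the correct token: much larger loss.
unsure = cross_entropy([0.1, 0.2, 0.7], target_index=0)
```

The loss grows sharply as the probability on the right answer shrinks, which is what pushes training to correct confident mistakes first.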
Backpropagation
Technical Specification
Calculating who is to blame for an error.
Conceptual Analogy
"If a company loses money, you trace back from the CEO -> Manager -> Worker to find where the mistake happened."
Implementation Details
The algorithm for computing the gradient of the loss function with respect to the weights using the chain rule.
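The chain rule is easiest to see on a one-weight model; the numbers below are arbitrary illustrative values:

```python
# One-parameter model y = w * x with squared-error loss L = (y - t)^2.
w, x, t = 0.5, 2.0, 3.0

# Forward pass.
y = w * x            # y = 1.0
loss = (y - t) ** 2  # loss = 4.0

# Backward pass via the chain rule: dL/dw = dL/dy * dy/dw.
dL_dy = 2 * (y - t)  # -4.0
dy_dw = x            #  2.0
dL_dw = dL_dy * dy_dw  # -8.0: increasing w would decrease the loss
```

In a deep network the same multiplication of local derivatives is repeated layer by layer, from the loss back to the first weights.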
Batch Size
Technical Specification
Number of examples the model sees before updating itself.
Conceptual Analogy
"Do you grade homework one by one, or collect 32 of them and grade them all at once?"
Implementation Details
The number of training samples to work through before the model's internal parameters are updated.
Top-K Sampling
Technical Specification
Limiting choices to the top K best options.
Conceptual Analogy
"Instead of picking from every word in the dictionary, only consider the top 5 most likely next words."
Implementation Details
A decoding strategy that filters the distribution to only the top K most probable next tokens.
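A minimal sketch of the filtering step, assuming the next-token distribution is already given as a list of probabilities:

```python
import random

def top_k_sample(probs, k, rng=random):
    """Keep the k most probable tokens, renormalize, then sample one."""
    top = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    tokens = [i for i, _ in top]
    weights = [p / total for _, p in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# With k=2, only tokens 0 and 1 can ever be chosen.
probs = [0.5, 0.3, 0.15, 0.05]
token = top_k_sample(probs, k=2)
```

The tail tokens (here indices 2 and 3) get zero probability, which is what prevents the model from occasionally emitting a wildly unlikely word.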
Determinism
Technical Specification
Getting the exact same result every time.
Conceptual Analogy
"Calculating 2+2 (always 4) vs asking a friend for a story (always different)."
Implementation Details
A property where the model produces the same output for a given input, usually achieved by greedy decoding (setting temperature to 0).
Parameters
Technical Specification
The total number of adjustable weights and biases in the model.
Conceptual Analogy
"The number of synapses in a brain. More parameters generally mean more knowledge and intelligence."
Implementation Details
The sum of all weights and biases in the neural network. Larger models generally have higher capacity but require more compute.
Temperature
Technical Specification
Controls the creativity/randomness of the model.
Conceptual Analogy
"Low temp = Robot, always predictable. High temp = Poet, creative but maybe crazy."
Implementation Details
A hyperparameter used to scale the logits before applying softmax. Higher values flatten the distribution (more random), lower values sharpen it (more deterministic).
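The scaling step can be demonstrated directly; the logits below are made-up values:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T before softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.5)  # near-deterministic
hot = softmax_with_temperature(logits, 2.0)   # closer to uniform
```

With these logits, the top token gets roughly 86% of the mass at T=0.5 but only about 50% at T=2.0.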
Context Window
Technical Specification
How much text the model can remember at once.
Conceptual Analogy
"Short-term memory. If it's too full, the model forgets the beginning of the conversation."
Implementation Details
The maximum number of tokens the model can process in a single forward pass.
Transformer Architecture
Technical Specification
The architecture behind GPT, BERT, and modern LLMs.
Conceptual Analogy
"An assembly line that processes the entire sentence at once (Parallel) rather than word-by-word (Sequential/RNN)."
Implementation Details
Introduced in 'Attention Is All You Need' (2017). Relies entirely on self-attention mechanisms to draw global dependencies between input and output.
Self-Attention Mechanism
Technical Specification
Allows the model to focus on relevant parts of the input.
Conceptual Analogy
"Reading a sentence and looking back at previous words to understand 'it' or 'they'."
Implementation Details
Computes attention scores using Query (Q), Key (K), and Value (V) matrices: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V.
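The formula can be traced on tiny 2×2 matrices in plain Python, without any tensor library; the Q/K/V values below are illustrative, not learned:

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[v / math.sqrt(d_k) for v in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V)

# Two tokens with 2-dimensional Q/K/V vectors (illustrative values only).
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)  # each output row is a weighted mix of the value rows
```

Because each query aligns most strongly with its matching key, token 0's output leans toward value row 0 and token 1's toward value row 1, but both blend in some of the other row.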
Query (Q)
Technical Specification
What a token is looking for.
Conceptual Analogy
"Like a Google search bar. The word 'Bank' sends out a query: 'Are we talking about money or rivers?'"
Implementation Details
A vector derived from the input embedding representing the current token's search intent.
Key (K)
Technical Specification
What a token contains/offers.
Conceptual Analogy
"Like the tag on a file folder. 'River' has a key saying 'I am about nature/water'."
Implementation Details
A vector derived from the input embedding that serves as a label or index for matching with Queries.
Value (V)
Technical Specification
The actual information passed along.
Conceptual Analogy
"The content inside the folder. If the Query matches the Key, you get the Value."
Implementation Details
A vector derived from the input embedding containing the actual information to be aggregated if the attention score is high.
Encoder-Decoder
Technical Specification
A common architecture for sequence-to-sequence tasks.
Conceptual Analogy
"Like a translator. The encoder understands the input language, and the decoder generates the output in another language."
Implementation Details
An architecture where an encoder processes the input sequence into context representations (a single fixed-size vector in classic RNN models, a sequence of vectors in Transformers) and a decoder generates an output sequence from them. Used in machine translation and summarization.
Softmax Function
Technical Specification
Converts numbers into probabilities that sum to one.
Conceptual Analogy
"Like a popularity contest. It takes raw scores and turns them into percentages, showing how popular each option is."
Implementation Details
A function that takes a vector of arbitrary real-valued scores and normalizes it into a vector of probabilities, each between 0 and 1, that together sum to 1.
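A minimal implementation, with the standard max-subtraction trick to keep the exponentials from overflowing (the input scores are made-up values):

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # highest score gets the largest share
```

Subtracting the maximum changes nothing mathematically (the constant cancels in the ratio) but prevents `math.exp` from overflowing on large scores.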
Visual Encoder (ViT)
Technical Specification
The part of the model that learns to 'see' images.
Conceptual Analogy
"The eyes and visual cortex of the AI. It turns pixels into understanding."
Implementation Details
Usually a Vision Transformer (ViT) that chops an image into patches and processes them similarly to text tokens to extract visual features.
Spectrogram
Technical Specification
A visual picture of sound frequencies.
Conceptual Analogy
"Sheet music for a computer. It shows low and high notes over time."
Implementation Details
A visual representation of the spectrum of frequencies of a signal as it varies with time.
Multimodal
Technical Specification
Capable of understanding Text, Audio, and Images.
Conceptual Analogy
"A genius who can read, listen, and see, instead of just reading."
Implementation Details
The ability of a single model to process and relate information from multiple modalities (text, vision, audio) in a shared embedding space.
Quantization
Technical Specification
Reducing the precision of numbers to make the model smaller and faster.
Conceptual Analogy
"Like lowering the resolution of an image. It looks almost the same but takes up way less space."
Implementation Details
The process of mapping continuous infinite values to a smaller set of discrete finite values, e.g., converting 32-bit floats to 4-bit integers.
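A simplified sketch of symmetric linear quantization to int8; real schemes add per-channel scales, zero points, and calibration, all omitted here:

```python
def quantize_int8(values):
    """Map floats to the int8 range [-127, 127] via one shared scale factor."""
    scale = max(abs(v) for v in values) / 127
    return [round(v / scale) for v in values], scale

def dequantize(q_values, scale):
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in q_values]

weights = [0.02, -1.27, 0.64, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to the originals, up to rounding error
```

Each weight now needs one byte instead of four (for float32), at the cost of small rounding errors, which is the resolution-versus-size trade-off from the analogy.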
End of Technical Documentation