
AI/ML Encyclopedia

A comprehensive guide from Beginner concepts to Expert architecture details.

Level: Beginner

AI vs ML vs DL

Definition

Understanding the hierarchy of Artificial Intelligence.

Analogy

"AI is the planet, Machine Learning is a continent on that planet, and Deep Learning is a specific city on that continent."

Visual Model
graph TD
  A[Artificial Intelligence] --> B[Machine Learning]
  B --> C[Deep Learning]
  C --> D[LLMs / Transformers]

Technical Deep Dive

Artificial Intelligence is the broad discipline of creating intelligent machines. Machine Learning is a subset where machines learn from data without explicit programming. Deep Learning is a subset of ML using multi-layered neural networks.

Supervised Learning

Definition

Learning with a teacher (labeled data).

Analogy

"Like a teacher showing flashcards: 'This is a Cat', 'This is a Dog'. Eventually, the student learns to identify them alone."

Visual Model
flowchart LR
  A[Input Image] --> B[Model]
  B --> C[Prediction: Cat]
  C -- Compare --> D[Label: Cat]
  D -- Correct! --> B

Technical Deep Dive

A training paradigm where the model learns to map inputs (features) to outputs (labels) based on example input-output pairs.

Neural Network

Definition

A computer system inspired by the human brain.

Analogy

"A team of people solving a puzzle. Each person focuses on one small piece, passes their finding to the next person, until the full picture is revealed."

Visual Model
graph LR
  A[Input Layer] --> B[Hidden Layer 1]
  B --> C[Hidden Layer 2]
  C --> D[Output Layer]

Technical Deep Dive

A system of layered, interconnected processing units that learns to recognize underlying relationships in a set of data, loosely inspired by the way biological neurons pass signals to one another.

Training

Definition

Teaching the model by showing it examples and correcting mistakes.

Analogy

"Like school. The model takes a test, gets a grade (Loss), and studies to do better next time (Gradient Descent)."

Technical Deep Dive

The process of optimizing the model's parameters (weights and biases) to minimize the loss function on a dataset.

Epoch

Definition

One complete pass through the entire training dataset.

Analogy

"One full reading of the textbook."

Technical Deep Dive

A hyperparameter defining the number of times the learning algorithm will work through the entire training dataset.

Inference

Definition

Using the trained model to make predictions.

Analogy

"Taking the final exam using what you learned."

Technical Deep Dive

The stage where a trained model is deployed to generate predictions on new, unseen data.

Level: Intermediate

Artificial Neuron

Definition

The atomic unit of a neural network.

Analogy

"A tiny decision switch. It takes signals in, weighs them, and if the signal is strong enough, it fires."

Visual Model
graph LR
  X1[Input 1] -- w1 --> S((Sum))
  X2[Input 2] -- w2 --> S
  B[Bias] --> S
  S --> A[Activation]
  A --> Y[Output]

Technical Deep Dive

Computes y = f(∑(w * x) + b). It performs a weighted sum of inputs, adds a bias, and passes it through an activation function.
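
Code Sketch

A minimal sketch of this formula in Python, assuming a sigmoid activation; the input values, weights, and bias below are illustrative, not from any real model.

import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

# Illustrative values: two inputs, two weights, one bias.
print(neuron([0.5, 0.8], weights=[0.4, -0.2], bias=0.1))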

Tokenization

Definition

Breaking text into chunks (tokens) for the model.

Analogy

"Chopping a sentence into Lego bricks. 'I love AI' -> ['I', 'love', 'AI']."

Visual Model
graph TD A["Raw Text: 'Learning AI'"] --> B[Tokenizer] B --> C["Tokens: [1024, 4522]"]

Technical Deep Dive

The process of converting raw text into a sequence of integer IDs from a fixed vocabulary. BPE (Byte Pair Encoding) is commonly used to balance word-level and character-level splitting.
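
Code Sketch

A toy sketch of the vocabulary-lookup step in Python. The vocabulary and IDs are invented for illustration; a real BPE tokenizer splits text into learned subword units rather than whole words.

# Hypothetical fixed vocabulary mapping strings to integer IDs.
vocab = {"I": 40, "love": 1024, "AI": 4522, "<unk>": 0}

def tokenize(text):
    """Split on whitespace and look up each piece's ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]

print(tokenize("I love AI"))  # [40, 1024, 4522]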

Embedding

Definition

Converting tokens into meaningful number lists (vectors).

Analogy

"GPS coordinates for words. 'King' and 'Queen' are close together on the map."

Visual Model
graph TD
  Token[Token: 'Cat'] --> Embed[Embedding Layer]
  Embed --> Vector["Vector: [0.1, 0.9, -0.4...]"]

Technical Deep Dive

Dense vector representations where semantically similar words map to nearby points in high-dimensional space.
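
Code Sketch

A minimal sketch showing that an embedding layer is just a lookup table, one row per token ID. The table here is random; in a trained model these rows are learned. Sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(8, 4))  # vocab of 8, dimension 4

token_ids = [3, 1, 5]                 # e.g. output of a tokenizer
vectors = embedding_table[token_ids]  # shape (3, 4): one vector per token
print(vectors.shape)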

Gradient Descent

Definition

The algorithm used to minimize errors.

Analogy

"Descending a misty mountain. You feel the slope with your feet and execute a step downwards."

Visual Model
graph TD
  A[Calculate Loss] --> B[Compute Gradient]
  B --> C[Update Weights]
  C --> A

Technical Deep Dive

An iterative optimization algorithm for finding a local minimum of a differentiable function.
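
Code Sketch

A minimal sketch of gradient descent on a one-dimensional function, f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3). The learning rate and starting point are illustrative.

x, learning_rate = 0.0, 0.1
for step in range(50):
    gradient = 2 * (x - 3)
    x -= learning_rate * gradient  # step against the slope, i.e. downhill
print(x)  # converges toward the minimum at x = 3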

Bias

Definition

An adjustable threshold for a neuron.

Analogy

"Like a personal preference. Even if the input arguments are weak, a high bias might make you say 'Yes' anyway."

Technical Deep Dive

A learnable parameter added to the weighted sum, allowing the activation function to be shifted left or right.

Weight

Definition

The strength of a connection between neurons.

Analogy

"Like the volume knob on a radio channel. High weight means the signal is loud and important; low weight means it's ignored."

Technical Deep Dive

A learnable parameter that scales the input signal. Training involves adjusting these weights to minimize error.

Activation Function

Definition

Decides if a neuron should 'fire' or stay silent.

Analogy

"The rule for the switch. 'If the total pressure is above 50, open the floodgate.'"

Technical Deep Dive

A non-linear function (like ReLU or GELU) applied to the neuron's output, enabling the network to learn complex patterns.
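
Code Sketch

Sketches of the two functions named above. The GELU here uses the common tanh approximation; exact constants vary by implementation.

import math

def relu(z):
    """ReLU: pass positive signals through, silence negative ones."""
    return max(0.0, z)

def gelu(z):
    """GELU (tanh approximation): a smooth variant used in Transformers."""
    return 0.5 * z * (1 + math.tanh(math.sqrt(2 / math.pi) * (z + 0.044715 * z**3)))

print(relu(-2.0), relu(1.5))  # 0.0 1.5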

Loss Function

Definition

A score of how wrong the model is.

Analogy

"The difference between your answer and the correct answer on a test. Lower score is better!"

Technical Deep Dive

A mathematical function (e.g., Cross-Entropy) that quantifies the discrepancy between the predicted output and the actual target.
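
Code Sketch

A minimal sketch of cross-entropy for a single example: the loss is the negative log of the probability the model assigned to the correct class. The probabilities below are illustrative.

import math

def cross_entropy(predicted_probs, true_index):
    return -math.log(predicted_probs[true_index])

print(cross_entropy([0.2, 0.7, 0.1], true_index=1))  # ~0.36: 70% on the right class
print(cross_entropy([0.8, 0.1, 0.1], true_index=1))  # ~2.30: only 10% on the right class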

Backpropagation

Definition

Calculating who is to blame for an error.

Analogy

"If a company loses money, you trace back from the CEO -> Manager -> Worker to find where the mistake happened."

Technical Deep Dive

The algorithm for computing the gradient of the loss function with respect to the weights using the chain rule.
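
Code Sketch

A minimal sketch of the chain rule on a single linear neuron with squared loss, L = (w*x + b - target)^2, with the gradients computed by hand. Values are illustrative.

x, target = 2.0, 10.0
w, b = 1.0, 0.0

y = w * x + b                 # forward pass: prediction
loss = (y - target) ** 2

dloss_dy = 2 * (y - target)   # outermost derivative first
dloss_dw = dloss_dy * x       # chain rule: dy/dw = x
dloss_db = dloss_dy * 1       # chain rule: dy/db = 1

print(loss, dloss_dw, dloss_db)  # 64.0 -32.0 -16.0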

Batch Size

Definition

Number of examples the model sees before updating itself.

Analogy

"Do you grade homework one by one, or collect 32 of them and grade them all at once?"

Technical Deep Dive

The number of training samples to work through before the model's internal parameters are updated.
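
Code Sketch

A sketch of iterating a dataset in mini-batches; one parameter update would happen per batch, not per example. The dataset and batch size are illustrative.

dataset = list(range(100))  # stand-in for 100 training examples
batch_size = 32

for start in range(0, len(dataset), batch_size):
    batch = dataset[start:start + batch_size]
    # forward pass, loss, backprop, and weight update would go here
    print(f"update on {len(batch)} examples")  # 32, 32, 32, then 4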

Top-K Sampling

Definition

Limiting choices to the top K best options.

Analogy

"Instead of picking from every word in the dictionary, only consider the top 5 most likely next words."

Technical Deep Dive

A decoding strategy that filters the distribution to only the top K most probable next tokens.
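
Code Sketch

A minimal sketch of top-K filtering, assuming NumPy: keep the K highest logits, renormalize with a softmax, and sample from the survivors. The logits are illustrative.

import numpy as np

def top_k_sample(logits, k, rng):
    logits = np.asarray(logits, dtype=float)
    top_indices = np.argsort(logits)[-k:]           # indices of the k best
    probs = np.exp(logits[top_indices] - logits[top_indices].max())
    probs /= probs.sum()                            # softmax over survivors
    return rng.choice(top_indices, p=probs)

rng = np.random.default_rng(0)
print(top_k_sample([2.0, 1.0, 0.5, -1.0, 3.0], k=2, rng=rng))  # 0 or 4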

Determinism

Definition

Getting the exact same result every time.

Analogy

"Calculating 2+2 (always 4) vs asking a friend for a story (always different)."

Technical Deep Dive

A property where the model produces the same output for a given input. It is usually approximated by greedy decoding (temperature 0), though low-level numerical effects can still introduce small variations.

Parameters

Definition

The total number of adjustable weights and biases in the model.

Analogy

"The number of synapses in a brain. More parameters generally mean more knowledge and intelligence."

Technical Deep Dive

The sum of all weights and biases in the neural network. Larger models generally have higher capacity but require more compute.

Temperature

Definition

Controls the creativity/randomness of the model.

Analogy

"Low temp = Robot, always predictable. High temp = Poet, creative but maybe crazy."

Technical Deep Dive

A hyperparameter used to scale the logits before applying softmax. Higher values flatten the distribution (more random), lower values sharpen it (more deterministic).
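
Code Sketch

A minimal sketch of temperature scaling, assuming NumPy: divide the logits by the temperature before applying softmax. The logits are illustrative.

import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.5))  # sharper: near-deterministic
print(softmax_with_temperature(logits, 2.0))  # flatter: more random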

Context Window

Definition

How much text the model can remember at once.

Analogy

"Short-term memory. If it's too full, the model forgets the beginning of the conversation."

Technical Deep Dive

The maximum number of tokens the model can process in a single forward pass.

Level: Expert

Transformer Architecture

Definition

The architecture behind GPT, BERT, and modern LLMs.

Analogy

"An assembly line that processes the entire sentence at once (Parallel) rather than word-by-word (Sequential/RNN)."

Visual Model
graph TD
  Input --> Embed
  Embed --> Enc[Encoder Blocks]
  Enc --> Dec[Decoder Blocks]
  Dec --> Output

Technical Deep Dive

Introduced in 'Attention Is All You Need' (2017). Relies entirely on self-attention mechanisms to draw global dependencies between input and output.

Self-Attention Mechanism

Definition

Allows the model to focus on relevant parts of the input.

Analogy

"Reading a sentence and looking back at previous words to understand 'it' or 'they'."

Visual Model
graph TD
  X[Input] --> Q[Query]
  X --> K[Key]
  X --> V[Value]
  Q -- MatMul --> S[Scores]
  K -- MatMul --> S
  S -- Softmax --> W[Weights]
  W -- MatMul --> Out[Output]
  V -- MatMul --> Out

Technical Deep Dive

Computes attention scores using Query (Q), Key (K), and Value (V) matrices: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V.
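
Code Sketch

A minimal NumPy sketch of the formula above. The shapes are illustrative; in a real Transformer, Q, K, and V come from learned linear projections of the token embeddings, and attention runs across multiple heads.

import numpy as np

def self_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how strongly each token attends to each other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))  # 3 tokens, d_k = 4
print(self_attention(Q, K, V).shape)  # (3, 4)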

Query (Q)

Definition

What a token is looking for.

Analogy

"Like a Google search bar. The word 'Bank' sends out a query: 'Are we talking about money or rivers?'"

Technical Deep Dive

A vector derived from the input embedding representing the current token's search intent.

Key (K)

Definition

What a token contains/offers.

Analogy

"Like the tag on a file folder. 'River' has a key saying 'I am about nature/water'."

Technical Deep Dive

A vector derived from the input embedding that serves as a label or index for matching with Queries.

Value (V)

Definition

The actual information passed along.

Analogy

"The content inside the folder. If the Query matches the Key, you get the Value."

Technical Deep Dive

A vector derived from the input embedding containing the actual information to be aggregated if the attention score is high.

Encoder-Decoder

Definition

A common architecture for sequence-to-sequence tasks.

Analogy

"Like a translator. The encoder understands the input language, and the decoder generates the output in another language."

Visual Model
graph TD
  Input --> Encoder
  Encoder --> Context
  Context --> Decoder
  Decoder --> Output

Technical Deep Dive

An architecture where an encoder processes the input sequence into a context representation (a single fixed-size vector in classic RNN models; a sequence of vectors in Transformers), and a decoder generates an output sequence from that context. Used in machine translation and summarization.

Softmax Function

Definition

Converts numbers into probabilities that sum to one.

Analogy

"Like a popularity contest. It takes raw scores and turns them into percentages, showing how popular each option is."

Visual Model
graph TD
  Input[Logits: -1, 0, 3] --> Softmax
  Softmax --> Output[Probabilities: 0.02, 0.05, 0.94]

Technical Deep Dive

A function that takes a vector of arbitrary real-valued scores (logits) and squashes them into a probability distribution: each output lies between 0 and 1, and all outputs sum to 1.
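
Code Sketch

A minimal sketch of softmax in Python, using the logits from the diagram above.

import math

def softmax(logits):
    exps = [math.exp(z - max(logits)) for z in logits]  # shift for stability
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([-1, 0, 3]))  # ~[0.02, 0.05, 0.94], sums to 1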

Visual Encoder (ViT)

Definition

The part of the model that learns to 'see' images.

Analogy

"The eyes and visual cortex of the AI. It turns pixels into understanding."

Technical Deep Dive

Usually a Vision Transformer (ViT) that chops an image into patches and processes them similarly to text tokens to extract visual features.
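
Code Sketch

A minimal NumPy sketch of the patching step: slice an image into non-overlapping patches and flatten each one into a "visual token". The image and patch sizes are illustrative (they happen to match common ViT defaults).

import numpy as np

image = np.zeros((224, 224, 3))  # height x width x channels
p = 16                           # patch size

# Carve the image into a 14 x 14 grid of 16 x 16 patches, then flatten each.
patches = image.reshape(224 // p, p, 224 // p, p, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * 3)
print(patches.shape)  # (196, 768): 196 patches, each a 768-dim vector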

Spectrogram

Definition

A visual picture of sound frequencies.

Analogy

"Sheet music for a computer. It shows low and high notes over time."

Technical Deep Dive

A visual representation of the spectrum of frequencies of a signal as it varies with time.

Multimodal

Definition

Capable of understanding Text, Audio, and Images.

Analogy

"A genius who can read, listen, and see, instead of just reading."

Technical Deep Dive

The ability of a single model to process and relate information from multiple modalities (text, vision, audio) in a shared embedding space.

Quantization

Definition

Reducing the precision of numbers to make the model smaller and faster.

Analogy

"Like lowering the resolution of an image. It looks almost the same but takes up way less space."

Technical Deep Dive

The process of mapping continuous infinite values to a smaller set of discrete finite values, e.g., converting 32-bit floats to 4-bit integers.
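
Code Sketch

A minimal sketch of symmetric linear quantization to 4-bit integers (range -8 to 7), assuming NumPy. Real schemes add per-channel scales, zero points, and calibration; the weights below are illustrative.

import numpy as np

def quantize(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit
    scale = np.abs(weights).max() / qmax  # map the largest weight to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.53, 0.91, -0.07], dtype=np.float32)
q, scale = quantize(w)
print(q, dequantize(q, scale))  # close to w, but coarser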

End of Knowledge Base