AI/ML Dictionary
The comprehensive A-Z guide to Artificial Intelligence concepts, formulas, and terminology. Demystifying the jargon one term at a time.
A
Attention Mechanism
A mechanism that allows neural networks to focus on specific parts of the input sequence when generating output, enabling the model to capture long-range dependencies.
Activation Function
A mathematical gate that decides a neuron's output, introducing non-linearity to the network.
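A minimal NumPy sketch of two common activation functions; the input values are illustrative.

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives.
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real value into the range (0, 1).
    return 1 / (1 + np.exp(-x))

print(relu(np.array([-2.0, 0.5, 3.0])))     # [0.  0.5 3. ]
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # [0.119 0.5   0.881]
```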
Autoencoder
A neural network that learns to compress input data into a latent representation and then reconstruct it.
B
Backpropagation
The algorithm used to train neural networks by computing the gradient of the loss with respect to each weight via the chain rule.
Batch Normalization
A technique to standardize inputs to a layer for each mini-batch, stabilizing the learning process.
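A minimal NumPy sketch of the batch-normalization computation for one mini-batch; the scale/shift values and the random batch are assumed for illustration.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Standardize each feature over the mini-batch, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 10 + 5   # 32 samples, 4 features
print(batch_norm(batch).mean(axis=0))     # ~0 for every feature
print(batch_norm(batch).std(axis=0))      # ~1 for every feature
```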
BERT (Bidirectional Encoder Representations from Transformers)
A transformer-based model focusing on understanding context from both left and right directions.
Bias (Inductive)
Assumptions built into a model to help it learn effectively (e.g., CNNs assume spatial locality).
C
Chain of Thought (CoT)
A prompting technique enabling LLMs to decompose complex problems into intermediate reasoning steps.
Convolutional Neural Network (CNN)
A network architecture specialized for processing grid-like data such as images.
Cross-Entropy Loss
A loss function typically used in classification tasks, measuring the difference between true and predicted distributions.
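A minimal NumPy sketch of cross-entropy with one-hot labels; the example prediction is illustrative.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels, y_pred: predicted class probabilities.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred), axis=-1).mean()

y_true = np.array([[0, 1, 0]])         # true class is index 1
y_pred = np.array([[0.1, 0.8, 0.1]])   # confident, correct prediction
print(cross_entropy(y_true, y_pred))   # ~0.22 (low loss)
```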
D
Data Augmentation
Increasing the diversity of training data by modifying existing samples.
Decoder
The component of a Transformer that generates the output sequence.
Diffusion Model
A generative model that creates data by learning to reverse a gradual noise-addition process.
Dropout
Regularization technique dropping random neurons during training.
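A minimal NumPy sketch of inverted dropout, one common variant; the drop probability is an assumed example value.

```python
import numpy as np

def dropout(x, p=0.5, training=True):
    # Zero out activations with probability p, scaling the survivors
    # by 1/(1-p) so the expected activation is unchanged.
    if not training:
        return x
    mask = (np.random.rand(*x.shape) > p) / (1 - p)
    return x * mask

activations = np.ones((2, 6))
print(dropout(activations, p=0.5))  # roughly half zeroed, the rest scaled to 2.0
```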
E
Embedding
A continuous vector representation of discrete variables (words, images) where semantic similarity translates to geometric proximity.
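A toy sketch of "semantic similarity as geometric proximity" using cosine similarity; the vectors below are made-up illustrative values, not real learned embeddings.

```python
import numpy as np

# Toy embedding table: each word maps to a dense vector (values are illustrative).
embeddings = {
    "king":  np.array([0.9, 0.7, 0.1]),
    "queen": np.array([0.8, 0.8, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high (~0.99)
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low  (~0.30)
```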
Encoder
The part of a model that processes input data into a context vector or embedding.
Epoch
One full cycle through the entire training dataset.
F
Few-Shot Learning
Providing a model with a small number of examples (shots) to guide its performance on a new task.
Fine-Tuning
Taking a pre-trained model and training it further on a specific dataset.
G
Gradient Descent
An optimization algorithm used to minimize the loss function by iteratively moving in the direction of steepest descent.
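A minimal sketch of plain gradient descent on a one-dimensional toy objective; the function, starting point, and learning rate are assumed for illustration.

```python
# Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient.
w = 0.0                 # initial guess
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)           # derivative of (w - 3)^2
    w -= learning_rate * grad    # move in the direction of steepest descent

print(w)  # converges toward the minimum at 3.0
```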
GAN (Generative Adversarial Network)
Framework where a Generator and Discriminator compete to create realistic data.
GPT (Generative Pre-trained Transformer)
A series of decoder-only transformer models developed by OpenAI.
H
Hallucination
Confident but incorrect outputs generated by an AI.
Hyperparameter
Parameters set before training (e.g., learning rate) rather than learned.
I
Inference
Using a trained model to make predictions.
K
Knowledge Distillation
Transferring knowledge from a large 'teacher' model to a smaller 'student' model.
L
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that freezes pre-trained weights and injects trainable rank decomposition matrices.
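A minimal NumPy sketch of the LoRA idea: a frozen weight matrix plus a trainable low-rank update; the dimensions and rank are assumed example values.

```python
import numpy as np

d, r = 512, 8                       # model dimension and adapter rank (assumed)
W = np.random.randn(d, d)           # pre-trained weight, kept frozen
A = np.random.randn(r, d) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                # trainable, initialised to zero

def lora_forward(x):
    # Output = x W^T + x (B A)^T; only A and B are updated during fine-tuning.
    return x @ W.T + x @ (B @ A).T

x = np.random.randn(1, d)
print(lora_forward(x).shape)  # (1, 512)
```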
Latent Space
A compressed, abstract representation of data within a model.
Learning Rate
Step size for the optimization algorithm.
Logits
The raw, unnormalized predictions generated by the last layer of a neural network before Softmax.
LSTM (Long Short-Term Memory)
A type of RNN capable of learning long-term dependencies, mitigating the vanishing gradient problem.
M
Model Collapse
The degradation of generative models that occurs when they are recursively trained on AI-generated data.
N
NLP (Natural Language Processing)
AI focused on interaction with human language.
O
Objective Function
The function the model aims to maximize or minimize during training; when minimized, it is also called the Loss Function.
One-Hot Encoding
Representing categorical variables as binary vectors.
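A minimal NumPy sketch; the labels and vocabulary are illustrative.

```python
import numpy as np

labels = ["cat", "dog", "bird", "dog"]
vocab = sorted(set(labels))                 # ['bird', 'cat', 'dog']

# Each label becomes a binary vector with a single 1 at its category index.
one_hot = np.eye(len(vocab))[[vocab.index(label) for label in labels]]
print(one_hot)
```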
Overfitting
Memorizing training data instead of generalizing.
P
Parameter
Internal variables (weights/biases) learned by the model.
Perceptron
The simplest type of feedforward neural network classifier.
Prompt Engineering
The art of crafting inputs (prompts) to guide Large Language Models to produce desired outputs.
Q
Quantization
The process of reducing the precision of a model's weights (e.g., from 32-bit float to 8-bit integer) to reduce memory usage and increase speed.
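A minimal NumPy sketch of simple symmetric int8 quantization, one common scheme among several.

```python
import numpy as np

def quantize_int8(weights):
    # Map float32 weights onto the int8 range [-127, 127] using a single scale.
    scale = np.abs(weights).max() / 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4).astype(np.float32)
q, scale = quantize_int8(w)
print(w)
print(dequantize(q, scale))  # close to the original, at a quarter of the memory
```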
R
Reinforcement Learning from Human Feedback (RLHF)
A method to align language models with human values by fine-tuning them using a reward model trained on human preferences.
RAG (Retrieval-Augmented Generation)
A technique that retrieves relevant external knowledge and feeds it to an LLM to generate more accurate and up-to-date responses.
ResNet (Residual Network)
A CNN architecture using 'skip connections' that allow gradients to flow easily during training, enabling extremely deep networks.
Regularization
A set of techniques used to prevent overfitting by penalizing complex models.
RNN (Recurrent Neural Network)
A neural network that processes sequential data by maintaining a hidden state across time steps.
S
Softmax
A function that converts a vector of raw scores (logits) into a probability distribution summing to 1.
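A minimal NumPy sketch; the logits are illustrative.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result sums to 1.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.659 0.242 0.099]
```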
Scaling Laws
Empirical observation that model performance improves predictably with model size, data size, and compute.
Self-Attention
Mechanism relating different positions of a single sequence to compute a representation of the sequence.
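A minimal NumPy sketch of scaled dot-product self-attention, the form used in the Transformer; here queries, keys, and values all come from the same toy input for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

x = np.random.randn(5, 8)   # 5 tokens, dimension 8; self-attention uses Q = K = V = x
print(scaled_dot_product_attention(x, x, x).shape)  # (5, 8)
```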
Sigmoid
An activation function that maps any real value into the range (0, 1).
Supervised Learning
Training on labelled data.
T
Transformer
A deep learning architecture based entirely on attention mechanisms, dispensing with recurrence and convolutions.
Temperature
A hyperparameter controlling randomness in LLM generation: higher values produce more diverse ('creative') outputs, lower values more deterministic ones.
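A minimal NumPy sketch of temperature scaling before sampling; the logits and temperature values are illustrative.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before softmax:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = np.array(logits) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=2.0))  # much more varied
```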
Tensor
A multi-dimensional array, the fundamental data structure in ML frameworks like PyTorch/TensorFlow.
Token
The basic unit of text for an LLM (roughly 0.75 words).
Tokenization
Splitting text into tokens.
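A toy whitespace tokenizer for illustration only; production LLMs use subword schemes such as BPE.

```python
# Split text into tokens, then map each token to an integer ID.
text = "Attention is all you need"
tokens = text.lower().split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
token_ids = [vocab[tok] for tok in tokens]

print(tokens)     # ['attention', 'is', 'all', 'you', 'need']
print(token_ids)  # the integer IDs the model actually consumes
```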
Transfer Learning
Applying knowledge from one task to a related one.
U
Underfitting
When a model is too simple to capture the underlying structure of the data, performing poorly even on the training set.
Unsupervised Learning
Finding patterns in unlabeled data.
V
Validation Set
Data used to tune hyperparameters, kept separate from the training and test sets.
Vanishing Gradient
Gradients shrinking toward zero as they propagate backward through many layers, making deep networks difficult to train.
Vector Database
A database optimized for storing high-dimensional embeddings and searching them by similarity.
W
Weights
The strength of connections between neurons.
Z
Zero-Shot Learning
Performing tasks without specific training examples.