by Michele Laurelli
A comprehensive dictionary of artificial intelligence terms and concepts
Proportion of correct predictions out of total predictions made.
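As an illustrative sketch (not part of the original dictionary), the definition above reduces to a few lines; the function name `accuracy` is my own choice:

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 3 of the 4 predictions match the true labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```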
A mathematical function applied to a neuron's output to introduce non-linearity into the network.
A layer that applies a non-linear activation function element-wise to its input.
Optimizer that adapts the learning rate for each parameter individually, based on the history of its gradients.
An adaptive learning rate optimization algorithm combining momentum and RMSprop.
Training technique improving model robustness by including adversarial examples.
Deep CNN that won ImageNet 2012, pioneering deep learning in computer vision.
A step-by-step procedure or formula for solving a problem or performing a task.
Field of computer science focused on creating systems capable of performing tasks requiring human intelligence.
A technique allowing models to focus on specific parts of the input when producing output.
Computed similarity between query and key vectors before softmax normalization in attention.
A scalar value indicating how much focus to place on a specific part of the input when producing output.
Area under the ROC curve, measuring binary classifier quality across all decision thresholds.
A neural network trained to reconstruct its input, learning compressed representations in the process.
A pooling operation that computes the average value from each window of the feature map.
An algorithm for training neural networks by calculating gradients of the loss function with respect to weights.
A gradient descent variant that computes gradients using the entire training dataset in each iteration.
Normalizes layer inputs using batch statistics to stabilize and accelerate training.
The number of training examples processed together in one forward/backward pass during model training.
Decoding algorithm keeping top B most likely sequences at each step.
An additional learnable parameter in neural networks that allows shifting the activation function.
Metric for machine translation quality comparing n-gram overlap between generated and reference translations.
Masking technique preventing attention to future positions in autoregressive models.
Prompting technique encouraging LLMs to show intermediate reasoning steps before answering.
A supervised learning task where the goal is to predict discrete class labels for input data.
A multimodal model trained to understand relationships between images and text.
An unsupervised learning task that groups similar data points together based on their features.
A deep learning architecture specialized for processing grid-like data such as images, using convolutional layers.
Large-scale dataset for object detection, segmentation, and captioning with 330k images.
A field of AI enabling computers to derive meaningful information from visual inputs like images and videos.
A table used to evaluate classification model performance by showing true vs predicted classes.
Self-supervised learning contrasting positive pairs against negative pairs.
A mathematical operation that slides a filter/kernel over input data to extract features.
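A minimal sketch of the sliding operation described above, in one dimension for clarity (the 2D case over images works the same way per window); `conv1d` is a hypothetical helper, not a library function:

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (valid padding, stride 1)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An edge-detecting kernel [-1, 1] responds where the input changes.
print(conv1d([0, 0, 1, 1, 0], [-1, 1]))  # [0, 1, 0, -1]
```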
A layer in CNNs that applies convolution operations to extract spatial features from input data.
Resampling technique evaluating model performance by splitting data into multiple train-test folds.
Techniques to artificially increase training data size by creating modified versions of existing data.
A collection of data examples used for training, validation, or testing machine learning models.
A subset of machine learning using neural networks with multiple layers to learn hierarchical representations of data.
Factorizes standard convolution into depthwise and pointwise convolutions for efficiency.
Loss function based on Dice coefficient, commonly used for segmentation tasks.
Generative models that learn to create data by reversing a gradual noising process.
Convolution with gaps between kernel elements, increasing receptive field without adding parameters.
Techniques to reduce the number of features in data while preserving important information.
A regularization technique that randomly deactivates neurons during training to prevent overfitting.
A regularization technique that stops training when validation performance stops improving.
A dense vector representation of discrete entities (words, images) in a continuous space.
An architecture where an encoder processes input into a representation and a decoder generates output from it.
One complete pass through the entire training dataset during model training.
The difference between the predicted output and the true label, indicating the model's mistakes.
A problem where gradients become extremely large during training, causing unstable updates and divergence.
The harmonic mean of precision and recall, providing a single balanced metric.
An individual measurable property or characteristic of data used as input to a model.
The process of transforming raw data into numerical features that machine learning models can process.
The output of applying a convolution filter to an input, representing detected features.
A model's ability to learn from a small number of examples, typically 1-10 examples per class.
A small matrix of learnable weights that slides over input during convolution to detect specific features.
The process of adapting a pre-trained model to a specific task by continuing training on task-specific data.
Loss function addressing class imbalance by down-weighting easy examples.
Large-scale models trained on broad data that can be adapted to a wide range of downstream tasks.
A neural network layer where every neuron is connected to every neuron in the previous and next layers.
A framework where two neural networks compete: a generator creates fake data and a discriminator tries to distinguish real from fake.
A family of large language models developed by OpenAI that use transformer architecture for text generation.
Direction and magnitude of steepest increase in loss function with respect to parameters.
An optimization algorithm that iteratively adjusts parameters to minimize a loss function by following the gradient.
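The iterative update described above can be sketched in a few lines; minimizing the toy function f(w) = (w - 3)^2 is my own example, not from the dictionary:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a function."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move opposite the direction of steepest increase
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```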
Normalizes by dividing channels into groups and normalizing within each group.
Simplified RNN variant with gating mechanisms, similar to LSTM but fewer parameters.
Large-scale image dataset with 14M images across 20k categories, used for ILSVRC competition.
CNN building block applying multiple filter sizes in parallel and concatenating results.
The process of using a trained model to make predictions on new data.
Data fed into a model or neural network for processing.
The first layer of a neural network that receives raw input data.
Normalizes each sample independently, commonly used in style transfer.
The target output or ground truth associated with a training example in supervised learning.
A hyperparameter controlling how much model weights are adjusted during training.
A strategy for adjusting the learning rate during training to improve convergence and performance.
A neural network with billions of parameters trained on massive text datasets to understand and generate human language.
Parameter-efficient fine-tuning adding trainable low-rank matrices to frozen weights.
A measure of how wrong the model's predictions are, used to guide training.
A function that measures the difference between predicted and actual values, guiding model optimization.
Automatic translation of text from one language to another.
Loss function measuring average absolute difference between predicted and actual values.
Pre-training task where random tokens are masked and model predicts them from context.
A 2D array of numbers arranged in rows and columns.
A pooling operation that takes the maximum value from each window of the feature map.
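A toy sketch of the windowed maximum described above, assuming a non-overlapping 2x2 window with stride 2 (the common configuration; other sizes work analogously):

```python
def max_pool_2x2(fmap):
    """Take the max over non-overlapping 2x2 windows (stride 2)."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
print(max_pool_2x2(fmap))  # [[4, 2], [2, 8]]
```

The spatial size halves in each dimension while the strongest activation in each window is retained.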
Data augmentation creating synthetic examples by mixing pairs of training samples.
Efficient CNN architecture using depthwise separable convolutions for mobile deployment.
A mathematical representation learned from data that makes predictions or decisions.
An optimization technique that accelerates gradient descent by accumulating a velocity vector in directions of persistent reduction in the loss.
Loss function measuring average squared difference between predicted and actual values.
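The squared-error average above (and its absolute-error counterpart defined earlier) can be sketched directly; the function names are my own:

```python
def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Errors are 0, 0.5, and 1: MSE = (0 + 0.25 + 1) / 3, MAE = 1.5 / 3
print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
print(mae([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```

Squaring penalizes large errors more heavily than MAE does, which is why the two losses behave differently on outliers.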
AI systems that can process and relate information from multiple modalities like text, images, audio, and video.
NLP task identifying and classifying named entities (persons, organizations, locations) in text.
A computational model inspired by biological neural networks, consisting of interconnected nodes (neurons) that process information.
Basic computational unit in neural networks that receives inputs, applies weights and activation, produces output.
Pre-training task predicting whether sentence B follows sentence A.
Text generation sampling from smallest set of tokens whose cumulative probability exceeds threshold P.
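A sketch of the nucleus (top-p) rule above over a toy next-token distribution; the vocabulary and helper name are invented for illustration:

```python
import random

def nucleus_sample(probs, p=0.9):
    """Sample from the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break  # the nucleus is complete; drop the long tail
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
# With p = 0.9 the nucleus is {"the", "a", "cat"}; "zebra" is never sampled.
```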
A computer vision task that identifies and localizes objects within an image using bounding boxes.
The process of adjusting model parameters to minimize the loss function and improve performance.
The result produced by a model after processing input data.
The final layer of a neural network that produces predictions or outputs.
When a model learns training data too well, including noise, resulting in poor generalization to new data.
Adding extra pixels around the border of input data to control output size in convolution operations.
Learnable values in a model that are optimized during training (weights and biases).
The simplest neural network: a single-layer binary classifier introduced by Frank Rosenblatt in the 1950s.
Measurement of how well a probability model predicts a sample, for evaluating language models.
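As a hedged sketch of the definition above: perplexity is the exponential of the average negative log-probability the model assigned to the observed tokens (my own minimal formulation):

```python
import math

def perplexity(token_probs):
    """Exponential of the average negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token has perplexity 4,
# as if it were choosing uniformly among 4 options.
print(perplexity([0.25, 0.25, 0.25]))  # 4.0
```

Lower perplexity means the model found the text less "surprising".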
A downsampling layer in CNNs that reduces spatial dimensions while retaining important features.
A technique to inject position information into transformer inputs since transformers lack inherent sequence order.
Metrics for classification: precision is true positives / predicted positives; recall is true positives / actual positives.
The output produced by a trained model when given new input data.
The practice of designing effective text prompts to guide large language models toward desired outputs.
Removing unnecessary weights/neurons to reduce model size and computational cost.
Reducing precision of weights/activations to lower memory and computation.
Three vectors used in attention mechanisms to compute weighted combinations of input elements.
NLP task where model extracts or generates answers to questions based on context.
A technique that enhances LLM outputs by retrieving relevant information from external knowledge bases.
The region of input space that affects a particular neuron's activation in a neural network.
A supervised learning task where the goal is to predict continuous numerical values.
Techniques to prevent overfitting by adding constraints or penalties to the model during training.
A machine learning paradigm where agents learn by interacting with an environment and receiving rewards or penalties.
An activation function that outputs the input if positive, otherwise zero: f(x) = max(0, x).
A CNN architecture that uses residual connections (skip connections) to enable training of very deep networks.
Adaptive learning rate optimization algorithm using moving average of squared gradients.
A neural network architecture designed for sequential data, with connections that loop back to previous states.
A single numerical value, a zero-dimensional tensor.
An attention mechanism used in deep learning models that allows a neural network to weigh the importance of different parts of an input relative to each other.
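A minimal sketch of scaled dot-product attention, the core computation behind the mechanism above (one head, no learned projections; all names are my own):

```python
import math

def softmax(xs):
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Each query scores every key; the scores weight a sum over the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much each position attends to the others
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Two identical keys get equal weight, so the output averages the two values.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```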
Learning paradigm where models create supervision signal from unlabeled data.
A computer vision task that assigns a class label to every pixel in an image.
NLP task determining emotional tone or opinion expressed in text.
A gradient descent variant that updates weights using gradients from a single random training example at a time.
An activation function that maps inputs to values between 0 and 1: f(x) = 1/(1 + e^(-x)).
An activation function that converts a vector of values into a probability distribution summing to 1.
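The sigmoid and softmax definitions above translate directly into code (a sketch; the max-subtraction in softmax is a standard numerical-stability trick, not part of the definition):

```python
import math

def sigmoid(x):
    """Squash a single value into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

def softmax(xs):
    """Turn a vector of scores into a probability distribution summing to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0))                      # 0.5
print(sum(softmax([2.0, 1.0, 0.1])))   # 1.0
```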
Reading comprehension dataset with 100k+ questions on Wikipedia articles.
The number of pixels by which a filter moves across the input during convolution or pooling operations.
A machine learning paradigm where models learn from labeled training data with input-output pairs.
Parameter controlling randomness in text generation by scaling logits before softmax.
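A sketch of the logit scaling described above (my own helper; the softmax step is folded in so the effect on the distribution is visible):

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/T before softmax: T < 1 sharpens, T > 1 flattens."""
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s - max(scaled)) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = apply_temperature(logits, 0.5)  # concentrates mass on the top logit
hot = apply_temperature(logits, 5.0)   # moves the distribution toward uniform
```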
Data held out for final evaluation of a trained model, never seen during training.
NLP task condensing long text while preserving key information.
The process of breaking text into smaller units (tokens) like words, subwords, or characters for processing.
Text generation technique sampling from only the K most likely next tokens.
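A sketch of the top-k rule above over the same kind of toy next-token distribution used for nucleus sampling; vocabulary and helper name are invented:

```python
import random

def top_k_sample(probs, k=2):
    """Sample only from the k most likely tokens, with their probabilities renormalized."""
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens, weights = zip(*top)
    return random.choices(tokens, weights=weights)[0]

probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
# With k = 2, only "the" and "a" can ever be sampled.
```

Unlike nucleus sampling, the candidate set here has a fixed size regardless of how the probability mass is spread.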
The subset of data used to train a machine learning model.
A technique where knowledge learned from one task is applied to a different but related task, reducing training time and data requirements.
A neural network architecture based entirely on attention mechanisms, without recurrent or convolutional layers.
Loss function that learns embeddings by minimizing the anchor-positive distance while maximizing the anchor-negative distance.
A CNN architecture designed for biomedical image segmentation, featuring an encoder-decoder structure with skip connections.
A machine learning paradigm where models find patterns in unlabeled data without explicit supervision.
A generative model that learns a probabilistic latent space representation of data.
A portion of data held out during training to tune hyperparameters and prevent overfitting.
A problem in deep networks where gradients become extremely small, preventing effective learning in early layers.
An ordered array of numbers representing a point in multi-dimensional space.
Deep CNN architecture using small 3x3 filters throughout, emphasizing depth.
Transformer architecture adapted for computer vision by treating image patches as tokens.
Learnable parameters in neural networks that determine the strength of connections between neurons.
Methods for setting initial values of neural network weights before training begins.
Dense vector representations of words that capture semantic and syntactic relationships.