Machine Learning Glossary

The following is a list of short explanations of different terms in machine learning. The aim is to keep things simple and brief, not to explain the terms in full detail.

Active Learning

The learning algorithm picks an unlabeled pattern and asks for its label, for example from a human annotator.

Backpropagation

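An efficient way to compute the gradient of the error with respect to all weights of a neural network by applying the chain rule backwards through the layers. The gradient is then typically used by gradient descent.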

Bias

Bias is a concept which describes a systematic error. A classifier with a high bias tends to give one answer more often, no matter what the input is. This concept is related to variance and is well illustrated by the images here.

BLSTM, BiLSTM

Bidirectional long short-term memory (see paper and poster).

Co-Training

A form of semi-supervised learning. Two independent classifiers are trained on different labeled datasets. The classifiers are then applied to the unlabeled data. Data which one classifier labels with high confidence is added to the other classifier's training data.

Collaborative Filtering

You have users and items, and users have rated some of the items. No user has rated everything. You want to fill the gaps, that is, predict the missing ratings (see article).

Computer Vision

The academic discipline which deals with how to gain high-level understanding from digital images or videos. Common tasks include image classification, semantic segmentation, detection and localization.

Curriculum learning

A method for pre-training. First optimize a smoothed objective, then gradually reduce the smoothing. So a curriculum is a sequence of training criteria. One might, for example, show gradually more difficult training examples. See Curriculum Learning by Bengio, Louradour, Collobert and Weston for details.

Curse of dimensionality

Various problems of high-dimensional spaces that do not occur in low-dimensional spaces. High-dimensional often means several hundred dimensions.

DCGAN (Deep Convolutional Generative Adversarial Networks)

TODO

DCIGN (Deep Convolutional Inverse Graphic Network)

TODO

DCNN (Doubly Convolutional Neural Network)

Introduced in this paper (summary). Note: Some people also call Deep Convolutional Neural Networks DCNNs.

DNN

Deep Neural Network. The meaning of "deep" differs. Sometimes it means at least one hidden layer, sometimes it means at least 12 hidden layers.

Domain adaptation

A model is trained on dataset $A$. How does it have to be changed to work on dataset $B$?

Detection in Computer Vision (Object detection)

Object detection in an image is a computer vision task. The input is an image and the output is a list of rectangles which contain objects of the given type. Face detection is one well-studied example. A photo could contain no face or hundreds of them. The rectangles can overlap.

Deep Learning

Buzzword. The meaning depends on who you ask and in which year you ask. Sometimes it means multi-layer perceptrons with more than $N$ layers (some say $N=2$ is already deep learning, others want $N>20$ or nowadays $N>100$).

Discriminative Model

The model gives a conditional probability of the classes $k$, given the feature vector $x$: $P(k | x)$. This kind of model is often used for prediction.

FC7-Features

Features of an image which was run through a trained neural network. In AlexNet, FC7 is the name of the last fully connected layer before the output layer. However, FC7 features are not necessarily created by AlexNet.

FMLLR

Feature-Space Maximum Likelihood Linear Regression

Feature Map

A feature map is the result of a single filter of a convolutional layer being applied. So it is the activation of that filter over the given input.
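
As a minimal sketch of how one filter produces one feature map, assuming a single-channel image, stride 1, no padding and no bias or activation function. The function name `feature_map` and the example filter are illustrative only. As in most deep learning libraries, the filter is applied as a cross-correlation, i.e. without flipping it.

```python
def feature_map(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and return the responses."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical filter: responds to a dark-to-bright step from left to right.
edge_filter = [[-1, 1]]
fmap = feature_map(image, edge_filter)
```

The resulting map is non-zero only in the column where the brightness changes, which is the sense in which the feature map is "the activation of that filter over the given input".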

Fine-tuning

See pre-training

GMM

Gaussian Mixture Model

GEMM (GEneral Matrix to Matrix Multiplication)

General Matrix to Matrix Multiplication is the problem of calculating the result of $C = A \cdot B$ with $A \in \mathbb{R}^{n \times m}, B \in \mathbb{R}^{m \times k}, C \in \mathbb{R}^{n \times k}$.
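
The definition can be written down directly as three nested loops. This is only a sketch to make the index structure explicit; the name `gemm` and the list-of-rows representation are choices made here, and real implementations (e.g. in BLAS libraries) are heavily optimized.

```python
def gemm(a, b):
    """Multiply an n x m matrix `a` by an m x k matrix `b` (lists of rows)."""
    n, m, k = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * k for _ in range(n)]
    for i in range(n):          # rows of A
        for j in range(k):      # columns of B
            for p in range(m):  # shared inner dimension
                c[i][j] += a[i][p] * b[p][j]
    return c


A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]  # 3 x 2
C = gemm(A, B)                   # 2 x 2
```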

Generative model

The model describes the joint distribution of the features $x$ and the labels $y$: $P(x, y)$. This kind of model can be used for prediction, too.
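
To see why prediction is possible: for discrete labels, the conditional distribution follows from the joint one by normalizing over all labels. The summation variable $y'$ is notation introduced here.

$$P(y \mid x) = \frac{P(x, y)}{P(x)} = \frac{P(x, y)}{\sum_{y'} P(x, y')}$$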

Gradient Descent

An iterative optimization algorithm for differentiable functions.
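
A minimal sketch for a function of one variable, assuming a fixed learning rate and a fixed number of steps. The function name and the default values are illustrative choices, not part of any particular library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting at x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# Example: f(x) = (x - 3)^2 has the derivative f'(x) = 2 * (x - 3)
# and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With a learning rate that is too large the iterates can diverge, which is why the step size usually has to be tuned.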

HMM

Hidden Markov Model

i-vector

Speaker identity vector. See Front-End Factor Analysis for Speaker Verification.

MANN

Memory-Augmented Neural Networks (see Blog post)

Machine Vision

Computer vision applied for industrial applications.

Matrix Completion

See collaborative filtering.

MLLR

Maximum Likelihood Linear Regression

MMD (Maximum Mean Discrepancy)

MMD is a measure of the difference between a distribution $p$ and a distribution $q$ with respect to a class of functions $F$: $$\mathrm{MMD}(F, p, q) = \sup_{f \in F} \left(\mathbb{E}_{x \sim p} [f(x)] - \mathbb{E}_{y \sim q} [f(y)]\right)$$
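
The supremum cannot be computed for an arbitrary function class. A common choice, assumed in the sketch below, is to take $F$ as the unit ball of a reproducing kernel Hilbert space. The squared MMD can then be estimated from samples using only kernel evaluations. The Gaussian kernel, its bandwidth and the function names are assumptions made for this example, and the estimate is the simple biased one that averages over all pairs.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased sample estimate of the squared MMD between two samples."""
    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) - 2 * mean_kernel(xs, ys) + mean_kernel(ys, ys)


sample_p = [0.0, 0.5, 1.0, 1.5]
sample_q = [3.0, 3.5, 4.0, 4.5]
same = mmd_squared(sample_p, sample_p)       # identical samples
different = mmd_squared(sample_p, sample_q)  # clearly shifted samples
```

For identical samples the estimate is zero, and it grows as the two samples move apart.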

Multi-Task learning

Train a model which does multiple tasks at the same time, e.g. segmentation and detection (see MultiNet).

NEAT

Neuroevolution of Augmenting Topologies (see Blogpost).

Object recognition

Classification on images. The task is to decide in which class a given image falls, judging by the content. This can be cat, dog, plane or similar.

One-Shot learning

Learn only with one or very few examples per class. See One-Shot Learning of Object Categories.

Optical Flow

Optical flow is defined for two images. It describes how the points in one image moved when switching to the second image.

PCA

Principal component analysis (short: PCA) is a linear transformation which projects $n$ points $\mathbf{x} \in \mathbb{R}^{n \times s}$ with $s$ features each onto a lower-dimensional subspace in such a way that the projection error is minimal. Hence it is an unsupervised method for feature reduction. It works by finding a matrix $P \in \mathbb{R}^{s \times m}$, where $m \leq s$ can be chosen as small as desired, so that each point is described by $m$ instead of $s$ features.
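
A minimal sketch for the special case $m = 1$, where $P$ has a single column: the direction of largest variance. It is found here by power iteration on the covariance matrix, which is one simple way to do it and is a choice made for this example; libraries typically use an eigendecomposition or SVD instead. The function names are illustrative.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (mean, direction) of the first principal component."""
    n, s = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(s)]
    centered = [[p[j] - mean[j] for j in range(s)] for p in points]
    # s x s covariance matrix of the centered data.
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(s)]
           for a in range(s)]
    # Power iteration: multiply by the covariance matrix and renormalize.
    # Assumes the data is not constant and that the start vector is not
    # orthogonal to the leading direction.
    v = [1.0] * s
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(s)) for a in range(s)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, direction):
    """Reduce each point to one coordinate along `direction`."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for p in points]


points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # all on y = 2x
mean, direction = first_principal_component(points)
reduced = project(points, mean, direction)  # s = 2 features -> m = 1
```

Because the example points lie exactly on a line, one coordinate per point is enough to reconstruct them without error.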

Pre-training

You have a machine learning model $m$.
Pre-training: You have a dataset $A$ on which you train $m$.
You have a dataset $B$. Before you start training the model, you initialize some of the parameters of $m$ with the model which is trained on $A$.
Fine-tuning: You train $m$ on $B$.
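
The steps above can be sketched with a deliberately tiny model, $y = w \cdot x$ with a single parameter $w$, trained by gradient descent on the mean squared error. The datasets, learning rate and epoch counts are made up for illustration.

```python
def train(w, data, learning_rate=0.05, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w


dataset_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # follows y = 2.0 * x
dataset_b = [(1.0, 2.2), (2.0, 4.4)]              # follows y = 2.2 * x

# Pre-training: train the model on A, starting from an arbitrary value.
w_pretrained = train(0.0, dataset_a)

# Fine-tuning: initialize with the pre-trained parameter, train briefly on B.
w_finetuned = train(w_pretrained, dataset_b, epochs=20)

# For comparison: the same short training on B without pre-training.
w_scratch = train(0.0, dataset_b, epochs=20)
```

In this toy setting the fine-tuned parameter ends up closer to the value that fits $B$ than the one trained from scratch with the same budget, because it starts closer to the solution.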

Regularization

Regularization refers to techniques which make the fitted function smoother. This helps to prevent overfitting.
Examples: L1, L2, Dropout and Weight Decay in neural networks; the parameter $C$ in SVMs.

Reinforcement Learning

Reinforcement learning is a sub-field of machine learning which focuses on the question of how to find actions which lead to higher rewards. See German lecture notes.

Self-Learning

One form of semi-supervised learning, where you train an initial system on the labeled data, then label the unlabeled data where the classifier is 'sure enough'. After that, you train a new system on all data and re-label the unlabeled data. This is iterated.
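
A minimal sketch of this loop, assuming one-dimensional features, at least two classes and a nearest-centroid classifier. The classifier, the margin used as a confidence measure and the threshold `min_margin` are stand-ins chosen for brevity; any classifier with a confidence score could take their place.

```python
def train_centroids(labeled):
    """Return the mean feature value per class; `labeled` holds (x, label)."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Return (label, margin): the closest class and how clearly it wins."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_learning(labeled, unlabeled, min_margin=1.0, rounds=5):
    centroids = train_centroids(labeled)  # initial system, labeled data only
    for _ in range(rounds):
        # Label the unlabeled data where the classifier is "sure enough".
        pseudo = []
        for x in unlabeled:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                pseudo.append((x, label))
        # Train a new system on the labeled and the pseudo-labeled data.
        centroids = train_centroids(labeled + pseudo)
    return centroids


labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 4.9, 8.0, 9.0]
centroids = self_learning(labeled, unlabeled)
```

The point 4.9 lies almost exactly between the two classes, so it never passes the confidence threshold and does not influence the centroids.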

Semi-supervised learning

Some training data has labels, but most has no labels.

Supervised learning

All training data has labels.

Spatial Pyramid Pooling (SPP)

SPP is the idea of dividing the image into a grid with a fixed number of cells, where the size of the cells depends on the size of the input. Each cell computes one feature, which leads to a fixed-size representation of a variable-sized input.
See paper and summary
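
A minimal sketch of the idea for a single two-dimensional feature map, assuming max pooling as the per-cell feature and a pyramid with a 1x1 and a 2x2 grid. The function name, the chosen levels and the exact rounding of the cell borders are assumptions made for this example.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Pool a 2D map (list of rows) into sum(c * c for c in levels) values."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for cells in levels:
        for r in range(cells):
            # Cell borders scale with the input size (floor start, ceil end).
            top = (r * height) // cells
            bottom = ((r + 1) * height + cells - 1) // cells
            for c in range(cells):
                left = (c * width) // cells
                right = ((c + 1) * width + cells - 1) // cells
                pooled.append(max(
                    feature_map[i][j]
                    for i in range(top, bottom)
                    for j in range(left, right)
                ))
    return pooled


small = [[1, 2], [3, 4]]                                # 2 x 2 input
large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # 3 x 4 input
pooled_small = spatial_pyramid_pool(small)
pooled_large = spatial_pyramid_pool(large)
```

Both inputs yield five values (one for the 1x1 grid and four for the 2x2 grid), although their sizes differ.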

TF-IDF

TF-IDF (short for Term frequency–inverse document frequency) is a measure that reflects how important a word is to a document in a collection or corpus.
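
Several variants of the weighting exist. The sketch below assumes one common choice: the term frequency is the count of the term divided by the document length, and the inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the term. The function name and the toy corpus are illustrative.

```python
import math


def tf_idf(documents):
    """Return one {term: score} dict per document (documents are token lists)."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    scores = []
    for doc in documents:
        doc_scores = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

A word such as "the", which occurs in every document, gets a score of zero, while "mat", which occurs in only one document, gets the highest score within the first document.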

Transductive learning

Label the given unlabeled data directly. The aim here is NOT to find a general hypothesis.

Unsupervised learning

No training data has labels.

VC-Dimension

A theoretical natural number assigned to any classifier. The higher the VC dimension of a classifier, the more situations it is able to capture (see longer explanation, German explanation).

VLAD

Vector of Locally Aggregated Descriptors

VTLN

Vocal tract length normalization

WRN

Wide residual network

Zero-Shot learning

Learning to predict classes of which no example has been seen during training. For example, Flickr gets several new tags each day and wants to predict tags for new images. One idea is to use WordNet and ImageNet to generate a common embedding. This way, new words of WordNet could already have an embedding and thus new image categories could also automatically be classified the right way. See Zero-Shot Learning with Semantic Output Codes as well as this YouTube video.
