The following is a list of short explanations of different terms in machine learning. The aim is to keep things simple and brief, not to explain the terms in full detail.

- Active Learning
- The algorithm selects an unlabeled pattern and asks for its label.
- Backpropagation
- An efficient way to compute the gradients which gradient descent needs for neural networks, based on the chain rule.
- BLSTM, BiLSTM
- Bidirectional long short-term memory (see paper and poster).
- Co-Training
- A form of semi-supervised learning. Two independent classifiers are trained on different labeled datasets. The classifiers are applied to the unlabeled data. Data which one classifier labels with high confidence is added to the training data of the other classifier.
- Collaborative Filtering
- You have users and items which are rated. No user rated everything. You want to fill the gaps (see article).
- Computer Vision
- The academic discipline which deals with how to gain high-level understanding from digital images or videos. Common tasks include image classification, semantic segmentation, detection and localization.
- Curriculum learning
- A method for pretraining. First optimize a smoothed objective and gradually consider less smoothing. So a curriculum is a sequence of training criteria. One might show gradually more difficult training examples. See Curriculum Learning by Bengio, Louradour, Collobert and Weston for details.
- Curse of dimensionality
- Various problems of high-dimensional spaces that do not occur in low-dimensional spaces. High-dimensional often means several hundred dimensions. See also: Average Distance of Points.
- DCGAN (Deep Convolutional Generative Adversarial Networks)
- TODO
- DCIGN (Deep Convolutional Inverse Graphic Network)
- TODO
- DCNN (Doubly Convolutional Neural Network)
- Introduced in this paper (summary).
  **Note**: Some people also call *Deep Convolutional Neural Networks* DCNNs.
- DNN
- Deep Neural Network. The meaning of "deep" differs. Sometimes it means at least one hidden layer, sometimes it means at least 12 hidden layers.
- Domain adaptation
- A model is trained on dataset $A$. How does it have to be changed to work on dataset $B$?
- Detection in Computer Vision (Object detection)
- Object detection in an image is a computer vision task. The input is an image and the output is a list of rectangles which contain objects of the given type. Face detection is one well-studied example. A photo could contain no face or hundreds of them. The rectangles can overlap.
- Deep Learning
- Buzzword. The meaning depends on whom you ask and in which year you ask. Sometimes it means multi-layer perceptrons with more than $N$ layers (some say $N=2$ is already deep learning, others want $N>20$ or nowadays $N>100$).
- Discriminative Model
- The model gives a conditional probability of the classes $k$, given the feature vector $x$: $P(k | x)$. This kind of model is often used for prediction.
- FC7-Features
- Features of an image which was run through a trained neural network. In AlexNet, FC7 is the last fully connected layer before the output layer. However, FC7 features are not necessarily created by AlexNet.
- FMLLR
- Feature-Space Maximum Likelihood Linear Regression
- Feature Map
- A feature map is the result of a single filter of a convolutional layer being applied. So it is the activation of that filter over the given input.
- GMM
- Gaussian Mixture Model
- GEMM (GEneral Matrix to Matrix Multiplication)
- General Matrix to Matrix Multiplication is the problem of calculating the result of $C = A \cdot B$ with $A \in \mathbb{R}^{n \times m}, B \in \mathbb{R}^{m \times k}, C \in \mathbb{R}^{n \times k}$.
- Generative model
- The model gives the joint probability of the variables: $P(x, y)$. This kind of model can be used for prediction, too.
- Gradient Descent
- An iterative optimization algorithm for differentiable functions.
- HMM
- Hidden Markov Model
- i-vector
- Speaker identity vector. See Front-End Factor Analysis for Speaker Verification.
- Machine Vision
- Computer vision applied to industrial settings.
- Matrix Completion
- See collaborative filtering.
- MLLR
- Maximum Likelihood Linear Regression
- MMD (Maximum Mean Discrepancy)
- MMD is a measure of the difference between a distribution $p$ and a distribution $q$, given a class of functions $F$: $$\text{MMD}(F, p, q) = \sup_{f \in F} \left(\mathbb{E}_{x \sim p} [f(x)] - \mathbb{E}_{y \sim q} [f(y)]\right)$$
- Multi-Task learning
- Train a model which does multiple tasks at the same time, e.g. segmentation and detection (see MultiNet).
- Object recognition
- Classification on images. The task is to decide in which class a given image falls, judging by the content. This can be cat, dog, plane or similar.
- One-Shot learning
- Learn only with one or very few examples per class. See One-Shot Learning of Object Categories.
- Optical Flow
- Optical flow is defined for two images. It describes how the points in one image moved when switching to the second image.
- PCA
- Principal component analysis (short: PCA) is a linear transformation which projects $n$ points with $s$ features each, stored as the rows of $\mathbf{x} \in \mathbb{R}^{n \times s}$, onto a lower-dimensional linear subspace in such a way that the projection error is minimal. Hence it is an unsupervised method for feature reduction. It works by finding a matrix $P \in \mathbb{R}^{s \times m}$, where $m \leq s$ can be chosen as small as desired. The reduced data is then $\mathbf{x} P \in \mathbb{R}^{n \times m}$.
- Regularization
- Regularization refers to techniques which make the fitted function smoother. This helps to prevent overfitting.

  Examples: L1, L2, Dropout, Weight Decay in Neural Networks. Parameter $C$ in SVMs.
- Reinforcement Learning
- Reinforcement learning is a sub-field of machine learning which focuses on the question of how to find actions which lead to higher rewards. See German lecture notes.
- Self-Learning
- One form of semi-supervised learning, where you train an initial system on the labeled data, then label the unlabeled data where the classifier is 'sure enough'. After that, you train a new system on all data and re-label the unlabeled data. This is iterated.
- Semi-supervised learning
- Some training data has labels, but most has no labels.
- Supervised learning
- All training data has labels.
- Spatial Pyramid Pooling (SPP)
- SPP is the idea of dividing the image into a grid with a fixed number of cells whose size varies with the input. Each cell computes one feature, which leads to a fixed-size representation of a variable-sized input.

  See paper and summary.
- TF-IDF
- TF-IDF (short for Term frequency–inverse document frequency) is a measure that reflects how important a word is to a document in a collection or corpus.
- Transductive learning
- Label the given unlabeled data (the aim here is NOT to find a hypothesis).
- Unsupervised learning
- No training data has labels.
- VC-Dimension
- A theoretical natural number assigned to any classifier. The higher the VC dimension of a classifier, the more situations it is able to capture (see longer explanation, German explanation).
- VTLN
- Vocal tract length normalization
- Zero-Shot learning
- Learning to predict classes of which no example has been seen during training. For example, Flickr gets several new tags each day and they want to predict tags for new images. One idea is to use WordNet and ImageNet to generate a common embedding. This way, new words of WordNet could already have an embedding and thus new image categories could also automatically be classified the right way. See Zero-Shot Learning with Semantic Output Codes as well as this YouTube video.
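
As a rough illustration of the Feature Map entry, the following sketch slides a single filter over a small image and collects its responses. The function name `feature_map`, the toy image and the edge filter are made up for this example. It leaves out the bias, the activation function, padding and strides which a real convolutional layer usually has.

```python
def feature_map(image, kernel):
    """Apply one filter to a 2D image (no padding, stride 1)."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            # Weighted sum of the image patch whose top-left corner is (i, j)
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_filter = [[-1, 1]]  # responds where brightness increases to the right
result = feature_map(image, edge_filter)
print(result)
```

The only non-zero column of the result marks the position of the vertical edge in the toy image.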
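
To make the GEMM entry more concrete, here is a naive sketch of $C = A \cdot B$ with plain Python lists. The function name `gemm` and the example matrices are assumptions for illustration only. Optimized libraries solve the same problem with far more sophisticated code.

```python
def gemm(A, B):
    """Multiply an n x m matrix A with an m x k matrix B (lists of rows)."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions of A and B do not match")
    k = len(B[0])
    C = [[0.0] * k for _ in range(n)]
    for i in range(n):
        for j in range(k):
            for l in range(m):
                C[i][j] += A[i][l] * B[l][j]
    return C


A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
C = gemm(A, B)                    # 2 x 2
print(C)
```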
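
The Gradient Descent entry can be sketched in a few lines for a function of one variable. The function name `gradient_descent`, the fixed learning rate and the example function $f(x) = (x - 3)^2$ are assumptions for this sketch. In practice the step size usually needs tuning.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly take a small step against the gradient, starting at x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has the gradient 2 * (x - 3) and its minimum at x = 3
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)
```

With these settings the distance to the minimum shrinks by a factor of 0.8 in every step, so the result ends up very close to 3.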
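
The idea behind the Spatial Pyramid Pooling entry can be sketched as follows: a 2D input of any size is divided into a fixed number of cells, and each cell contributes one value. The function name `spatial_pyramid_pool`, the use of max pooling and the grid levels `(1, 2)` are assumptions for this sketch and are not necessarily what the referenced paper uses.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2)):
    """Max-pool a 2D input over grid x grid cells for every grid in levels."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for grid in levels:
        for i in range(grid):
            # Cell borders scale with the input size, so cells are never empty
            top = (i * height) // grid
            bottom = math.ceil((i + 1) * height / grid)
            for j in range(grid):
                left = (j * width) // grid
                right = math.ceil((j + 1) * width / grid)
                pooled.append(
                    max(
                        feature_map[r][c]
                        for r in range(top, bottom)
                        for c in range(left, right)
                    )
                )
    return pooled


small = spatial_pyramid_pool([[1, 2], [3, 4]])
large = spatial_pyramid_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(len(small), len(large))
```

Both inputs lead to $1 + 4 = 5$ values, although one input is $2 \times 2$ and the other is $3 \times 4$.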
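
For the TF-IDF entry, the following sketch uses one common variant: the term frequency is the relative count of a term within a document, and the inverse document frequency is $\log(N / n_t)$ with $N$ documents of which $n_t$ contain the term. Other weighting schemes exist. The function name `tf_idf` and the toy corpus are made up for this example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (documents are token lists)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
scores = tf_idf(docs)
print(scores[1])
```

A word like "the", which occurs in every document, gets a score of zero, while "dog", which occurs in only one document, gets the highest score in that document.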

## See also

- Lectures:
- Wikipedia
- scholarpedia
- Other