The following is a list of short explanations of different terms in machine learning. The aim is to keep things simple and brief, not to explain the terms in full detail.
 Active Learning
 The learning algorithm selects an unlabeled example (pattern) and asks an oracle, e.g. a human annotator, for its label.
 Backpropagation
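 An efficient way to compute the gradient of a neural network's loss with respect to all of its weights by applying the chain rule layer by layer, from the output back to the input. The gradient is then used by gradient descent (or a variant of it) to update the weights.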
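 A minimal sketch of the idea, assuming a hypothetical toy network with one input, one tanh hidden unit and one linear output, trained with the squared error $\frac{1}{2}(y - t)^2$. The backward pass applies the chain rule from the output towards the input; the finite-difference check is only there to confirm that the gradients are right.

```python
import math

def forward(w1, w2, x):
    """Forward pass of a toy network: x -> h = tanh(w1 * x) -> y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, t):
    """Squared-error loss 0.5 * (y - t)^2 for target t."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    """Return (dL/dw1, dL/dw2) via the chain rule, starting at the output."""
    h, y = forward(w1, w2, x)         # reuse the stored forward values
    dL_dy = y - t                     # d/dy of 0.5 * (y - t)^2
    dL_dw2 = dL_dy * h                # y = w2 * h
    dL_dh = dL_dy * w2                # propagate the error to the hidden unit
    dL_dw1 = dL_dh * (1 - h ** 2) * x # tanh'(z) = 1 - tanh(z)^2 with z = w1 * x
    return dL_dw1, dL_dw2

def numerical_grad(w1, w2, x, t, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
    return g1, g2

print(backprop(0.5, -1.2, 2.0, 1.0))
print(numerical_grad(0.5, -1.2, 2.0, 1.0))  # should agree closely
```

Real frameworks do this bookkeeping automatically for every layer. Reusing the stored forward-pass values (here `h`) is what makes backpropagation cheap compared to computing every partial derivative separately.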
 Bias
 Bias is a concept which describes a systematic error. A classifier with a high bias tends to give one answer more often, no matter what the input is. This concept is related to variance and well described with the images here.
 BLSTM, BiLSTM
 Bidirectional long short-term memory (see paper and poster).
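 Co-Training
 A form of semi-supervised learning. Two classifiers are trained on the labeled data, each using a different, ideally independent, view (set of features) of it. Both classifiers are then applied to the unlabeled data, and examples that one classifier labels with high confidence are added, with that label, to the training data of the other classifier.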
 Collaborative Filtering
 You have users and items which are rated. No user rated everything. You want to fill the gaps (see article).
 Computer Vision
 The academic discipline which deals with how to gain high-level understanding from digital images or videos. Common tasks include image classification, semantic segmentation, detection and localization.
 Curriculum learning
 A method for pretraining. First optimize a smoothed objective, then gradually reduce the smoothing. A curriculum is thus a sequence of training criteria; for example, one might show gradually more difficult training examples. See Curriculum Learning by Bengio, Louradour, Collobert and Weston for details.
 Curse of dimensionality
 Various problems that arise in high-dimensional spaces but do not occur in low-dimensional ones. "High-dimensional" often means several hundred dimensions.
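 One standard illustration (not part of the original entry): the fraction of the unit hypercube $[0, 1]^d$ that lies inside its inscribed ball of radius $\frac{1}{2}$ is $\frac{\pi^{d/2}}{\Gamma(d/2 + 1)} \left(\frac{1}{2}\right)^d$, which goes to zero quickly as $d$ grows. In high dimensions, almost all of the cube's volume sits in its corners.

```python
import math

def ball_fraction(d):
    """Fraction of the unit hypercube [0, 1]^d covered by its inscribed ball (radius 0.5)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * 0.5 ** d

for d in (1, 2, 3, 10, 100):
    print(d, ball_fraction(d))
```

For $d = 2$ the fraction is $\pi/4 \approx 0.785$; for $d = 10$ it is already below 0.3%.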
 DCGAN (Deep Convolutional Generative Adversarial Networks)
 TODO
 DCIGN (Deep Convolutional Inverse Graphic Network)
 TODO
 DCNN (Doubly Convolutional Neural Network)
 Introduced in this paper (summary). Note: some people also call Deep Convolutional Neural Networks DCNNs.
 DNN
 Deep Neural Network. The meaning of "deep" differs. Sometimes it means at least one hidden layer, sometimes it means at least 12 hidden layers.
 Domain adaptation
 A model is trained on dataset $A$. How does it have to be changed to work on dataset $B$?
 Detection in Computer Vision (Object detection)
 Object detection in an image is a computer vision task. The input is an image and the output is a list of rectangles (bounding boxes), each containing an object of the given type. Face detection is one well-studied example: a photo could contain no face or hundreds of them. The rectangles can overlap.
 Deep Learning
 Buzzword. The meaning depends on whom you ask and in which year you asked. Sometimes it means multilayer perceptrons with more than $N$ layers (some say $N=2$ is already deep learning, others want $N>20$ or, nowadays, $N>100$).
 Discriminative Model
 The model gives the conditional probability of the classes $k$, given the feature vector $x$: $P(k \mid x)$. This kind of model is often used for prediction.
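 FC7 Features
 Features of an image obtained by running it through a trained neural network and taking the activations of one layer. The name comes from AlexNet, where FC7 is the last hidden fully connected layer (directly before the output layer FC8). However, FC7 features are not necessarily created by AlexNet.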
 FMLLR
 Feature-space Maximum Likelihood Linear Regression
 Feature Map
 A feature map is the result of a single filter of a convolutional layer being applied. So it is the activation of that filter over the given input.
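 A minimal sketch, assuming a single-channel input and a single filter given as plain nested lists, stride 1 and no padding. Like most deep learning libraries, it computes a cross-correlation (the kernel is not flipped). A real convolutional layer would usually also apply a nonlinearity such as ReLU to the result.

```python
def feature_map(image, kernel, bias=0.0):
    """Slide one filter over a 2-D input ('valid' positions only, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the input patch under the filter at position (i, j)
            s = bias
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

# Hypothetical input: dark left half (0), bright right half (1)
image = [[0, 0, 1, 1] for _ in range(4)]
edge_filter = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges
print(feature_map(image, edge_filter))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map is large exactly where the pattern the filter looks for (here: a dark-to-bright vertical edge) occurs in the input.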
 Fine-tuning
 See pretraining
 GMM
 Gaussian Mixture Model
 GEMM (GEneral Matrix to Matrix Multiplication)
 General Matrix to Matrix Multiplication is the problem of calculating the result of $C = A \cdot B$ with $A \in \mathbb{R}^{n \times m}, B \in \mathbb{R}^{m \times k}, C \in \mathbb{R}^{n \times k}$.
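 A naive sketch of the definition, with matrices stored as lists of rows and the sizes named as above. It performs $n \cdot m \cdot k$ multiplications. Optimized BLAS `gemm` routines compute the more general $C \leftarrow \alpha A B + \beta C$ and rely on blocking and vectorization, but for $\alpha = 1, \beta = 0$ the result is the same.

```python
def gemm(A, B):
    """Naive C = A · B for A (n x m) and B (m x k), both given as lists of rows."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions must agree")
    k = len(B[0])
    C = [[0] * k for _ in range(n)]
    for i in range(n):
        for p in range(m):
            a = A[i][p]
            for j in range(k):
                C[i][j] += a * B[p][j]  # C[i][j] = sum_p A[i][p] * B[p][j]
    return C

print(gemm([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))  # [[58, 64], [139, 154]]
```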
 Generative model
 The model gives the joint probability distribution of the variables, e.g. of the feature vector $x$ and the label $y$: $P(x, y)$. This kind of model can be used for prediction, too.
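 To illustrate how a generative model can also be used for prediction: given the joint distribution, the conditional $P(y \mid x) = P(x, y) / \sum_{y'} P(x, y')$ follows from the definition of conditional probability. The sketch assumes a hypothetical, tiny discrete joint distribution stored as a dictionary.

```python
def predict_from_joint(joint, x):
    """Turn a joint distribution P(x, y) into P(y | x) = P(x, y) / sum_y' P(x, y')."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

# Hypothetical toy joint distribution P(x, y); the four entries sum to 1
joint = {
    ("sunny", "play"): 0.4, ("sunny", "stay"): 0.1,
    ("rainy", "play"): 0.1, ("rainy", "stay"): 0.4,
}
print(predict_from_joint(joint, "sunny"))  # P(play | sunny) = 0.8, P(stay | sunny) = 0.2
```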
 Gradient Descent
 An iterative optimization algorithm for differentiable functions.
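 A minimal sketch for a one-dimensional function, assuming a fixed learning rate (step size) $\eta$ and a fixed number of steps; the function $f(x) = (x - 3)^2$ is just an example. Each step moves against the derivative: $x \leftarrow x - \eta f'(x)$.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - learning_rate * grad(x)."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # very close to 3
```

With $\eta = 0.1$, the distance to the minimum shrinks by a factor of $0.8$ per step; a learning rate above $1$ would make this particular example diverge.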
 HMM
 Hidden Markov Model
 i-vector
 Speaker identity vector. See Front-End Factor Analysis for Speaker Verification.
 MANN
 Memory-Augmented Neural Networks (see blog post)
 Machine Vision
 Computer vision applied to industrial problems.
 Matrix Completion
 See collaborative filtering.
 MLLR
 Maximum Likelihood Linear Regression
 MMD (Maximum Mean Discrepancy)
 MMD is a measure of the difference between a distribution $p$ and a distribution $q$, defined with respect to a class of functions $F$: $$\mathrm{MMD}(F, p, q) = \sup_{f \in F} \left(\mathbb{E}_{x \sim p} [f(x)] - \mathbb{E}_{y \sim q} [f(y)]\right)$$
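 The supremum is hard to compute for an arbitrary $F$. A common choice (not stated above) is to take $F$ as the unit ball of a reproducing kernel Hilbert space with kernel $k$; then $\mathrm{MMD}^2 = \mathbb{E}[k(x, x')] + \mathbb{E}[k(y, y')] - 2\,\mathbb{E}[k(x, y)]$ with $x, x' \sim p$ and $y, y' \sim q$ drawn independently. The sketch below is the simple (biased) sample estimate of that expression for 1-D samples; the Gaussian kernel and its width `sigma` are assumptions.

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased empirical estimate of MMD^2 between samples xs ~ p and ys ~ q."""
    k_xx = sum(rbf(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy

xs = [0.0, 0.1, 0.2]
print(mmd2_biased(xs, [0.05, 0.15, 0.25]))  # similar samples: small value
print(mmd2_biased(xs, [5.0, 5.1, 5.2]))     # shifted samples: much larger value
```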
 Multi-Task learning
 Train a model which does multiple tasks at the same time, e.g. segmentation and detection (see MultiNet).
 NEAT
 Neuroevolution of Augmenting Topologies (see blog post).
 Object recognition
 Classification of images: the task is to decide which class a given image belongs to, judging by its content, e.g. cat, dog or plane.
 One-Shot learning
 Learn from only one or very few examples per class. See One-Shot Learning of Object Categories.
 Optical Flow
 Optical flow is defined for a pair of images. It describes the apparent motion of each point (pixel) from the first image to the second image.
 PCA
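 Principal component analysis (short: PCA) is a linear transformation which projects $n$ points with $s$ features each, given as the rows of a matrix $X \in \mathbb{R}^{n \times s}$, onto a lower-dimensional subspace in such a way that the projection error (the sum of squared distances between the points and their projections) is minimal. Hence it is an unsupervised method for dimensionality reduction. It works by finding a matrix $P \in \mathbb{R}^{s \times m}$, where $m \leq s$ can be chosen as small as desired. The columns of $P$ are the eigenvectors of the covariance matrix of $X$ that belong to the $m$ largest eigenvalues, and the reduced data is obtained by subtracting the mean of each feature from $X$ and multiplying the result by $P$.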
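 A stdlib-only sketch for the special case $m = 1$: it centers the data, builds the sample covariance matrix and finds its dominant eigenvector by power iteration. The function names and the power-iteration approach are illustrative choices; libraries usually compute PCA with an SVD instead.

```python
import math

def pca_first_component(X, iterations=200):
    """Return (mean, direction) of the first principal component of the rows of X.

    Needs at least two points (the covariance divides by n - 1).
    """
    n, s = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(s)]
    centered = [[row[j] - mean[j] for j in range(s)] for row in X]
    # Sample covariance matrix (s x s)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(s)]
           for a in range(s)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0 / (j + 1) for j in range(s)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(s)) for a in range(s)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("cov * v is zero (e.g. all points identical)")
        v = [x / norm for x in w]
    return mean, v

def project(X, mean, v):
    """1-D coordinates of the points: (x - mean) · v for every row x."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in X]

X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
mean, v = pca_first_component(X)
print(v)                    # roughly proportional to (1, 2)
print(project(X, mean, v))  # one coordinate per point instead of two
```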
 Pretraining

 You have a machine learning model $m$.
 Pretraining: You have a dataset $A$ on which you train $m$.
 You have a dataset $B$. Before you start training the model on $B$, you initialize some of the parameters of $m$ with the parameters of the model trained on $A$.
 Fine-tuning: You train $m$ on $B$.
 Regularization
 Regularization refers to techniques that make the fitted function smoother. This helps to prevent overfitting.
 Examples: L1, L2, dropout and weight decay in neural networks; the parameter $C$ in SVMs.
 Reinforcement Learning
 Reinforcement learning is a subfield of machine learning which focuses on the question of how an agent should choose actions in order to obtain higher (cumulative) rewards. See German lecture notes.
 Self-Learning
 One form of semi-supervised learning: you train an initial system on the labeled data, then label the unlabeled data where the classifier is 'sure enough'. After that, you train a new system on all of this data and relabel the unlabeled data. This is iterated.
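 A minimal sketch of the loop. The base classifier (a nearest-centroid classifier on 1-D inputs), the confidence measure (how much closer the nearest centroid is than the runner-up) and the threshold `min_margin` are all hypothetical choices for illustration; any classifier with a confidence score could take their place.

```python
def train_centroids(xs, ys):
    """Hypothetical base classifier: the mean of the 1-D inputs of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, margin): nearest centroid and its distance advantage over the runner-up."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin

def self_learning(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Train, pseudo-label confident unlabeled points, retrain on all of them, repeat."""
    model = train_centroids(labeled_x, labeled_y)
    pseudo = {}  # index into unlabeled_x -> pseudo-label
    for _ in range(max_rounds):
        new_pseudo = {}
        for i, x in enumerate(unlabeled_x):
            label, margin = predict(model, x)
            if margin >= min_margin:  # the classifier is 'sure enough'
                new_pseudo[i] = label
        if new_pseudo == pseudo:  # pseudo-labels stopped changing
            break
        pseudo = new_pseudo
        xs = list(labeled_x) + [unlabeled_x[i] for i in pseudo]
        ys = list(labeled_y) + list(pseudo.values())
        model = train_centroids(xs, ys)
    return model, pseudo

model, pseudo = self_learning([0.0, 10.0], ["a", "b"], [1.0, 2.0, 8.0, 9.0, 5.0])
print(model)   # {'a': 1.0, 'b': 9.0}
print(pseudo)  # {0: 'a', 1: 'a', 2: 'b', 3: 'b'}; the point 5.0 stays unlabeled
```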
 Semi-supervised learning
 Some training data has labels, but most has no labels.
 Supervised learning
 All training data has labels.
 Spatial Pyramid Pooling (SPP)
 SPP is the idea of dividing the image into a grid with a fixed number of cells whose size depends on the input. Each cell computes one feature, which leads to a fixed-size representation of a variable-sized input.
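 A minimal sketch of the pooling step, assuming a single 2-D feature map given as a list of rows, max pooling within each cell and the pyramid levels $1 \times 1$, $2 \times 2$ and $4 \times 4$ (the levels are an illustrative choice). The cell boundaries are computed from the input size, so the output always has $1 + 4 + 16 = 21$ values.

```python
import math

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map over an n x n grid for each pyramid level n.

    The output length is sum(n * n for n in levels), independent of the input size.
    """
    h, w = len(feature_map), len(feature_map[0])
    features = []
    for n in levels:
        for gi in range(n):
            # Row range of this cell; it scales with the input height
            r0, r1 = (gi * h) // n, math.ceil((gi + 1) * h / n)
            for gj in range(n):
                # Column range of this cell; it scales with the input width
                c0, c1 = (gj * w) // n, math.ceil((gj + 1) * w / n)
                cell = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                features.append(max(cell))
    return features

small = [[r * 10 + c for c in range(7)] for r in range(5)]  # 5 x 7 input
tall = [[1.0] * 3 for _ in range(8)]                        # 8 x 3 input
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(tall)))  # 21 21
```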
See paper and summary.
 TF-IDF
 TF-IDF (short for term frequency–inverse document frequency) is a measure that reflects how important a word is to a document in a collection or corpus.
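 A minimal sketch using one common variant of the formula, assuming the documents are already tokenized and non-empty: $\mathrm{tf}(t, d)$ is the relative frequency of term $t$ in document $d$, and $\mathrm{idf}(t) = \log \frac{N}{n_t}$, where $N$ is the number of documents and $n_t$ the number of documents containing $t$. Libraries often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (each document is a list of tokens)."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            t: (c / len(doc)) * math.log(n_docs / df[t])  # tf * idf
            for t, c in counts.items()
        })
    return weights

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
for w in tf_idf(docs):
    print(w)
```

A word such as "the" that occurs in every document gets weight $0$, while a word that occurs in only one document (here "dog" and "ran") gets the highest weight in that document.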
 Transductive learning
 Label the given unlabeled data directly (the aim here is NOT to find a general hypothesis that also works for unseen data).
 Unsupervised learning
 No training data has labels.
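 VC Dimension
 A theoretical quantity (a natural number, or infinity) assigned to a class of classifiers: the size of the largest set of points that the class can shatter, i.e. label in every possible way. The higher the VC dimension of a classifier, the more situations it is able to capture (see longer explanation, German explanation).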
 VLAD
 Vector of Locally Aggregated Descriptors
 VTLN
 Vocal tract length normalization
 WRN
 Wide residual network
 Zero-Shot learning
 Learning to predict classes of which no example has been seen during training. For example, Flickr gets several new tags each day and wants to predict tags for new images. One idea is to use WordNet and ImageNet to generate a common embedding. This way, new words of WordNet could already have an embedding, and thus images of new categories could also automatically be classified correctly. See Zero-Shot Learning with Semantic Output Codes as well as this YouTube video.
See also
 Lectures:
 Wikipedia
 Scholarpedia
 Other