Comparing Classifiers

Classification problems occur quite often and many different classification algorithms have been described and implemented. But what is the best algorithm for a given error function and dataset? I read questions like “I have problem X. What is the best classifier?” quite often and my first impulse is always to write: Just try them! I guess people asking this question might think that it is super difficult to do so. However, the sklearn tutorial contains...
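
To make the "just try them" idea concrete, here is a minimal sketch of how a few classifiers could be compared with sklearn. The choice of the iris dataset, the three classifiers, and cross-validated accuracy as the score are assumptions made for illustration; they are not taken from the article. The helper name `compare_classifiers` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(classifiers, X, y, folds=5):
    """Return a dict mapping each classifier name to its mean CV accuracy."""
    return {
        name: cross_val_score(clf, X, y, cv=folds).mean()
        for name, clf in classifiers.items()
    }


# Assumption: iris is only a small stand-in for "a given dataset".
X, y = load_iris(return_X_y=True)

# Assumption: an arbitrary selection of classifiers with default settings.
classifiers = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "SVM (RBF kernel)": SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

scores = compare_classifiers(classifiers, X, y)

# Print the classifiers from best to worst mean accuracy.
for name, accuracy in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {accuracy:.3f}")
```

If a different error function matters for your problem, the `scoring` parameter of `cross_val_score` is probably the place to change it.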

Function Approximation

I was recently quite disappointed by how bad neural networks are for function approximation (see How should a neural network for unbound function approximation be structured?). However, I’ve just found that Gaussian processes are great for function approximation! There are two important types of function approximation: Interpolation: What values does the function have between known values? Extrapolation: What values does the function have outside of the known values? I did a couple of...
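
The difference between interpolation and extrapolation can be tried out with a few lines of sklearn. The following is only a sketch under assumptions: the target function (a sine), the training range, the RBF kernel, and the query points are all chosen for illustration and are not the experiments from the article.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Assumption: sin(x) sampled at 20 points in [0, 10] serves as the known values.
X_train = np.linspace(0, 10, 20).reshape(-1, 1)
y_train = np.sin(X_train).ravel()

# Assumption: an RBF kernel; its length scale is tuned during fit().
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), random_state=0)
gp.fit(X_train, y_train)

# Interpolation: query points between the known values.
X_inter = np.array([[2.5], [7.3]])
y_inter, std_inter = gp.predict(X_inter, return_std=True)

# Extrapolation: query points outside of the known values.
X_extra = np.array([[12.0], [15.0]])
y_extra, std_extra = gp.predict(X_extra, return_std=True)

print("interpolation error:", np.abs(y_inter - np.sin(X_inter).ravel()))
print("interpolation std:  ", std_inter)
print("extrapolation error:", np.abs(y_extra - np.sin(X_extra).ravel()))
print("extrapolation std:  ", std_extra)
```

The predictive standard deviation returned by `return_std=True` gives a hint of how much the model trusts its own prediction, which is useful when judging the extrapolated values.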

Using SVMs with sklearn

Support Vector Machines (SVMs) are a group of powerful classifiers. In this article, I will give a short impression of how they work. I continue with an example of how to use SVMs with sklearn. SVM theory: SVMs can be described with 5 ideas in mind: Linear, binary classifiers: If data is linearly separable, it can be separated by a hyperplane. There is one hyperplane which maximizes the distance to the nearest data points (support vectors). This hyperplane...
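
The first idea, a maximum-margin hyperplane for a linear, binary problem, can be sketched as follows. This is not the example from the article; the synthetic two-cluster data, the cluster centers, and `C=1.0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Assumption: two well-separated Gaussian clusters as a toy binary problem.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-3.0, scale=1.0, size=(50, 2)),  # class 0
    rng.normal(loc=3.0, scale=1.0, size=(50, 2)),   # class 1
])
y = np.array([0] * 50 + [1] * 50)

# A linear kernel means the decision boundary is a hyperplane w.x + b = 0.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("number of support vectors:", len(clf.support_vectors_))
print("margin width:", margin_width)
print("training accuracy:", clf.score(X, y))
```

Only the support vectors determine the hyperplane, so their number is usually much smaller than the number of training points.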

Explaining Away

Explaining away is an effect which is explained in Pearl (1988) with an example similar to the following one: A car’s engine can fail (\(X\)). The reason might either be a dead battery \(Y\) or a blocked fuel pump \(Z\). This results in the following Bayesian Network (figure: a common effect). Now assume you know that the engine does not fail (\(X=0\)). This guarantees that the battery is not dead (\(Y=0\)) and the fuel pump...
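
The effect can be checked numerically by enumerating the joint distribution of the three variables. The sketch below rests on two assumptions that are not from the article: the engine fails exactly when the battery is dead or the fuel pump is blocked (a deterministic OR), and both causes are independent with a prior probability of 0.1 each. The function names are hypothetical.

```python
from itertools import product

# Assumption: illustrative prior probabilities of the two independent causes.
P_Y = 0.1  # P(battery is dead)
P_Z = 0.1  # P(fuel pump is blocked)


def joint(x, y, z):
    """P(X=x, Y=y, Z=z), assuming X is a deterministic OR of Y and Z."""
    p_y = P_Y if y else 1 - P_Y
    p_z = P_Z if z else 1 - P_Z
    p_x_given_causes = 1.0 if x == int(y or z) else 0.0
    return p_y * p_z * p_x_given_causes


def conditional(query, evidence):
    """P(query | evidence) by summing the joint over all 8 assignments."""
    numerator = denominator = 0.0
    for values in product((0, 1), repeat=3):
        assignment = dict(zip(("x", "y", "z"), values))
        if all(assignment[k] == v for k, v in evidence.items()):
            p = joint(**assignment)
            denominator += p
            if all(assignment[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


p_no_failure = conditional({"y": 1}, {"x": 0})
p_failure = conditional({"y": 1}, {"x": 1})
p_failure_and_pump = conditional({"y": 1}, {"x": 1, "z": 1})

print(f"P(Y=1 | X=0)      = {p_no_failure:.3f}")
print(f"P(Y=1 | X=1)      = {p_failure:.3f}")
print(f"P(Y=1 | X=1, Z=1) = {p_failure_and_pump:.3f}")
```

With these numbers, observing the engine failure raises the probability of a dead battery from 0.1 to about 0.526. Additionally learning that the fuel pump is blocked brings it back down to 0.1: the blocked pump explains the failure away.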

How to clear a USB stick

Once in a while I think it is time to reduce the damage done by the loss of a USB stick. Remove all data: Find the USB stick on your Linux system with fdisk -l. Make super sure that you really got the stick (e.g. by removing it and executing the command again). It should be something like /dev/sdb1. The following command will overwrite the data on the stick 5 times: $ shred...

Preparing for the New Year

It’s New Year’s Eve and, as always, I try to finish some things and have some plans for next year. Review of 2015: The last year has had its ups and downs. Most of them are private, so I’ll not write too much about them. However, I would like to share two of them with you: I’ve been to my first summer school of the German National Academic Foundation (Sommerakademie der Studienstiftung, see...

Analyzing PyPI Data - 2

This is part two of a series. See Analyzing PyPI Data for part one. I’ve recently got a request to expand my analysis of the Python Package Index, commonly known as PyPI. It is a repository of Python packages where everybody can upload packages, pretty much without any restriction. In the article Analyzing PyPI Metadata you can read some general stuff about the repository. This article goes a bit deeper. This time I...