Recent Posts

Data Applications

"Data is the new oil", "we need to be data driven", "we need to apply AI to keep being competitive" are some of the prashes I hear a lot. As I haven't seen yet a clear article pointing out what is done with the data ... here you are 🙂 Why it's … Read More »

Siamese Networks

Siamese Networks are feature extractors trained to learn an embedding in $\mathbb{R}^n$ where the relative output matters, not the absolute one. (Figure: Schema of a Siamese Network $m_1$.) The original paper [1] was about signature verification. You have one original signature and one that might be the … Read More »

WiLI-2018

WiLI-2018, the Wikipedia Language Identification database, is a collection of Wikipedia sentences in different languages. It can be used to test how hard it is to distinguish different languages. If you want to get to the data, go to zenodo.org. If you want to get to the publication … Read More »

Expert Systems

This is an article I had for quite a while as a draft. As part of my yearly cleanup, I've published it without finishing it. It might not be finished or have other problems. Science fiction movies are full of advanced systems for medical analysis and treatment: Stargate SG1: The … Read More »

Techniques for Analyzing ML models

This is an article I had for quite a while as a draft. As part of my yearly cleanup, I've published it without finishing it. It might not be finished or have other problems. Techniques for model analysis: Prediction-Based: * Decision boundaries * LIME * Feature importance * SHAP values * Partial Dependence Plots * Sensitivity … Read More »

Code Challenges in ML

Having machines that can write software is probably the wet dream of every company. Instead of spending years on development, you just tell the machine what to do and it automatically creates the software. As you might have guessed, we are not there yet. Not even close. But a friend … Read More »

Perfect Models

When you develop a model, you want the optimal model. The perfect one. The first problem with that desire is diagonal goals. (Figure: Diagonal goals in model development.) Typical goals when designing a model are: Quality: Have a high accuracy, low error, high $F_\beta$ score, ... Production Inference speed: The faster … Read More »

Recommender Systems

I recently became interested in recommender systems. You know, the thing on Amazon that tells you which products you might be interested in. Or the stuff on Spotify that gives you a song you might like. On YouTube the next videos shown. On StumbleUpon, your next stumble. On a news … Read More »

Regression

A while ago, this link pointed to the content which is now in the Forecasting article. Regression is one of the core tasks in machine learning. In this task, you get some input and your target variable is a single floating point number. For example, predicting the price of a … Read More »

Evaluation of binary classifiers

Binary classification is likely the simplest task in machine learning. It is typically solved with Random Forests, Neural Networks, SVMs or a naive Bayes classifier. For all of them, you have to measure how well you are doing. In this article, I give an overview of the different metrics for … Read More »