Today, I played a bit with the International airline passengers dataset.

Its only feature is a date given by year and month. The task is to predict the number of passengers (in thousands).

## Code

See GitHub.

## Scoring functions

In the following, \(y\) is the ordered list of targets, \(y_P\) is the list of predictions in the same order (with elements \(y_i^P\)) and \(\bar{y}\) is the mean of \(y\).

Name | Range | Better when | Definition |
---|---|---|---|
MAE | $[0, \infty)$ | lower | $f(y, y_P) = \frac{1}{\lvert y \rvert} \sum_{i} \lvert y_i - y_i^P \rvert$ |
MSE | $[0, \infty)$ | lower | $f(y, y_P) = \frac{1}{\lvert y \rvert} \sum_{i} (y_i - y_i^P)^2$ |
$R^2$ | $(-\infty, 1]$ | higher | $f(y, y_P) = 1 - \frac{\sum_{i} (y_i - y_i^P)^2}{\sum_{i} (y_i - \bar{y})^2}$ |
Explained Variance | $(-\infty, 1]$ | higher | $f(y, y_P) = 1 - \frac{Var(y - y_P)}{Var(y)}$ |
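To make the definitions concrete, here is a minimal sketch of the four scores using only the Python standard library. The function names and the toy numbers are my own assumptions for illustration; they are not the code from the repository and not values from the dataset.

```python
from statistics import mean, pvariance


def mae(y, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y, y_pred))


def mse(y, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y, y_pred))


def r2(y, y_pred):
    """Coefficient of determination; negative if worse than predicting the mean."""
    y_bar = mean(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def explained_variance(y, y_pred):
    """Like R^2, but a constant offset in the residuals is not penalized."""
    residuals = [t - p for t, p in zip(y, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y)


# Toy values, not taken from the dataset: every prediction is 10 too high.
y = [100.0, 120.0, 140.0, 160.0]
y_pred = [t + 10.0 for t in y]

print(mae(y, y_pred), mse(y, y_pred))
print(r2(y, y_pred), explained_variance(y, y_pred))
```

With a constant offset of 10, the MAE is 10 and $R^2$ drops to 0.8, while the explained variance stays at 1. That is one way the two scores can end up far apart for the same regressor.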

## Results

Name | Training time | Testing time | MAE | MAD | $R^2$ | Explained variance | MSE |
---|---|---|---|---|---|---|---|
GradientBoostingRegressor | 11.6ms | 0.1ms | 40.0 | 31.0 | 0.5689 | 0.6246 | 2631.9 |
GaussianProcessRegressor | 8.4ms | 0.2ms | 150.9 | 85.9 | -8.0324 | -6.7089 | 55138.3 |
AdaBoostRegressor | 69.3ms | 1.1ms | 60.1 | 53.3 | 0.1240 | 0.5812 | 5347.3 |
SGDRegressor | 0.8ms | 0.1ms | 106.5 | 84.8 | -1.7081 | 0.1471 | 16531.5 |
RANSACRegressor | 4.1ms | 0.0ms | 68.5 | 39.4 | -0.4294 | 0.1479 | 8726.0 |
PassiveAggressiveRegressor | 0.2ms | 0.0ms | 115.8 | 115.5 | -1.8841 | 0.1268 | 17606.2 |
BaggingRegressor | 13.4ms | 0.9ms | 46.7 | 37.1 | 0.4162 | 0.4912 | 3564.0 |
HuberRegressor | 8.0ms | 0.0ms | 65.4 | 59.8 | -0.0745 | 0.0395 | 6559.5 |
RandomForestRegressor | 18.6ms | 5.2ms | 48.3 | 38.5 | 0.4336 | 0.6535 | 3457.7 |
ExtraTreesRegressor | 17.2ms | 4.7ms | 44.1 | 33.2 | 0.4744 | 0.5036 | 3208.6 |
SVR | 2.1ms | 0.3ms | 202.2 | 182.0 | -6.6885 | 0.0105 | 46934.4 |
Linear SVR + StandardScaler | 1.4ms | 0.2ms | 84.1 | 60.6 | -0.9573 | 0.1338 | 11948.3 |
ElasticNet | 0.3ms | 0.0ms | 56.0 | 41.7 | 0.1069 | 0.1698 | 5452.0 |
Lasso | 0.4ms | 0.0ms | 56.0 | 41.5 | 0.1069 | 0.1698 | 5451.8 |

I like the median absolute error (MAD in the table) best, because it tells me by how many passengers (in thousands) my prediction is typically off from the true value.
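A small sketch of why the median is the "typical" error: a single wild prediction drags the mean absolute error up, but barely moves the median. The helper names and the numbers are made up for illustration and are not from the experiment.

```python
from statistics import mean, median


def mean_absolute_error(y, y_pred):
    """Average of the absolute errors; sensitive to single large misses."""
    return mean(abs(t - p) for t, p in zip(y, y_pred))


def median_absolute_error(y, y_pred):
    """Middle value of the absolute errors; robust to single large misses."""
    return median(abs(t - p) for t, p in zip(y, y_pred))


# Toy values: four predictions are off by 5, the last one is off by 100.
y_true = [100, 110, 120, 130, 140]
y_hat = [105, 105, 125, 125, 240]

print(mean_absolute_error(y_true, y_hat))
print(median_absolute_error(y_true, y_hat))
```

Here the median absolute error is 5, which matches what most predictions look like, while the mean absolute error is 24 because of the one outlier.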

Please note that bad results here do not mean the regressor is bad. Some might only reach their full potential with more data; others might be better suited to different regression problems, e.g. interpolation instead of extrapolation, or higher-dimensional data.

## Things I learned

- SGDRegressor without scaling is crazy bad.
- Without hyperparameter optimization (e.g. twiddling), SVR is shitty.
- Without scaling, linear SVR takes many hours to train, on a dataset with 3 features and fewer than 150 data points.
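
The scaling referred to above is standardization: shift each feature to zero mean and divide by its standard deviation, which is what scikit-learn's `StandardScaler` does by default. Below is a dependency-free sketch of that idea. The `standardize` helper and the year/month values are assumptions for illustration, not the code used for the table.

```python
from statistics import mean, pstdev


def standardize(column):
    """Shift a feature column to zero mean and scale it to unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical raw features: the year is around 1950, the month is at most 12.
years = [1949, 1950, 1951, 1952]
months = [1, 4, 7, 10]

scaled_years = standardize(years)
scaled_months = standardize(months)

print(scaled_years)
print(scaled_months)
```

After scaling, both columns live on the same scale, so gradient-based and margin-based models no longer see one feature that is hundreds of times larger than the other. In a real experiment the mean and standard deviation would be computed on the training split only and then reused for the test split.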