I've recently been thinking a lot about recommendations and about building the book recommendation portal I've had in mind since 2013.
However, as with any other branch of machine learning, it is hard to find a good overview of recommendation techniques, their respective strengths and drawbacks, as well as hard performance numbers.
So let's get started.
The Data
The MovieLens 20M dataset contains 20 million movie ratings, created by 138,000 users for 27,000 movies.
The data looks like this:
userId movieId rating timestamp
0 1 2 3.5 1112486027
1 1 29 3.5 1112484676
2 1 32 3.5 1112484819
3 1 47 3.5 1112484727
4 1 50 3.5 1112484580
5 1 112 3.5 1094785740
6 1 151 4.0 1094785734
7 1 223 4.0 1112485573
8 1 253 4.0 1112484940
9 1 260 4.0 1112484826
10 1 293 4.0 1112484703
There are genres and tags as well.
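For a quick look at those extra files, here is a small sketch, assuming the standard ml-20m layout in which movies.csv holds the genres and tags.csv holds the free-text tags:

import pandas as pd

# Assumes the standard ml-20m files sitting next to ratings.csv.
movies = pd.read_csv("movies.csv")  # movieId, title, genres ("Adventure|Animation|...")
tags = pd.read_csv("tags.csv")      # userId, movieId, tag, timestamp
print(movies.head())
print(tags.head())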
The Evaluation
The task is to predict the ratings. To do so, the data is sorted by timestamp and split into 50% training data and 50% test data. On the test data, the mean absolute error (MAE) is calculated; lower is better. The results have to be given with exactly three decimal places.
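In symbols, for $n$ test ratings with true values $y_i$ and predictions $\hat{y}_i$:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$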
Baselines
All of the following evaluations took roughly 43 s on my Thinkpad T460p; their memory consumption is negligible.
Name | MAE | MSE | Comment |
---|---|---|---|
Constant 1 | 2.422 | 6.939 | I don't expect this to be awesome, but it should be better than an MAE of 5. |
Constant 5 | 1.603 | 3.761 | Together with Constant 1, this gives the range in which all recommenders will be. |
Constant 2.5 | 1.217 | 1.996 | Predicting the middle is the best you can do if you have absolutely no prior knowledge and optimize for MAE. |
Median User Rating | 0.733 | 1.112 | Every user in the test set was also in the training set! |
Median Movie Rating | 0.723 | 1.061 | For known movies, predict their median value. For unknown ones, predict the median of all medians of movie ratings. |
User-adjusted movie rating | 0.825 | 1.042 | Use the median movie rating, but add a per-user bias. |
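To make the last row concrete, here is a tiny sketch with made-up numbers; the actual medians come from the training data:

# Hypothetical values, only to illustrate the user-adjusted movie rating baseline.
movie_median = 3.5   # median rating of this movie in the training data
user_median = 4.5    # median rating this user gives
avg_user = 3.5       # mean of all per-user medians
user_bias = user_median - avg_user     # +1.0: the user rates above average
prediction = movie_median + user_bias  # 3.5 + 1.0 = 4.5
print(prediction)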
Code
#!/usr/bin/env python
"""Analyze the quality of recommendations."""
# 3rd party modules
from sklearn.base import BaseEstimator
from sklearn.model_selection import train_test_split
import click
import pandas as pd
def load_data(rating_filepath="ratings.csv"):
    """Load the extracted MovieLens ratings."""
    nrows = None  # set to a small number to experiment on a sample
    df = pd.read_csv(rating_filepath, nrows=nrows)
    # Keep the half-star ratings; an integer dtype would truncate 3.5 to 3.
    df["rating"] = df["rating"].astype("float32")
    df = df.sort_values(by="timestamp")
    df_x = df[["timestamp", "userId", "movieId"]]
    df_y = df[["rating"]]
    # 50% / 50% split without shuffling, as described in the evaluation section:
    # the older half of the ratings is used for training.
    df_train_x, df_test_x, df_train_y, df_test_y = train_test_split(
        df_x, df_y, test_size=0.5, shuffle=False
    )
    return {
        "train": {"x": df_train_x, "y": df_train_y},
        "test": {"x": df_test_x, "y": df_test_y},
    }
class BaselineRecommender(BaseEstimator):
    """Create a baseline recommender."""

    def __init__(self, strategy="constant", constant=None):
        self.strategy = strategy
        if constant is not None and strategy != "constant":
            raise RuntimeError("constant is only meaningful in the constant strategy.")
        if constant is None and strategy == "constant":
            constant = 2.5  # default to the middle of the rating scale
        self.constant = constant
    def fit(self, df_x, df_y):
        """Fit the recommender on MovieLens data."""
        df = df_x.join(df_y)
        # Median rating per user and per movie in the training data.
        self.median_by_user = (
            df.groupby(by="userId").aggregate({"rating": "median"})["rating"].to_dict()
        )
        self.median_by_movie = (
            df.groupby(by="movieId").aggregate({"rating": "median"})["rating"].to_dict()
        )
        # Means of the per-movie / per-user medians; used as fallbacks for
        # movies and users that do not appear in the training data.
        self.avg_movie = sum(self.median_by_movie.values()) / len(self.median_by_movie)
        self.avg_user = sum(self.median_by_user.values()) / len(self.median_by_user)
        return self
    def predict(self, df_x):
        """Predict ratings for user/movie combinations."""
        results = []
        for entry in df_x.to_dict("records"):
            if self.strategy == "constant":
                prediction = self.constant
            elif self.strategy == "movie_median":
                movie = entry["movieId"]
                prediction = self.median_by_movie.get(movie, self.avg_movie)
            elif self.strategy == "user_median":
                user = entry["userId"]
                # Unknown users fall back to the average user median.
                prediction = self.median_by_user.get(user, self.avg_user)
            elif self.strategy == "user_adjusted_movie_median":
                movie = entry["movieId"]
                movie_median = self.median_by_movie.get(movie, self.avg_movie)
                user = entry["userId"]
                user_bias = self.median_by_user.get(user, self.avg_user) - self.avg_user
                prediction = movie_median + user_bias
            else:
                raise NotImplementedError(self.strategy)
            results.append(prediction)
        return results
def evaluate(true_ratings, predicted_ratings, func="mae"):
    """Evaluate the results of a rating prediction."""
    assert len(true_ratings) == len(predicted_ratings)
    if func == "mae":
        absolute_errors = sum(
            abs(a - b) for a, b in zip(true_ratings, predicted_ratings)
        )
        val = absolute_errors / len(true_ratings)
    elif func == "mse":
        sq_errors = sum((a - b) ** 2 for a, b in zip(true_ratings, predicted_ratings))
        val = sq_errors / len(true_ratings)
    else:
        raise ValueError("func must be 'mae' or 'mse', got {!r}".format(func))
    return val
@click.command()
@click.option(
    "--strategy",
    default="constant",
    type=click.Choice(
        ["constant", "movie_median", "user_median", "user_adjusted_movie_median"]
    ),
)
@click.option("--constant", default=None, type=float)
def main(strategy, constant):
    """Analyze recommenders on the MovieLens 20M dataset."""
    data = load_data()
    m = BaselineRecommender(strategy=strategy, constant=constant)
    m.fit(data["train"]["x"], data["train"]["y"])
    y_pred = m.predict(data["test"]["x"])
    mae = evaluate(data["test"]["y"]["rating"], y_pred, func="mae")
    mse = evaluate(data["test"]["y"]["rating"], y_pred, func="mse")
    print("MAE of baseline: {:0.3f}".format(mae))
    print("MSE of baseline: {:0.3f}".format(mse))


if __name__ == "__main__":
    main()
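Assuming the script is saved as baselines.py (the name is arbitrary) next to ratings.csv, the baselines can be reproduced like this:

python baselines.py --strategy constant --constant 2.5
python baselines.py --strategy movie_median
python baselines.py --strategy user_median
python baselines.py --strategy user_adjusted_movie_median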
Problems
- Ratings instead of order: For applications we are not interested in predicting the exact rating, but in getting the order of the movies right. A constant per-user bias would therefore be fine, but MAE penalizes it, as the sketch below illustrates.
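A minimal sketch of that problem, using Spearman rank correlation as one possible order-aware measure (it is not part of the evaluation above):

# A constant per-user bias keeps the order intact but still hurts MAE.
from scipy.stats import spearmanr

true_ratings = [2.0, 3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 5.0, 6.0]  # same order, shifted by a bias of +1

mae = sum(abs(t - p) for t, p in zip(true_ratings, predicted)) / len(true_ratings)
rho, _ = spearmanr(true_ratings, predicted)
print(mae)  # 1.0 -> looks bad under MAE
print(rho)  # 1.0 -> the ranking is perfect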
Publications
- Prateek Sappadla, Yash Sadhwani, Pranit Arora: Movie Recommender System: They claim to have reached MSE=0.65 with matrix factorization and 0.70 with k-nearest users.
- Shuyu Luo: Introduction to Recommender System, 2018.