Image Classification

Image classification is the task of assigning exactly one label to an image. The set of possible labels is finite and typically contains no more than 1000 classes.

So for example, you might ask: What can you see in this image?
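In code, answering that question amounts to picking the label with the highest score. The following is only a minimal sketch: the labels and scores are made up for illustration, and `classify` is a hypothetical helper, not part of any library.

```python
def classify(scores, labels):
    """Return the single label whose score is highest."""
    if len(scores) != len(labels):
        raise ValueError("need exactly one score per label")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


# Hypothetical labels and scores, for illustration only.
LABELS = ["jellyfish", "bee", "scuba_diver"]
SCORES = [0.90, 0.03, 0.07]

predicted = classify(SCORES, LABELS)
print(predicted)
```

A real classifier produces the scores with a trained model; the final step of choosing one label stays the same.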

Image classification is one of the most common and probably one of the simplest tasks at the intersection of machine learning and computer vision. A commonly used dataset is ImageNet, or more precisely the subset used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has exactly 1000 classes and more than 1 000 000 training images.

However, easy-to-use examples are hard to find, so here is one.

Prerequisites

- TensorFlow
  - CUDA (for GPU support)
  - cuDNN (for GPU support)
- Keras

Code

The following code is taken from Keras / François Chollet. Full credit to him for doing the difficult work.

The code loads ResNet50, one of the state-of-the-art models from the family of so-called residual networks (ResNets). See "Deep Residual Learning for Image Recognition" for details. It then downloads the pretrained ImageNet weights, caches them for subsequent runs, and applies the model to the image you pass in.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""ResNet50 model for Keras."""
from __future__ import print_function

import numpy as np
import json
import os
import time

from keras import backend as K
from keras.preprocessing import image
from keras.applications import ResNet50
from keras.utils.data_utils import get_file

CLASS_INDEX = None
CLASS_INDEX_PATH = (
    "https://s3.amazonaws.com/deep-learning-models/"
    "image-models/imagenet_class_index.json"
)


def preprocess_input(x, dim_ordering="default"):
    """
    Standard preprocessing of image data.

    1. Convert the channel order from RGB to BGR (the position of the
       channel axis depends on the dimension ordering).
    2. Subtract the ImageNet mean from each channel.

    Parameters
    ----------
    x : numpy array
        The image
    dim_ordering : string, optional (default: 'default')
        Either 'th' for Theano or 'tf' for Tensorflow

    Returns
    -------
    numpy array
        The preprocessed image
    """
    if dim_ordering == "default":
        dim_ordering = K.image_dim_ordering()
    assert dim_ordering in {"tf", "th"}

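    if dim_ordering == "th":
        # 'RGB'->'BGR' (channel axis 1). This has to happen before the mean
        # subtraction, because the means below are given in BGR order.
        x = x[:, ::-1, :, :]
        x[:, 0, :, :] -= 103.939
        x[:, 1, :, :] -= 116.779
        x[:, 2, :, :] -= 123.68
    else:
        # 'RGB'->'BGR' (channel axis 3). This has to happen before the mean
        # subtraction, because the means below are given in BGR order.
        x = x[:, :, :, ::-1]
        x[:, :, :, 0] -= 103.939
        x[:, :, :, 1] -= 116.779
        x[:, :, :, 2] -= 123.68
    return x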


def decode_predictions(preds, top=5):
    """
    Decode the predictions of the ImageNet trained network.

    Parameters
    ----------
    preds : numpy array
    top : int
        How many predictions to return

    Returns
    -------
    list of tuples
        e.g. (u'n02206856', u'bee', 0.71072823) for the WordNet identifier,
        the class name and the probability.
    """
    global CLASS_INDEX
    if len(preds.shape) != 2 or preds.shape[1] != 1000:
        raise ValueError(
            "`decode_predictions` expects "
            "a batch of predictions "
            "(i.e. a 2D array of shape (samples, 1000)). "
            "Found array with shape: " + str(preds.shape)
        )
    if CLASS_INDEX is None:
        fpath = get_file(
            "imagenet_class_index.json", CLASS_INDEX_PATH, cache_subdir="models"
        )
        # Use a context manager so the file handle is closed again
        with open(fpath) as f:
            CLASS_INDEX = json.load(f)
    results = []
    for pred in preds:
        top_indices = pred.argsort()[-top:][::-1]
        result = [tuple(CLASS_INDEX[str(i)]) + (pred[i],) for i in top_indices]
        results.append(result)
    return results


def is_valid_file(parser, arg):
    """
    Check if arg is a valid file that already exists on the file system.

    Parameters
    ----------
    parser : argparse object
    arg : str

    Returns
    -------
    arg
    """
    arg = os.path.abspath(arg)
    if not os.path.exists(arg):
        parser.error("The file %s does not exist!" % arg)
    else:
        return arg


def get_parser():
    """Get parser object."""
    from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

    parser = ArgumentParser(
        description=__doc__, formatter_class=ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "-f",
        "--file",
        dest="filename",
        type=lambda x: is_valid_file(parser, x),
        help="Classify image",
        metavar="IMAGE",
        required=True,
    )
    return parser


if __name__ == "__main__":
    args = get_parser().parse_args()

    # Load model
    model = ResNet50(include_top=True, weights="imagenet")

    img_path = args.filename
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    print("Input image shape:", x.shape)
    t0 = time.time()
    preds = model.predict(x)
    t1 = time.time()
    print("Prediction time: {:0.3f}s".format(t1 - t0))
    for wordnet_id, class_name, prob in decode_predictions(preds)[0]:
        print(
            "{wid}\t{prob:>6}%\t{name}".format(
                wid=wordnet_id, name=class_name, prob="%0.2f" % (prob * 100)
            )
        )

Store it as resnet50.py and make it executable.

(In case the JSON becomes unavailable: Here you are)
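The preprocessing step can be checked in isolation on a single pixel. The three constants 103.939, 116.779 and 123.68 are per-channel ImageNet means in blue, green, red order, so the channel flip has to come before the subtraction. The sketch below assumes the TensorFlow layout (channels on the last axis); `to_bgr_zero_centered` and the one-pixel test image are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (same constants as in the script).
BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def to_bgr_zero_centered(batch_rgb):
    """Flip RGB -> BGR on the last axis, then subtract the BGR means."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]
    # The subtraction creates a new array; the input is left untouched.
    return batch_bgr - BGR_MEANS


# A hypothetical 1x1 "image" with one pure-red pixel (R=255, G=0, B=0).
red_pixel = np.array([[[[255.0, 0.0, 0.0]]]])
out = to_bgr_zero_centered(red_pixel)
print(out.shape)
print(out[0, 0, 0])
```

After the flip the red value sits in the last channel, so it is the one reduced by 123.68, while the two zero channels become negative.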

How to use

$ ./resnet50.py -f honey-bee.jpg

Alternatively, if you have a GPU without enough free memory, hide it so that the script runs on the CPU:

$ CUDA_VISIBLE_DEVICES="" ./resnet50.py -f honey-bee.jpg
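The environment variable can also be set from inside Python. This is only a sketch under the assumption that it runs before TensorFlow (or Keras with the TensorFlow backend) is imported for the first time; `hide_gpus` is a hypothetical helper name.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from libraries that are imported afterwards."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Call this before the first `import tensorflow` / `import keras`.
hide_gpus()
print(repr(os.environ["CUDA_VISIBLE_DEVICES"]))
```

Setting the variable on the command line, as shown above, has the advantage that the script itself stays unchanged.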

If you apply this to the jellyfish image from above, you get:

Input image shape: (1, 224, 224, 3)
n01910747    100.00%    jellyfish
n01496331      0.00%    electric_ray
n10565667      0.00%    scuba_diver
n01914609      0.00%    sea_anemone
n02607072      0.00%    anemone_fish

This takes about 6 seconds on CPU on my laptop.
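The five lines of output are the five classes with the highest predicted probability. The ranking step in `decode_predictions` boils down to an `argsort`; here is a minimal sketch with made-up probabilities for four classes, where `top_k` is a hypothetical helper.

```python
import numpy as np


def top_k(probabilities, k=5):
    """Return (class index, probability) pairs, most probable first."""
    probabilities = np.asarray(probabilities)
    # argsort is ascending: take the last k entries and reverse them.
    indices = probabilities.argsort()[-k:][::-1]
    return [(int(i), float(probabilities[i])) for i in indices]


# Made-up probabilities for four classes, for illustration only.
probs = [0.05, 0.6, 0.1, 0.25]
ranked = top_k(probs, k=2)
print(ranked)
```

In the script, each class index is additionally mapped to its WordNet identifier and class name via the JSON class index.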

Alternative Models

If you are building an application, you might want to look into alternatives:

Model name	Model size	Input size	Top-1 accuracy	Top-5 accuracy	Prediction time
ResNet50	102.9 MB	224 × 224	77.15%	93.29%	0.495s
VGG16	553.5 MB	224 × 224	73.0%	91.2%	0.488s
InceptionV3	95.1 MB	299 × 299	78.8%	94.4%	0.681s
Xception	91.9 MB	299 × 299	79.0%	94.5%	0.761s
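The two accuracy columns in the table above count a prediction as correct if the true class is the highest-scoring class (top-1) or among the five highest-scoring classes (top-5). A minimal sketch of that computation, using made-up scores for three classes and k = 2 instead of 5 to keep the example small; `top_k_accuracy` is a hypothetical helper.

```python
def top_k_accuracy(score_rows, true_labels, k):
    """Fraction of samples whose true label is among the k best scores."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Made-up scores for four samples and three classes.
SCORES = [
    [0.70, 0.20, 0.10],  # true class 0: best score
    [0.50, 0.40, 0.10],  # true class 1: second-best score
    [0.60, 0.30, 0.10],  # true class 2: worst score
    [0.05, 0.15, 0.80],  # true class 2: best score
]
TRUTH = [0, 1, 2, 2]

top1 = top_k_accuracy(SCORES, TRUTH, k=1)
top2 = top_k_accuracy(SCORES, TRUTH, k=2)
print(top1, top2)
```

Top-k accuracy can never be lower than top-1 accuracy, which is why the top-5 numbers in the table are consistently higher.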

The time covers only the prediction itself, not loading the model. The model files range from roughly 90 MB to more than 550 MB, so loading takes a while. In a real application you can (1) load the model only once and (2) run the prediction on a batch of many images to speed things up.
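Both points can be sketched without Keras. In the example below, `fake_predict` stands in for a model that was loaded once, and `iter_batches` and `predict_all` are hypothetical helpers; a real Keras model would receive a stacked NumPy array per batch instead of a list.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_all(predict, inputs, batch_size=32):
    """Call predict once per batch and collect all results in order."""
    results = []
    for batch in iter_batches(inputs, batch_size):
        results.extend(predict(batch))
    return results


# Stand-in for a model that was loaded once; it records the batch sizes.
calls = []


def fake_predict(batch):
    calls.append(len(batch))
    return [x * 2 for x in batch]


outputs = predict_all(fake_predict, list(range(5)), batch_size=2)
print(outputs, calls)
```

Note that `model.predict` in Keras also accepts a `batch_size` argument, so for inputs that fit in memory it is often enough to stack all images into one array and pass it in a single call.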

More models:

titu1994/Inception-v4
