Image Classification

Contents

  • Prerequisites
  • Code
  • How to use
  • Alternative Models
  • See also

Image classification is the following task: You have an image and you want to assign it one label. The set of possible labels is finite and typically not bigger than 1000.
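
In code, the task boils down to picking the label with the highest score. A minimal sketch, assuming a classifier has already produced one score per label (the labels and scores below are made up for illustration):

```python
def assign_label(scores):
    """Return the single label with the highest score.

    `scores` maps each possible label to a score, e.g. a probability.
    """
    return max(scores, key=scores.get)


# Hypothetical scores for a three-label problem.
example_scores = {"jellyfish": 0.92, "sea_anemone": 0.05, "scuba_diver": 0.03}
print(assign_label(example_scores))
```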

So for example, you might ask: What can you see in this image?

A jellyfish

It is one of the most common and probably one of the simplest tasks at the intersection of machine learning and computer vision. A commonly used dataset is ImageNet, or more precisely the subset used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which consists of exactly 1000 classes and has more than 1 000 000 training samples.

However, I find that easy-to-use examples are missing. So here is one.

Prerequisites

  • TensorFlow
    • CUDA
    • cuDNN
  • Keras

Code

The following code is taken from Keras / François Chollet. Full credit to him for doing the difficult work.

The code defines one of the state-of-the-art models, a so-called ResNet. See Deep Residual Learning for Image Recognition for details. Then it downloads the weights, stores them for subsequent use and applies the model to the image.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""ResNet50 model for Keras."""
from __future__ import print_function

import numpy as np
import json
import os
import time

from keras import backend as K
from keras.preprocessing import image
from keras.applications import ResNet50
from keras.utils.data_utils import get_file

CLASS_INDEX = None
CLASS_INDEX_PATH = (
    "https://s3.amazonaws.com/deep-learning-models/"
    "image-models/imagenet_class_index.json"
)


def preprocess_input(x, dim_ordering="default"):
    """
    Standard preprocessing of image data.

    1. Convert the channel order from RGB to BGR.
    2. Subtract the mean of each channel (zero-centering).

    Parameters
    ----------
    x : numpy array
        A batch of RGB images as a 4D float array
    dim_ordering : string, optional (default: 'default')
        Either 'th' (channels first, Theano) or 'tf' (channels last,
        TensorFlow)

    Returns
    -------
    numpy array
        The preprocessed images
    """
    if dim_ordering == "default":
        dim_ordering = K.image_dim_ordering()
    assert dim_ordering in {"tf", "th"}

    # The mean values below are given in BGR order, so the channels have to
    # be reordered before the means are subtracted.
    if dim_ordering == "th":
        # 'RGB'->'BGR'
        x = x[:, ::-1, :, :]
        x[:, 0, :, :] -= 103.939
        x[:, 1, :, :] -= 116.779
        x[:, 2, :, :] -= 123.68
    else:
        # 'RGB'->'BGR'
        x = x[:, :, :, ::-1]
        x[:, :, :, 0] -= 103.939
        x[:, :, :, 1] -= 116.779
        x[:, :, :, 2] -= 123.68
    return x


def decode_predictions(preds, top=5):
    """
    Decode the predictions of the ImageNet trained network.

    Parameters
    ----------
    preds : numpy array
    top : int
        How many predictions to return

    Returns
    -------
    list of tuples
        e.g. (u'n02206856', u'bee', 0.71072823) for the WordNet identifier,
        the class name and the probability.
    """
    global CLASS_INDEX
    if len(preds.shape) != 2 or preds.shape[1] != 1000:
        raise ValueError(
            "`decode_predictions` expects "
            "a batch of predictions "
            "(i.e. a 2D array of shape (samples, 1000)). "
            "Found array with shape: " + str(preds.shape)
        )
    if CLASS_INDEX is None:
        fpath = get_file(
            "imagenet_class_index.json", CLASS_INDEX_PATH, cache_subdir="models"
        )
        # Close the file again after reading the class index
        with open(fpath) as f:
            CLASS_INDEX = json.load(f)
    results = []
    for pred in preds:
        top_indices = pred.argsort()[-top:][::-1]
        result = [tuple(CLASS_INDEX[str(i)]) + (pred[i],) for i in top_indices]
        results.append(result)
    return results


def is_valid_file(parser, arg):
    """
    Check if arg is a valid file that already exists on the file system.

    Parameters
    ----------
    parser : argparse object
    arg : str

    Returns
    -------
    arg
    """
    arg = os.path.abspath(arg)
    if not os.path.exists(arg):
        parser.error("The file %s does not exist!" % arg)
    else:
        return arg


def get_parser():
    """Get parser object."""
    from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter

    parser = ArgumentParser(
        description=__doc__, formatter_class=ArgumentDefaultsHelpFormatter
    )
    parser.add_argument(
        "-f",
        "--file",
        dest="filename",
        type=lambda x: is_valid_file(parser, x),
        help="Classify image",
        metavar="IMAGE",
        required=True,
    )
    return parser


if __name__ == "__main__":
    args = get_parser().parse_args()

    # Load model
    model = ResNet50(include_top=True, weights="imagenet")

    img_path = args.filename
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    print("Input image shape:", x.shape)
    t0 = time.time()
    preds = model.predict(x)
    t1 = time.time()
    print("Prediction time: {:0.3f}s".format(t1 - t0))
    for wordnet_id, class_name, prob in decode_predictions(preds)[0]:
        print(
            "{wid}\t{prob:>6}%\t{name}".format(
                wid=wordnet_id, name=class_name, prob="%0.2f" % (prob * 100)
            )
        )

Store it as resnet50.py and make it executable.

(In case the JSON becomes unavailable: Here you are)
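
The line `pred.argsort()[-top:][::-1]` in `decode_predictions` is what selects the best classes. Here is a small sketch of that idiom on a made-up score vector. The helper name `top_k_indices` and the numbers are only for illustration; they are not model output.

```python
import numpy as np


def top_k_indices(scores, k=5):
    """Return the indices of the k largest scores, best first."""
    # argsort sorts ascending, so the k largest values sit at the end;
    # [::-1] reverses them into descending order.
    return scores.argsort()[-k:][::-1]


# Made-up scores for a six-class problem.
scores = np.array([0.05, 0.60, 0.10, 0.02, 0.20, 0.03])
print(top_k_indices(scores, k=3))
```

Each returned index is then looked up in the class index to get the WordNet identifier and the class name.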

How to use

$ ./resnet50.py -f honey-bee.jpg

Alternatively, if you have a GPU with too little memory, you can hide it so that the prediction runs on the CPU:

$ CUDA_VISIBLE_DEVICES="" ./resnet50.py -f honey-bee.jpg
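
The same effect can probably be achieved from within Python. This is a sketch under the assumption that the variable is set before TensorFlow / Keras is imported, because the GPU is picked up when the backend initializes:

```python
import os

# Hide all CUDA devices from this process. This has to happen before
# TensorFlow / Keras is imported, otherwise it may have no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Only now import Keras, e.g.:
# from keras.applications import ResNet50
```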

If you apply this to the jellyfish image from above, you get:

Input image shape: (1, 224, 224, 3)
n01910747    100.00%    jellyfish
n01496331      0.00%    electric_ray
n10565667      0.00%    scuba_diver
n01914609      0.00%    sea_anemone
n02607072      0.00%    anemone_fish

This takes about 6 seconds on CPU on my laptop.

Alternative Models

If you are building an application, you might want to look into alternatives:

Model name   Model size  Input size  Top-1 accuracy  Top-5 accuracy  Time
ResNet50     102.9 MB    224 × 224   77.15%          93.29%          0.495s
VGG16        553.5 MB    224 × 224   73.0%           91.2%           0.488s
InceptionV3  95.1 MB     299 × 299   78.8%           94.4%           0.681s
Xception     91.9 MB     299 × 299   79.0%           94.5%           0.761s

The time is only for the prediction itself and does not include loading the model. The models are up to several 100 MB in size, so loading them takes a while. In a real application you can (1) load the model only once and (2) run the evaluation on a batch of many images to speed things up.
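
Both suggestions can be sketched as follows. This is only an illustration under assumptions: `classify_in_batches` is a hypothetical helper, not part of Keras. It assumes that the images are already preprocessed arrays of identical shape and that `model` is any object with a Keras-style `predict` method which the caller has loaded once beforehand.

```python
import numpy as np


def classify_in_batches(model, images, batch_size=32):
    """Run model.predict on preprocessed images, batch_size images at a time.

    model  : object with a predict(batch) method, loaded once by the caller
    images : non-empty list of arrays with identical shape, e.g. (224, 224, 3)
    """
    outputs = []
    for start in range(0, len(images), batch_size):
        # Stack single images into one array of shape (n, height, width, 3).
        batch = np.stack(images[start:start + batch_size])
        outputs.append(model.predict(batch))
    # Join the per-batch results into one (num_images, num_classes) array.
    return np.concatenate(outputs)
```

With the script from above, `model` would be the `ResNet50` instance created once at start-up, and every image would go through `image.img_to_array` and `preprocess_input` before it is passed in.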

More models:

  • titu1994/Inception-v4

See also

  • Building powerful image classification models using very little data

Published

Mar 15, 2017
by Martin Thoma
