How to download ImageNet

Machine Learning algorithms for computer vision need huge amounts of data. Here are a few remarks on how to download them.

Make sure you have enough space (df -h)
Get a download manager. I use aria2c (sudo apt-get install aria2)

For ImageNet, you have to register at image-net.org.

Download the files like this:

$ aria2c -s 16 [URL]

After downloading the file, use

$ md5sum [Filename]

and compare the hash with the provided hash. If it differs, download the file again.

The ImageNet training data tar file contains 1000 files of the form n01440764.tar, n01443537.tar, ...

Each of those files contains JPEGs of one class. You can look the class label up with

from nltk.corpus import wordnet as wn

print(wn._synset_from_pos_and_offset("n", 1440764))
print(wn._synset_from_pos_and_offset("n", 1443537))

which reveals

Synset('tench.n.01')
Synset('goldfish.n.01')

If you extract all 1000 of those tar files into one directory, this takes about 6 hours with a script like this:

#!/usr/bin/env python

import glob
import tarfile


def untar(fname, targetd_dir):
    with tarfile.open(fname) as tar:
        tar.extractall(path=targetd_dir)


files = glob.glob("ILSVRC2012_img_train/*.tar")

for f in files:
    untar(f, "extracted")

This gives 1281170 files in total.

How to download ImageNet

Datasets

Published

Category

Tags

Contact