• Martin Thoma
  • Home
  • Categories
  • Tags
  • Archives
  • Support me

How to download ImageNet

Contents

  • Datasets

Machine Learning algorithms for computer vision need huge amounts of data. Here are a few remarks on how to download them.

  1. Make sure you have enough space (df -h)
  2. Get a download manager. I use aria2c (sudo apt-get install aria2)

For ImageNet, you have to register at image-net.org.

Download the files like this:

$ aria2c -s 16 [URL]

After downloading the file, use

$ md5sum [Filename]

and compare the hash with the provided hash. If it differs, download the file again.

The ImageNet training data tar file contains 1000 files of the form n01440764.tar, n01443537.tar, ...

Each of those files contains JPEGs of one class. You can look the class label up with

from nltk.corpus import wordnet as wn

print(wn._synset_from_pos_and_offset("n", 1440764))
print(wn._synset_from_pos_and_offset("n", 1443537))

which reveals

  • Synset('tench.n.01')
  • Synset('goldfish.n.01')

If you extract all 1000 of those tar files into one directory, this takes about 6 hours with a script like this:

#!/usr/bin/env python

import glob
import tarfile


def untar(fname, targetd_dir):
    with tarfile.open(fname) as tar:
        tar.extractall(path=targetd_dir)


files = glob.glob("ILSVRC2012_img_train/*.tar")

for f in files:
    untar(f, "extracted")

This gives 1281170 files in total.

Datasets

  • Download ImageNet
  • Download Places365

Published

Mär 6, 2017
by Martin Thoma

Category

Machine Learning

Tags

  • download 2
  • machine learning 81

Contact

  • Martin Thoma - A blog about Code, the Web and Cyberculture
  • E-mail subscription
  • RSS-Feed
  • Privacy/Datenschutzerklärung
  • Impressum
  • Powered by Pelican. Theme: Elegant by Talha Mansoor