Omniglot Dataset

Visualization of the Omniglot Dataset on the Deep Lake UI

What is the Omniglot Dataset?

The Omniglot dataset was created to support the development of learning algorithms that are more human-like. It contains 1623 handwritten characters from 50 different alphabets. Each of the 1623 characters was drawn by 20 different people through Amazon’s Mechanical Turk service, for 32,460 drawings in total. Each image is accompanied by stroke data: a sequence of [x,y,t] coordinates, where t is the time in milliseconds. The dataset is split into a background set of 30 alphabets and an evaluation set of 20 alphabets.

Download Omniglot Dataset in Python

Instead of downloading the Omniglot dataset manually, you can load it in Python with a single line of code using Deep Lake, our open-source package.

Load Omniglot Dataset Training Subset in Python

				
    import deeplake

    # Load the training subset hosted on Deep Lake
    ds = deeplake.load('hub://activeloop/omniglot-images-strokes-train')

Load Omniglot Dataset Validation Subset in Python

				
    import deeplake

    # Load the validation subset hosted on Deep Lake
    ds = deeplake.load('hub://activeloop/omniglot-images-strokes-val')

Omniglot Dataset Structure

Omniglot Data Fields
  • image: tensor that contains the character images, each of size 105×105.
  • alphabet: tensor that contains the alphabet of each character.
  • character_in_alphabet: tensor that contains the character within its alphabet.
  • penstroke: tensor that contains the stroke data, a sequence of [x,y,t] coordinates with time (t) in milliseconds. The sequence begins with “START”, and breaks between pen strokes are denoted by “BREAK” (indicating a pen-up action).
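
The penstroke layout described above can be turned into per-stroke point lists with a few lines of Python. The following is a minimal sketch, not the loader Deep Lake uses: the function name `parse_strokes` is hypothetical, and it assumes the stroke data is available as plain text with one “START”, “BREAK”, or comma-separated `x,y,t` entry per line. How the penstroke tensor is actually serialized may differ, so check a sample before relying on it.

```python
from typing import List, Tuple

# One pen sample: x, y, and time t in milliseconds.
Point = Tuple[float, float, float]


def parse_strokes(text: str) -> List[List[Point]]:
    """Split stroke text into a list of strokes (hypothetical helper).

    Assumes one token per line: "START", "BREAK", or "x,y,t".
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "START":
            continue  # skip blank lines and start markers
        if line == "BREAK":
            # Pen-up: close the current stroke if it has any points.
            if current:
                strokes.append(current)
                current = []
            continue
        x, y, t = (float(v) for v in line.split(","))
        current.append((x, y, t))
    if current:
        strokes.append(current)  # keep a trailing stroke with no final BREAK
    return strokes


# Made-up sample values, for illustration only.
sample = """START
10.0,-20.0,0
12.0,-22.0,15
BREAK
START
30.0,-5.0,400
BREAK
"""

strokes = parse_strokes(sample)
print(len(strokes), "strokes;", len(strokes[0]), "points in the first one")
```

Keeping each stroke as its own list preserves the pen-up boundaries, which matters when strokes are drawn or modeled one at a time.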

Omniglot Data Splits

How to use Omniglot Dataset with PyTorch and TensorFlow in Python

Train a model on Omniglot dataset with PyTorch in Python

Let’s use Deep Lake’s built-in one-line PyTorch dataloader to connect the data to the compute:

				
    # Wrap the loaded Deep Lake dataset in a PyTorch dataloader
    dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)

Train a model on Omniglot dataset with TensorFlow in Python

    # Expose the loaded Deep Lake dataset to TensorFlow
    dataloader = ds.tensorflow()

Additional Information about Omniglot Dataset

Omniglot Dataset Description

Omniglot Dataset Curators

Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum

Omniglot Dataset Licensing Information

Omniglot Dataset Citation Information

    @article{lake2015human,
      title={Human-level concept learning through probabilistic program induction},
      author={Lake, Brenden M and Salakhutdinov, Ruslan and Tenenbaum, Joshua B},
      journal={Science},
      volume={350},
      number={6266},
      pages={1332--1338},
      year={2015},
      publisher={American Association for the Advancement of Science}
    }