Omniglot Dataset

Name: Omniglot
Creator: Brenden M. Lake
License: https://github.com/brendenlake/omniglot/blob/master/LICENSE


Visualization of the Omniglot Dataset in the Deep Lake UI


What is the Omniglot Dataset?

The Omniglot dataset was created to support the development of more human-like learning algorithms. It contains 1623 handwritten characters from 50 different alphabets. Each of the 1623 characters was drawn by 20 different people using Amazon's Mechanical Turk. Each image is paired with stroke data: a sequence of [x,y,t] coordinates, where t is the time in milliseconds. The dataset is split into a background set of 30 alphabets and an evaluation set of 20 alphabets.

Download Omniglot Dataset in Python

Instead of downloading the Omniglot dataset manually, you can load it in Python with a single line of code using the open-source Deep Lake library.

Load Omniglot Dataset Training Subset in Python

				
import deeplake

# Load the Omniglot training split from Activeloop's public hub
ds = deeplake.load('hub://activeloop/omniglot-images-strokes-train')

Load Omniglot Dataset Validation Subset in Python

				
import deeplake

# Load the Omniglot validation split from Activeloop's public hub
ds = deeplake.load('hub://activeloop/omniglot-images-strokes-val')

Omniglot Dataset Structure

Omniglot Data Fields

image: tensor containing the 105×105 character image.

alphabet: tensor containing the alphabet that the character belongs to.

character_in_alphabet: tensor identifying which character of its alphabet the image shows.

penstroke: tensor containing the stroke data, a sequence of [x,y,t] coordinates with time (t) in milliseconds. Each sequence begins with “START”, and breaks between pen strokes are marked “BREAK” (indicating a pen-up action).
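The START/BREAK layout of the stroke data can be unpacked into individual pen strokes. The sketch below assumes the record is available as text with one marker or one comma-separated `x,y,t` triple per line; that layout and the helper names `parse_strokes` and `stroke_durations` are hypothetical and not part of the Deep Lake API, so check how the `penstroke` tensor is actually stored before relying on it.

```python
from typing import List, Tuple

# One pen sample: (x, y, t), with t in milliseconds.
Point = Tuple[float, float, float]


def parse_strokes(text: str) -> List[List[Point]]:
    """Split a START/BREAK stroke record into a list of strokes.

    Assumed (hypothetical) layout: one token per line, "START" first,
    "x,y,t" triples for pen samples, and "BREAK" for each pen-up.
    """
    strokes: List[List[Point]] = []
    current: List[Point] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "START":
            continue  # skip blank lines and the opening marker
        if line == "BREAK":
            if current:  # pen up: close the stroke in progress
                strokes.append(current)
            current = []
            continue
        x, y, t = (float(v) for v in line.split(","))
        current.append((x, y, t))
    if current:  # keep a final stroke that has no trailing BREAK
        strokes.append(current)
    return strokes


def stroke_durations(strokes: List[List[Point]]) -> List[float]:
    """Elapsed milliseconds from the first to the last sample of each stroke."""
    return [s[-1][2] - s[0][2] for s in strokes]


record = """START
10,20,0
12,22,15
BREAK
30,40,120
31,41,135
BREAK"""

strokes = parse_strokes(record)
print(len(strokes))               # 2
print(stroke_durations(strokes))  # [15.0, 15.0]
```

Treating "BREAK" as the end of a stroke rather than the start of a new one means a record that omits the final BREAK still yields its last stroke.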

Omniglot Data Splits

The Omniglot training set contains 19,280 samples.

The Omniglot validation set contains 13,180 samples.
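Because each character was drawn by 20 people, the sample counts imply how many characters each split holds. Assuming every character keeps all 20 of its drawings (the helper name `characters_in_split` is illustrative, not part of Deep Lake), the arithmetic can be checked in plain Python:

```python
TRAIN_SAMPLES = 19280
VAL_SAMPLES = 13180
DRAWERS_PER_CHARACTER = 20  # each character was drawn by 20 people
TOTAL_CHARACTERS = 1623


def characters_in_split(num_samples: int, drawers: int = DRAWERS_PER_CHARACTER) -> int:
    """Distinct characters implied by a split's sample count (assumes 20 drawings each)."""
    if num_samples % drawers:
        raise ValueError("sample count is not a multiple of drawers per character")
    return num_samples // drawers


train_chars = characters_in_split(TRAIN_SAMPLES)
val_chars = characters_in_split(VAL_SAMPLES)
print(train_chars, val_chars, train_chars + val_chars)  # 964 659 1623
```

Under that assumption the two splits together account for all 1623 characters.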

How to use the Omniglot Dataset with PyTorch and TensorFlow in Python

Train a model on the Omniglot dataset with PyTorch in Python

Use Deep Lake's built-in one-line PyTorch dataloader to connect the data to the compute:

				
# `ds` is the Omniglot dataset loaded above
dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)

Train a model on the Omniglot dataset with TensorFlow in Python

				
# `ds` is the Omniglot dataset loaded above
dataloader = ds.tensorflow()

Additional Information about Omniglot Dataset

Omniglot Dataset Description

Homepage: https://github.com/brendenlake/omniglot

Paper: https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf

Omniglot Dataset Curators

Brenden M. Lake, Ruslan Salakhutdinov, Joshua B. Tenenbaum

Omniglot Dataset Licensing Information

MIT License

Omniglot Dataset Citation Information

				
@article{lake2015human,
  title={Human-level concept learning through probabilistic program induction},
  author={Lake, Brenden M and Salakhutdinov, Ruslan and Tenenbaum, Joshua B},
  journal={Science},
  volume={350},
  number={6266},
  pages={1332--1338},
  year={2015},
  publisher={American Association for the Advancement of Science}
}
