Machine Learning Datasets

CelebA Dataset

Visualization of the Celeb-A dataset in the Deep Lake UI

What is the Celeb-A Dataset?

The CelebFaces Attributes Dataset (CelebA) consists of more than 200K celebrity images with 40 attribute annotations each. The images cover large pose variations, background clutter, and diverse people, which makes the dataset well suited to training and testing models for face detection and facial attribute recognition. A model trained on it can, for example, identify people who have brown hair, are smiling, or are wearing glasses.

Download Celeb-A Dataset in Python

Instead of downloading the CelebA dataset, you can load it in Python with just one line of code using our open-source package, Deep Lake.

Load CelebA Dataset Training Subset in Python

				
# Load the CelebA training subset from the Activeloop hub.
import deeplake
ds = deeplake.load("hub://activeloop/celeb-a-train")

Load CelebA Dataset Validation Subset in Python

				
# Load the CelebA validation subset from the Activeloop hub.
import deeplake
ds = deeplake.load("hub://activeloop/celeb-a-val")

Load CelebA Dataset Testing Subset in Python

				
# Load the CelebA testing subset from the Activeloop hub.
import deeplake
ds = deeplake.load("hub://activeloop/celeb-a-test")

CelebA Dataset Structure

CelebA Data Fields
  • image: tensor containing the 178×218 image.
  • bbox: tensor containing the bounding box for the corresponding image.
  • keypoints: tensor identifying 63 key points on the face.
  • clock_shadow: tensor to check for a 5 o'clock shadow.
  • arched_eyebrows: tensor to check for arched eyebrows.
  • attractive: tensor to check if attractive or not.
  • bags_under_eyes: tensor to check if bags are under the eyes.
  • bald: tensor to check if bald or not.
  • bangs: tensor to check if bangs are there or not.
  • big_lips: tensor to check if big lips are there or not.
  • big_nose: tensor to check if big nose is there or not.
  • black_hair: tensor to check the presence of black hair.
  • blond_hair: tensor to check if blond hair or not.
  • blurry: tensor to check if the image is blurred.
  • brown_hair: tensor to check the presence of brown hair.
  • bushy_eyebrows: tensor to check the presence of bushy eyebrows.
  • chubby: tensor to check if chubby or not.
  • double_chin: tensor to check the presence of double chin.
  • eyeglasses: tensor to check the presence of eyeglasses.
  • goatee: tensor to check the presence of a goatee in a person.
  • gray_hair: tensor to check the presence of gray hair.
  • heavy_makeup: tensor to check the presence of heavy makeup.
  • high_cheekbones: tensor to check the presence of high cheekbones.
  • male: tensor to check if the person is male.
  • mouth_slightly_open: tensor to check if the mouth is slightly open.
  • mustache: tensor to check the presence of a mustache.
  • narrow_eyes: tensor to check narrow eyes or not.
  • no_beard: tensor to check if the person has no beard.
  • oval_face: tensor to check if the face is oval.
  • pale_skin: tensor to check if the skin is pale.
  • pointy_nose: tensor to check if the nose is pointy.
  • receding_hairline: tensor to check if the hairline is receding.
  • rosy_cheeks: tensor to check if the cheeks are rosy.
  • sideburns: tensor to check the presence of sideburns.
  • smiling: tensor to check if the person is smiling.
  • straight_hair: tensor to check if the hair is straight.
  • wavy_hair: tensor to check if the hair is wavy.
  • wearing_earrings: tensor to check the presence of earrings.
  • wearing_hat: tensor to check the presence of a hat.
  • wearing_lipstick: tensor to check the presence of lipstick.
  • wearing_necklace: tensor to check the presence of a necklace.
  • wearing_necktie: tensor to check the presence of a necktie.
  • young: tensor to check if the person is young.
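
Because the attribute fields above are yes/no checks, they are often combined into a single multi-label target. The following is a minimal plain-Python sketch of that idea, assuming each attribute yields a truthy or falsy value per sample. The `sample` dictionary, the `ATTRIBUTES` subset, and the `to_multilabel` helper are hypothetical names used for illustration and are not part of Deep Lake.

```python
# Hypothetical subset of the attribute field names listed above.
ATTRIBUTES = ["bald", "bangs", "brown_hair", "eyeglasses", "smiling", "young"]


def to_multilabel(sample, attributes=ATTRIBUTES):
    """Return a list of 0/1 labels in a fixed attribute order.

    Assumes `sample` maps attribute names to truthy/falsy values;
    attributes missing from the sample are treated as 0.
    """
    return [1 if sample.get(name) else 0 for name in attributes]


# Illustrative sample, not real CelebA data.
sample = {"brown_hair": 1, "smiling": 1, "eyeglasses": 0, "young": 1}
labels = to_multilabel(sample)
print(labels)  # [0, 0, 1, 0, 1, 1]
```

Keeping the attribute order fixed means position i of the label vector always refers to the same attribute, which is what a multi-label loss would expect.
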
CelebA Data Splits
  • The CelebA dataset training set is composed of 162,770 images.
  • The CelebA dataset test set is composed of 19,962 images.
  • The CelebA dataset validation set is composed of 19,867 images.
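
As a quick sanity check on the split sizes listed above, the short sketch below sums them and prints each split's share of the total. The `splits` dictionary is just an illustrative container for the numbers from this page.

```python
# Split sizes as listed above.
splits = {"train": 162_770, "val": 19_867, "test": 19_962}

total = sum(splits.values())
print(total)  # 202599

# Share of each split as a percentage of all images, rounded to one decimal.
fractions = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(fractions)
```

The total is consistent with the "more than 200K" figure quoted in the description.
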

How to use CelebA Dataset with PyTorch and TensorFlow in Python

Train a model on CelebA dataset with PyTorch in Python

Let’s use Deep Lake’s built-in one-line PyTorch dataloader to connect the data to the compute:

				
					dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)
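
To make the `batch_size=4` and `shuffle=False` arguments concrete, here is a minimal plain-Python sketch of sequential batching in general. It is not Deep Lake's implementation: `sequential_batches` is a hypothetical helper, and keeping the final partial batch is an assumption.

```python
import math


def sequential_batches(num_samples, batch_size):
    """Yield lists of sample indices in their original order (no shuffling)."""
    for start in range(0, num_samples, batch_size):
        yield list(range(start, min(start + batch_size, num_samples)))


batches = list(sequential_batches(10, 4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Batches per pass over the 162,770-image training split at batch_size=4,
# assuming the last partial batch is kept.
train_batches = math.ceil(162_770 / 4)
print(train_batches)  # 40693
```

With shuffling disabled, samples arrive in dataset order, which is convenient for debugging but usually not what you want for training.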
				
			
Train a model on CelebA dataset with TensorFlow in Python
				
					dataloader = ds.tensorflow()
				
			

Additional Information about CelebA Dataset

CelebA Dataset Description

  • Homepage: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
  • Repository: N/A
  • Paper: Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou: Deep Learning Face Attributes in the Wild, Proceedings of International Conference on Computer Vision (ICCV), 2015
  • Point of Contact: ziwei.liu at ntu.edu.sg
CelebA Dataset Curators

Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou

CelebA Dataset Licensing Information

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you’re a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

CelebA Dataset Citation Information
				
@inproceedings{liu2015faceattributes,
  title = {Deep Learning Face Attributes in the Wild},
  author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
  month = {December},
  year = {2015}
}
				
			

CelebA Dataset FAQs

What is the CelebA dataset for Python?

The CelebFaces Attributes Dataset (CelebA) consists of more than 200K celebrity images with 40 attribute annotations each. The images range from extreme poses to heavily cluttered backgrounds.

What is the CelebA dataset used for?

This dataset is well suited to training and testing models for face detection and, in particular, for recognizing facial attributes, such as finding people who have brown hair, are smiling, or are wearing glasses. The images cover large pose variations, background clutter, and diverse people, and come with rich annotations.

How can I use CelebA dataset in PyTorch or TensorFlow?

You can stream the CelebA dataset while training a model in PyTorch or TensorFlow with one line of code using the open-source Python package Activeloop Deep Lake. See the sections above on how to train a model on the CelebA dataset with PyTorch in Python or with TensorFlow in Python.

© 2024 All Rights Reserved by Snark AI, inc dba Activeloop