QuAC Dataset


What is the QuAC Dataset?

QuAC (Question Answering in Context) is a dataset for information-seeking dialogue, containing 14K QA conversations (100K questions in total). Each data point is a dialogue between two people: a student asks questions about a Wikipedia section, and a teacher answers by providing short snippets of its text. QuAC presents challenges not found in existing machine comprehension datasets: its questions are often open-ended, unanswerable, or only meaningful in the context of the dialogue.

Download QuAC Dataset in Python

Instead of downloading the QuAC dataset manually, you can load it in Python with a single line of code using the open-source Deep Lake package.

Load QuAC Dataset Training Subset in Python

import deeplake
ds = deeplake.load("hub://activeloop/quac-train")

Load QuAC Dataset Validation Subset in Python

import deeplake
ds = deeplake.load("hub://activeloop/quac-val")
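After loading a subset, it is worth checking what the dataset actually contains before training. The following is a minimal sketch using standard Deep Lake dataset methods (summary(), len(), and the tensors mapping); the tensor names it prints should match the fields listed in the structure section below.

import deeplake

ds = deeplake.load("hub://activeloop/quac-train")

# Print an overview of the tensors, their htypes, dtypes, and shapes
ds.summary()

# Number of samples and the names of the available tensors
print(len(ds))
print(list(ds.tensors.keys()))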

QuAC Dataset Structure

QuAC Data Fields
For the training set
  • id: tensor containing the id of the dialogue.
  • context: tensor containing the Wikipedia section text that the dialogue is grounded in.
  • followup_label: tensor containing the follow-up labels for the dialogue.
  • yesorno_answer: tensor containing the yes/no label for each answer: y represents yes, n represents no, and x represents neither.
  • question: tensor containing the questions in the dialogue.
  • answer_text: tensor containing the answers to the questions.
  • answer_start: tensor containing the starting offsets of the answers within the context.
  • original_ans_text: tensor containing the original answers given by the teacher in the dialogue.
  • original_ans_start: tensor containing the starting offsets of the original answers.
For the validation set
  • id: tensor containing the id of the dialogue.
  • context: tensor containing the Wikipedia section text that the dialogue is grounded in.
  • followup_label: tensor containing the follow-up labels for the dialogue.
  • yesorno_answer: tensor containing the yes/no label for each answer: y represents yes, n represents no, and x represents neither.
  • question: tensor containing the questions in the dialogue.
  • answer_text: tensor containing the answers to the questions.
  • answer_start: tensor containing the starting offsets of the answers within the context.
  • original_ans_text: tensor containing the original answers given by the teacher in the dialogue.
  • original_ans_start: tensor containing the starting offsets of the original answers.
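As a rough illustration of how these fields fit together, the sketch below reads the question, answer text, and answer offsets from the first training sample. The accessors are assumptions: text-typed tensors are typically read with .text() (or .data()) and numeric tensors with .numpy(), and a single sample may hold every turn of one dialogue, in which case the text accessors return lists rather than single strings. Verify against ds.summary() before relying on this.

import deeplake

ds = deeplake.load("hub://activeloop/quac-train")

# NOTE: the accessor choice (.text() vs .numpy()) is an assumption based on the
# field descriptions above; check ds.summary() for the actual htypes.
questions = ds.question[0].text()      # question(s) asked in the first sample
answers = ds.answer_text[0].text()     # corresponding answer snippet(s)
offsets = ds.answer_start[0].numpy()   # starting offsets of the answers in the context

print(questions)
print(answers)
print(offsets)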
QuAC Data Splits
  • The QuAC training set comprises 83,568 questions, 11,567 dialogs, and 6,843 unique sections.
  • The QuAC validation set comprises 7,354 questions, 1,000 dialogs, and 1,000 unique sections.

How to use QuAC Dataset with PyTorch and TensorFlow in Python

Train a model on QuAC dataset with PyTorch in Python

Let’s use the built-in Deep Lake one-line PyTorch dataloader to connect the data to the compute:

dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)
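The returned object behaves like a regular PyTorch dataloader: each batch is a dictionary keyed by tensor name. The loop below is only a sketch that inspects one batch; the tokenization, model, and loss needed for actual QA training are omitted, and the assumption that text fields arrive as lists of strings should be checked for your Deep Lake version.

import deeplake

ds = deeplake.load("hub://activeloop/quac-train")
dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)

for batch in dataloader:
    print(batch.keys())   # e.g. 'question', 'answer_text', 'answer_start', ...
    # tokenization + a QA model forward/backward pass would go here
    break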
Train a model on QuAC dataset with TensorFlow in Python

dataloader = ds.tensorflow()
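ds.tensorflow() returns a tf.data.Dataset, so the usual tf.data transformations (batching, shuffling, mapping) apply. Below is a minimal sketch that inspects a single element, under the assumption that each element is a dictionary keyed by tensor name.

import deeplake

ds = deeplake.load("hub://activeloop/quac-train")
dataloader = ds.tensorflow()

# Look at one element; the keys mirror the Deep Lake tensor names
for example in dataloader.take(1):
    print(example.keys())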

QuAC Dataset Creation

Data Collection and Normalization Information
Amazon Mechanical Turk was used to collect the data. The task was limited to workers in English-speaking countries with more than 1,000 completed HITs and an acceptance rate of at least 95%. Workers were paid based on the number of turns in a dialog, which encouraged long conversations, and dialogs with fewer than three QA pairs were discarded. To ensure quality, a qualification task was created and workers could report their partners for various problems.

Additional Information about QuAC Dataset

QuAC Dataset Description

  • Homepage: https://quac.ai/
  • Repository: N/A
  • Paper: Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer: QuAC : Question Answering in Context
  • Point of Contact: [email protected], [email protected], [email protected], [email protected]
QuAC Dataset Curators
Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer
QuAC Dataset Licensing Information
CC BY-SA 4.0 License
QuAC Dataset Citation Information

@misc{choi2018quac,
      title={QuAC : Question Answering in Context},
      author={Eunsol Choi and He He and Mohit Iyyer and Mark Yatskar and Wen-tau Yih and Yejin Choi and Percy Liang and Luke Zettlemoyer},
      year={2018},
      eprint={1808.07036},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}