Learning from crowdsourced labeled data: a survey

Jing Zhang, Xindong Wu, Victor S. Sheng

Research output: Contribution to journal › Article › peer-review

136 Scopus citations


With the rapid growth of crowdsourcing systems, many applications based on the supervised learning paradigm can easily obtain massive labeled data at relatively low cost. However, because crowdsourced labelers vary in reliability, learning procedures face great challenges. Improving the quality of both the labels and the learning models therefore plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of label quality and learning model quality. Then, by reviewing recently proposed models and algorithms for ground truth inference and for learning from crowds, we analyze the connections and distinctions among these techniques and clarify the current state of research in this area. To facilitate further studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, along with open source libraries and tools. Finally, we discuss some potential issues for future study.
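As a minimal illustration of the ground truth inference problem the survey reviews, the simplest baseline is majority voting over each item's crowdsourced labels. The sketch below assumes a hypothetical data layout (a dict mapping item IDs to worker-supplied labels); it is not drawn from any specific method in the survey, which covers far more sophisticated inference algorithms.

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Infer a ground-truth label for each item by simple majority voting.

    labels_per_item: dict mapping item id -> list of labels from crowd workers.
    Ties are broken by whichever label Counter.most_common returns first.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Hypothetical example: three items, each labeled by several workers.
votes = {
    "item1": ["spam", "spam", "ham"],
    "item2": ["ham", "ham", "ham"],
    "item3": ["spam", "ham", "spam", "spam"],
}
print(majority_vote(votes))
# → {'item1': 'spam', 'item2': 'ham', 'item3': 'spam'}
```

Majority voting treats every worker as equally reliable; the inference methods surveyed (e.g., EM-style estimators) instead weight workers by estimated accuracy.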

Original language: English
Pages (from-to): 543-576
Number of pages: 34
Journal: Artificial Intelligence Review
Issue number: 4
State: Published - Dec 1 2016


  • Crowdsourcing
  • Ground truth inference
  • Label quality
  • Learning from crowds
  • Learning model quality
  • Multiple noisy labeling

