Learning from crowdsourced labeled data: a survey

Jing Zhang, Xindong Wu, Victor S. Sheng

Research output: Contribution to journal › Article › peer-review

59 Scopus citations

Abstract

With the rapid growth of crowdsourcing systems, many applications based on the supervised learning paradigm can easily obtain massive amounts of labeled data at relatively low cost. However, the varying reliability of crowdsourced labelers poses great challenges to learning procedures. Thus, improving the quality of labels and of learning models plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of label quality and learning model quality. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning, we analyze the connections and distinctions among these techniques and clarify the current state of research in this area. To facilitate further studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems as well as open source libraries and tools. Finally, some potential issues for future studies are discussed.
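The survey's central notion of ground truth inference means integrating the multiple noisy labels that each instance receives from different crowd workers into a single estimated true label. As a point of reference, the simplest such strategy is majority voting; the sketch below is an illustration only, not an algorithm from the survey, and the function and data names are hypothetical.

```python
# Illustrative sketch (not from the survey): minimal majority-voting
# ground truth inference over crowdsourced labels.
from collections import Counter


def majority_vote(labels_per_item):
    """Infer an integrated label for each item from its multiple noisy labels.

    labels_per_item: dict mapping item id -> list of labels from workers.
    Returns: dict mapping item id -> most frequent (inferred) label.
    """
    inferred = {}
    for item, labels in labels_per_item.items():
        # most_common(1) returns [(label, count)] for the top label;
        # ties are broken by first occurrence among the workers' labels.
        inferred[item] = Counter(labels).most_common(1)[0][0]
    return inferred


# Hypothetical example: three workers label two items.
crowd_labels = {
    "item1": ["pos", "pos", "neg"],
    "item2": ["neg", "neg", "pos"],
}
print(majority_vote(crowd_labels))  # {'item1': 'pos', 'item2': 'neg'}
```

More elaborate inference methods surveyed in the paper (e.g., EM-style approaches that also estimate labeler reliability) refine this basic aggregation idea.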

Original language: English
Pages (from-to): 543-576
Number of pages: 34
Journal: Artificial Intelligence Review
Volume: 46
Issue number: 4
DOIs
State: Published - Dec 1 2016

Keywords

  • Crowdsourcing
  • Ground truth inference
  • Label quality
  • Learning from crowds
  • Learning model quality
  • Multiple noisy labeling

