Multi-Class Ground Truth Inference in Crowdsourcing with Clustering

Jing Zhang, Victor S. Sheng, Jian Wu, Xindong Wu

Research output: Contribution to journal › Article › peer-review

74 Scopus citations


Because crowdsourced labelers are often of low quality, the integrated label of each example is usually inferred from multiple noisy labels provided by different labelers. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K-class labeling task, GTIC uses the multiple noisy label sets of the examples to generate features. It then uses a K-Means algorithm to cluster all examples into K different groups, each of which is mapped to a specific class. Examples in the same cluster are assigned the corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms: majority voting (MV), Dawid & Skene's algorithm (DS), ZenCrowd (ZC), and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that GTIC significantly outperforms the others in terms of both accuracy and M-AUC. In addition, GTIC runs about twenty times faster than complicated EM-based inference algorithms.
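The pipeline described in the abstract (features from noisy labels, K-Means into K groups, one class per group) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how features are built, how K-Means is initialized, or how clusters are mapped to classes. The sketch assumes that each example's feature vector is its per-class vote fraction, that centroids start at the K one-hot vectors, and that a cluster is mapped to the class with the largest centroid component. All function names (`label_features`, `kmeans`, `infer_labels`) are hypothetical.

```python
from collections import Counter


def label_features(labels, num_classes):
    """Assumed feature: fraction of an example's noisy labels that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(num_classes)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, num_clusters, max_iters=100):
    """Plain Lloyd-style K-Means on K-dimensional points.

    Assumption: centroids start at the one-hot vectors, which makes the run
    deterministic. This requires the feature dimension to equal num_clusters.
    """
    centroids = [
        [1.0 if j == k else 0.0 for j in range(num_clusters)]
        for k in range(num_clusters)
    ]
    assignment = None
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(num_clusters), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Move each non-empty cluster's centroid to the mean of its members.
        for k in range(num_clusters):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def infer_labels(noisy_labels, num_classes):
    """Cluster examples into K groups and give each example its cluster's class."""
    features = [label_features(labels, num_classes) for labels in noisy_labels]
    assignment, centroids = kmeans(features, num_classes)
    # Assumed mapping: a cluster takes the class with the largest centroid
    # component. Two clusters mapping to the same class is not handled here.
    cluster_to_class = [
        max(range(num_classes), key=lambda j: centroid[j]) for centroid in centroids
    ]
    return [cluster_to_class[a] for a in assignment]
```

For instance, `infer_labels([[0, 0, 1], [1, 1, 1], [2, 2, 0]], 3)` takes three examples, each labeled by three labelers, and returns one integrated class per example. On clean inputs like this the result coincides with majority voting; any difference would only show up on noisier label sets, where cluster membership rather than the raw vote decides the label.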

Original language: English
Article number: 7345572
Pages (from-to): 1080-1085
Number of pages: 6
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 4
State: Published - Apr 1 2016


  • Clustering
  • EM algorithm
  • crowdsourcing
  • ground truth inference
  • multi-class labeling

