Multi-Class Ground Truth Inference in Crowdsourcing with Clustering

Jing Zhang; Victor S. Sheng; Jian Wu; Xindong Wu

doi:10.1109/TKDE.2015.2504974

Computer Science

Research output: Contribution to journal › Article › peer-review

80 Scopus citations

Abstract

Because crowdsourced labelers are often of low quality, the integrated label of each example is usually inferred from the multiple noisy labels that different labelers provide. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K-class labeling task, GTIC uses the multiple noisy label sets of the examples to generate features. It then uses a K-Means algorithm to cluster all examples into K groups, each of which is mapped to a specific class, and every example in a cluster is assigned the corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's (DS), ZenCrowd (ZC) and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that GTIC significantly outperforms the others in terms of both accuracy and M-AUC. In addition, GTIC runs about twenty times faster than the more complicated EM-based inference algorithms.
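The pipeline outlined in the abstract (features from noisy label sets, K-Means into K groups, one class per group) can be illustrated with a minimal sketch. The abstract does not specify how features are generated or how clusters are mapped to classes, so the choices below are assumptions made for illustration only: per-class vote fractions as features, farthest-first seeding for K-Means, and a greedy centroid-based mapping. All function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter


def label_frequency_features(noisy_labels, num_classes):
    """Assumed feature map: fraction of an example's votes that went to each class."""
    features = []
    for labels in noisy_labels:
        counts = Counter(labels)
        features.append([counts[k] / len(labels) for k in range(num_classes)])
    return features


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no example changed cluster
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def map_clusters_to_classes(centroids):
    """Assumed mapping: greedily pair each cluster with the class its centroid weights most."""
    k = len(centroids)
    pairs = sorted(
        ((centroids[j][c], j, c) for j in range(k) for c in range(k)), reverse=True
    )
    mapping, used_classes = {}, set()
    for _, j, c in pairs:
        if j not in mapping and c not in used_classes:
            mapping[j] = c
            used_classes.add(c)
    return mapping


def infer_labels(noisy_labels, num_classes):
    """Cluster examples on their vote-fraction features and label each by its cluster's class."""
    features = label_frequency_features(noisy_labels, num_classes)
    assignment, centroids = k_means(features, num_classes)
    mapping = map_clusters_to_classes(centroids)
    return [mapping[a] for a in assignment]


# Toy data: six examples, three classes, four noisy labels per example.
noisy_labels = [
    [0, 0, 0, 1],
    [0, 0, 2, 0],
    [1, 1, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 0],
    [2, 2, 1, 2],
]
print(infer_labels(noisy_labels, 3))  # prints [0, 0, 1, 1, 2, 2]
```

On this toy input the result coincides with majority voting. Because labels are assigned per cluster rather than per example, the two can differ on examples whose own votes are ambiguous.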

Original language	English
Article number	7345572
Pages (from-to)	1080-1085
Number of pages	6
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	28
Issue number	4
DOIs	https://doi.org/10.1109/TKDE.2015.2504974
State	Published - Apr 1 2016

Keywords

Clustering
EM algorithm
crowdsourcing
ground truth inference
multi-class labeling

Cite this

@article{7e038532b4ac48f3b014bbb87db91bac,
  title = "Multi-Class Ground Truth Inference in Crowdsourcing with Clustering",
  abstract = "Due to low quality of crowdsourced labelers, the integrated label of each example is usually inferred from its multiple noisy labels provided by different labelers. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K labeling case, GTIC utilizes the multiple noisy label sets of examples to generate features. Then, it uses a K-Means algorithm to cluster all examples into K different groups, each of which is mapped to a specific class. Examples in the same cluster are assigned a corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's (DS), ZenCrowd (ZC) and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that the performance of GTIC is significantly superior to the others in terms of both accuracy and M-AUC. Besides, the running time of GTIC is about twenty times faster than EM-based complicated inference algorithms.",
  keywords = "Clustering, EM algorithm, crowdsourcing, ground truth inference, multi-class labeling",
  author = "Jing Zhang and Sheng, {Victor S.} and Jian Wu and Xindong Wu",
  note = "Publisher Copyright: {\textcopyright} 2015 IEEE.",
  year = "2016",
  month = apr,
  day = "1",
  doi = "10.1109/TKDE.2015.2504974",
  language = "English",
  volume = "28",
  pages = "1080--1085",
  journal = "IEEE Transactions on Knowledge and Data Engineering",
  issn = "1041-4347",
  number = "4",
}

TY  - JOUR
T1  - Multi-Class Ground Truth Inference in Crowdsourcing with Clustering
AU  - Zhang, Jing
AU  - Sheng, Victor S.
AU  - Wu, Jian
AU  - Wu, Xindong
PY  - 2016/4/1
Y1  - 2016/4/1
N2  - Due to low quality of crowdsourced labelers, the integrated label of each example is usually inferred from its multiple noisy labels provided by different labelers. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K labeling case, GTIC utilizes the multiple noisy label sets of examples to generate features. Then, it uses a K-Means algorithm to cluster all examples into K different groups, each of which is mapped to a specific class. Examples in the same cluster are assigned a corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's (DS), ZenCrowd (ZC) and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that the performance of GTIC is significantly superior to the others in terms of both accuracy and M-AUC. Besides, the running time of GTIC is about twenty times faster than EM-based complicated inference algorithms.
AB  - Due to low quality of crowdsourced labelers, the integrated label of each example is usually inferred from its multiple noisy labels provided by different labelers. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K labeling case, GTIC utilizes the multiple noisy label sets of examples to generate features. Then, it uses a K-Means algorithm to cluster all examples into K different groups, each of which is mapped to a specific class. Examples in the same cluster are assigned a corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's (DS), ZenCrowd (ZC) and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that the performance of GTIC is significantly superior to the others in terms of both accuracy and M-AUC. Besides, the running time of GTIC is about twenty times faster than EM-based complicated inference algorithms.
KW  - Clustering
KW  - EM algorithm
KW  - crowdsourcing
KW  - ground truth inference
KW  - multi-class labeling
UR  - http://www.scopus.com/inward/record.url?scp=84963731561&partnerID=8YFLogxK
U2  - 10.1109/TKDE.2015.2504974
DO  - 10.1109/TKDE.2015.2504974
M3  - Article
AN  - SCOPUS:84963731561
SN  - 1041-4347
VL  - 28
SP  - 1080
EP  - 1085
JO  - IEEE Transactions on Knowledge and Data Engineering
JF  - IEEE Transactions on Knowledge and Data Engineering
IS  - 4
M1  - 7345572
ER  - 
