TY - JOUR
T1 - Consensus algorithms for biased labeling in crowdsourcing
AU - Zhang, Jing
AU - Sheng, Victor S.
AU - Li, Qianmu
AU - Wu, Jian
AU - Wu, Xindong
N1 - Funding Information:
The authors thank anonymous reviewers for their insightful and constructive comments that have helped improve the quality of this paper and also thank Professor John Boyland from University of Wisconsin at Milwaukee for helping us improve the English presentation. This research has been supported by the National Natural Science Foundation of China under Grant no. 61603186, the Natural Science Foundation of Jiangsu Province, China, under Grant no. BK20160843, the China Postdoctoral Science Foundation under Grant no. 2016M590457, the Postdoctoral Science Foundation of Jiangsu Province, China under Grant no. 1601199C, the Project supported by the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety (Nanjing University of Science and Technology) under Grant no. 30916014107, the U.S. National Science Foundation under Grant nos. 1613950 and 1115417, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under Grant no. IRT13059.
Publisher Copyright:
© 2016 Elsevier Inc.
PY - 2017/3/1
Y1 - 2017/3/1
N2 - Although it has become an accepted lay view that non-expert annotators often exhibit biases when labeling objects through crowdsourcing systems, this argument lacks sufficient evidential observation and systematic empirical study. This paper first analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency in binary categorization; that is, for a large number of annotators, labeling quality on the negative class (supposed to be the majority) is significantly higher than on the positive class (the minority). The paper then empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of the datasets and perform poorly, even though they model the complexity of the systems. To address biased labeling, the paper further proposes a novel consensus algorithm, adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV uses the frequency of positive labels in the multiple noisy label set of each example to estimate a bias rate and then assigns weights derived from that bias rate to negative and positive labels. Comparisons among the five consensus algorithms (AWMV and the four existing ones) show that the proposed AWMV algorithm has the best overall performance. Finally, the paper notes some potential related topics for future study.
AB - Although it has become an accepted lay view that non-expert annotators often exhibit biases when labeling objects through crowdsourcing systems, this argument lacks sufficient evidential observation and systematic empirical study. This paper first analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency in binary categorization; that is, for a large number of annotators, labeling quality on the negative class (supposed to be the majority) is significantly higher than on the positive class (the minority). The paper then empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of the datasets and perform poorly, even though they model the complexity of the systems. To address biased labeling, the paper further proposes a novel consensus algorithm, adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV uses the frequency of positive labels in the multiple noisy label set of each example to estimate a bias rate and then assigns weights derived from that bias rate to negative and positive labels. Comparisons among the five consensus algorithms (AWMV and the four existing ones) show that the proposed AWMV algorithm has the best overall performance. Finally, the paper notes some potential related topics for future study.
KW - Consensus
KW - Crowdsourcing
KW - EM algorithm
KW - Labeling bias
KW - Weighted majority voting
UR - http://www.scopus.com/inward/record.url?scp=85007158228&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.12.026
DO - 10.1016/j.ins.2016.12.026
M3 - Article
AN - SCOPUS:85007158228
SN - 0020-0255
VL - 382-383
SP - 254
EP - 273
JO - Information Sciences
JF - Information Sciences
ER -