Consensus algorithms for biased labeling in crowdsourcing

Jing Zhang; Victor S. Sheng; Qianmu Li; Jian Wu; Xindong Wu

doi:10.1016/j.ins.2016.12.026

Computer Science

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Although it has become an accepted lay view that when labeling objects through crowdsourcing systems, non-expert annotators often exhibit biases, this argument lacks sufficient evidential observation and systematic empirical study. This paper initially analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency for binary categorization; in other words, for a large number of annotators, their labeling qualities on the negative class (supposed to be the majority) are significantly greater than are those on the positive class (minority). Therefore, the paper empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of datasets and perform badly although they model the complexity of the systems. To address the issue of handling biased labeling, the paper further proposes a novel consensus algorithm, namely adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV utilizes the frequency of positive labels in the multiple noisy label set of each example to obtain a bias rate and then assigns weights derived from the bias rate to negative and positive labels. Comparison results among the five consensus algorithms (AWMV and the four existing) show that the proposed AWMV algorithm has the best overall performance. Finally, this paper notes some potential related topics for future study.
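The abstract describes AWMV only in outline: weights for positive and negative labels are derived from a bias rate, which is itself estimated from the frequency of positive labels in each example's noisy label set. The formula for the bias rate is not given here, so the sketch below does not attempt to reproduce it. It shows only the generic weighted majority voting step for binary labels, with the weights passed in as parameters. All names (`positive_frequency`, `weighted_majority_vote`, `integrate`, `w_pos`, `w_neg`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def positive_frequency(labels: Sequence[int]) -> float:
    """Fraction of positive (1) labels in one example's noisy label set."""
    if not labels:
        raise ValueError("each example needs at least one label")
    return sum(labels) / len(labels)


def weighted_majority_vote(labels: Sequence[int], w_pos: float, w_neg: float) -> int:
    """Return 1 if weighted positive votes outweigh weighted negative votes.

    This is equivalent to thresholding the positive frequency at
    w_neg / (w_pos + w_neg). Ties go to the negative class, which is an
    arbitrary choice made for this sketch.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 1 if w_pos * n_pos > w_neg * n_neg else 0


def integrate(label_sets: Sequence[Sequence[int]],
              w_pos: float = 1.0, w_neg: float = 1.0) -> List[int]:
    """Infer one label per example; equal weights give plain majority voting."""
    return [weighted_majority_vote(ls, w_pos, w_neg) for ls in label_sets]


# Toy label sets (made up for illustration): 1 = positive, 0 = negative.
label_sets = [[1, 0, 0, 0, 1], [0, 0, 0, 0, 1], [1, 1, 1, 0, 0]]

plain = integrate(label_sets)                          # equal weights
weighted = integrate(label_sets, w_pos=2.0, w_neg=1.0)  # favor positive labels
```

With equal weights the first example is labeled negative because only two of five annotators voted positive. Doubling the positive weight lowers the decision threshold from 1/2 to 1/3, so the same example flips to positive. This is the kind of correction that helps when annotators are less accurate on the minority class.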

Original language	English
Pages (from-to)	254-273
Number of pages	20
Journal	Information Sciences
Volume	382-383
DOIs	https://doi.org/10.1016/j.ins.2016.12.026
State	Published - Mar 1 2017

Keywords

Consensus
Crowdsourcing
EM algorithm
Labeling bias
Weighted majority voting

Cite this

@article{2d6d9dd09ef6460e87912723e7da3a61,

title = "Consensus algorithms for biased labeling in crowdsourcing",

abstract = "Although it has become an accepted lay view that when labeling objects through crowdsourcing systems, non-expert annotators often exhibit biases, this argument lacks sufficient evidential observation and systematic empirical study. This paper initially analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency for binary categorization; in other words, for a large number of annotators, their labeling qualities on the negative class (supposed to be the majority) are significantly greater than are those on the positive class (minority). Therefore, the paper empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of datasets and perform badly although they model the complexity of the systems. To address the issue of handling biased labeling, the paper further proposes a novel consensus algorithm, namely adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV utilizes the frequency of positive labels in the multiple noisy label set of each example to obtain a bias rate and then assigns weights derived from the bias rate to negative and positive labels. Comparison results among the five consensus algorithms (AWMV and the four existing) show that the proposed AWMV algorithm has the best overall performance. Finally, this paper notes some potential related topics for future study.",

keywords = "Consensus, Crowdsourcing, EM algorithm, Labeling bias, Weighted majority voting",

author = "Jing Zhang and Sheng, {Victor S.} and Qianmu Li and Jian Wu and Xindong Wu",

note = "Publisher Copyright: {\textcopyright} 2016 Elsevier Inc.",

year = "2017",

month = mar,

day = "1",

doi = "10.1016/j.ins.2016.12.026",

language = "English",

volume = "382-383",

pages = "254--273",

journal = "Information Sciences",

issn = "0020-0255",

}

TY - JOUR

T1 - Consensus algorithms for biased labeling in crowdsourcing

AU - Zhang, Jing

AU - Sheng, Victor S.

AU - Li, Qianmu

AU - Wu, Jian

AU - Wu, Xindong

PY - 2017/3/1

Y1 - 2017/3/1

AB - Although it has become an accepted lay view that when labeling objects through crowdsourcing systems, non-expert annotators often exhibit biases, this argument lacks sufficient evidential observation and systematic empirical study. This paper initially analyzes eight real-world datasets from different domains whose class labels were collected from crowdsourcing systems. Our analyses show that biased labeling is a systematic tendency for binary categorization; in other words, for a large number of annotators, their labeling qualities on the negative class (supposed to be the majority) are significantly greater than are those on the positive class (minority). Therefore, the paper empirically studies the performance of four existing EM-based consensus algorithms, DS, GLAD, RY, and ZenCrowd, on these datasets. Our investigation shows that all of these state-of-the-art algorithms ignore the potential bias characteristics of datasets and perform badly although they model the complexity of the systems. To address the issue of handling biased labeling, the paper further proposes a novel consensus algorithm, namely adaptive weighted majority voting (AWMV), based on the statistical difference between the labeling qualities of the two classes. AWMV utilizes the frequency of positive labels in the multiple noisy label set of each example to obtain a bias rate and then assigns weights derived from the bias rate to negative and positive labels. Comparison results among the five consensus algorithms (AWMV and the four existing) show that the proposed AWMV algorithm has the best overall performance. Finally, this paper notes some potential related topics for future study.

KW - Consensus

KW - Crowdsourcing

KW - EM algorithm

KW - Labeling bias

KW - Weighted majority voting

UR - http://www.scopus.com/inward/record.url?scp=85007158228&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2016.12.026

DO - 10.1016/j.ins.2016.12.026

M3 - Article

AN - SCOPUS:85007158228

SN - 0020-0255

VL - 382-383

SP - 254

EP - 273

JO - Information Sciences

JF - Information Sciences

ER -
