Learning from biased crowdsourced labeling with deep clustering

Ming Wu; Qianmu Li; Fei Yang; Jing Zhang; Victor S. Sheng; Jun Hou

doi:10.1016/j.eswa.2022.118608

Computer Science

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

With the rapid development of crowdsourcing learning, a large number of labels can be obtained from crowd workers quickly and cheaply. However, crowdsourcing learning also faces challenges due to the varying quality of amateur crowd workers. To improve the quality of crowd labels, many researchers focus on inferring the ground truth from noisy labels and take different factors, e.g., the reliability of workers and the difficulty of instances, into consideration when inferring the aggregated labels. Nevertheless, to the best of our knowledge, label aggregation for biased crowdsourced labeling scenarios has not been sufficiently studied. In fact, biased labeling occurs in many crowdsourcing annotation tasks and affects the performance of label aggregation. To this end, this paper proposes a novel framework termed Biased Crowdsourcing Learning with Deep Clustering (BCLDC), which performs label aggregation and prediction using deep clustering to improve the quality of aggregated labels and learned models in biased labeling scenarios. BCLDC uses a deep clustering method to detect the labeling bias and then eliminates the bias by adjusting the number of labels belonging to the minority class, i.e., the class with fewer labels. Finally, a classifier is trained simultaneously with the aggregated labels inferred by an EM algorithm. Experimental results on six real-world datasets and five synthetic datasets consistently show that the proposed BCLDC outperforms other state-of-the-art algorithms in terms of ground truth inference and prediction.
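
To make the idea of inferring aggregated labels with an EM algorithm more concrete, the following is a minimal sketch of a generic "one-coin" EM aggregator for binary crowd labels. It is not the BCLDC method from the paper: it includes no deep clustering, no bias detection, and no minority-class adjustment, and the abstract does not specify which EM model the authors use. The function name `em_aggregate`, the input layout, and all parameters are assumptions made for illustration only.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def em_aggregate(labels, n_iter=50, eps=1e-6):
    """Generic one-coin EM for binary crowd labels (illustrative, not BCLDC).

    labels[i][j] is 0, 1, or None when worker j did not label item i.
    Returns (post, acc): post[i] estimates P(true label of item i is 1),
    acc[j] estimates the probability that worker j labels correctly.
    """
    n_items = len(labels)
    n_workers = len(labels[0])

    # Initialise posteriors with the fraction of 1-votes (soft majority vote).
    post = []
    for row in labels:
        votes = [l for l in row if l is not None]
        post.append(sum(votes) / len(votes) if votes else 0.5)

    acc = [0.5] * n_workers
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each worker's accuracy.
        prior = min(max(sum(post) / n_items, eps), 1 - eps)
        for j in range(n_workers):
            agree, count = 0.0, 0
            for i in range(n_items):
                l = labels[i][j]
                if l is None:
                    continue
                agree += post[i] if l == 1 else 1 - post[i]
                count += 1
            a = agree / count if count else 0.5
            acc[j] = min(max(a, eps), 1 - eps)

        # E-step: update each item's posterior from the log-odds of its votes.
        for i in range(n_items):
            log_odds = math.log(prior) - math.log(1 - prior)
            for j in range(n_workers):
                l = labels[i][j]
                if l is None:
                    continue
                w = math.log(acc[j]) - math.log(1 - acc[j])
                log_odds += w if l == 1 else -w
            post[i] = _sigmoid(log_odds)

    return post, acc


if __name__ == "__main__":
    # Toy data (made up): workers 0 and 1 agree, worker 2 always disagrees.
    toy = [[1, 1, 0], [1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1], [0, 0, 1]]
    post, acc = em_aggregate(toy)
    print([int(p > 0.5) for p in post], [round(a, 3) for a in acc])
```

In this sketch, a worker who consistently disagrees with the consensus ends up with an estimated accuracy below 0.5, so their votes are effectively inverted rather than discarded. Handling the class-level labeling bias described in the abstract would require additional steps that this generic model does not attempt.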

Original language	English
Article number	118608
Journal	Expert Systems with Applications
Volume	211
DOIs	https://doi.org/10.1016/j.eswa.2022.118608
State	Published - Jan 2023

Keywords

Biased labeling
Classification
Clustering
Crowdsourcing
Label aggregation

Access to Document

10.1016/j.eswa.2022.118608

Cite this

@article{e833961763894bcaa32100e41503fcca,

title = "Learning from biased crowdsourced labeling with deep clustering",

abstract = "With the rapid development of crowdsourcing learning, amount of labels can be obtained from crowd workers fast and cheaply. However, crowdsourcing learning also faces challenges due to the varied qualities of amateurish crowd workers. To improve the quality of crowd labels, many researchers focus on inferring the ground truth from noisy labels, and take different factors, e.g. the reliability of workers and the difficulty of instances, into consideration to infer the aggregated labels. Nevertheless, to the best of our knowledge, label aggregation for biased crowdsourced labeling scenarios has not been sufficiently studied. Actually, the phenomenon of biased labeling exists in many crowdsourcing annotation tasks and affects the performance of label aggregation. To this end, this paper proposes a novel framework termed Biased Crowdsourcing Learning with Deep Clustering (BCLDC), which involves label aggregation and prediction using deep clustering to improve the quality of aggregated labels and learned models in biased labeling scenarios. BCLDC utilizes a deep clustering method to detect the labeling bias and then eliminates the bias by adjusting the number of labels belonging to the minority class which has fewer labels. Finally, a classifier is trained simultaneously with the aggregated labels inferred by an EM algorithm. Experimental results on six real-world datasets and five synthetic datasets consistently show that the proposed BCLDC outperforms other state-of-the-art algorithms in terms of ground truth inference and prediction.",

keywords = "Biased labeling, Classification, Clustering, Crowdsourcing, Label aggregation",

author = "Ming Wu and Qianmu Li and Fei Yang and Jing Zhang and Sheng, {Victor S.} and Jun Hou",

note = "Publisher Copyright: {\textcopyright} 2022 Elsevier Ltd",

year = "2023",

month = jan,

doi = "10.1016/j.eswa.2022.118608",

language = "English",

volume = "211",

journal = "Expert Systems with Applications",

issn = "0957-4174",

}

TY - JOUR

T1 - Learning from biased crowdsourced labeling with deep clustering

AU - Wu, Ming

AU - Li, Qianmu

AU - Yang, Fei

AU - Zhang, Jing

AU - Sheng, Victor S.

AU - Hou, Jun

PY - 2023/1

Y1 - 2023/1

N2 - With the rapid development of crowdsourcing learning, amount of labels can be obtained from crowd workers fast and cheaply. However, crowdsourcing learning also faces challenges due to the varied qualities of amateurish crowd workers. To improve the quality of crowd labels, many researchers focus on inferring the ground truth from noisy labels, and take different factors, e.g. the reliability of workers and the difficulty of instances, into consideration to infer the aggregated labels. Nevertheless, to the best of our knowledge, label aggregation for biased crowdsourced labeling scenarios has not been sufficiently studied. Actually, the phenomenon of biased labeling exists in many crowdsourcing annotation tasks and affects the performance of label aggregation. To this end, this paper proposes a novel framework termed Biased Crowdsourcing Learning with Deep Clustering (BCLDC), which involves label aggregation and prediction using deep clustering to improve the quality of aggregated labels and learned models in biased labeling scenarios. BCLDC utilizes a deep clustering method to detect the labeling bias and then eliminates the bias by adjusting the number of labels belonging to the minority class which has fewer labels. Finally, a classifier is trained simultaneously with the aggregated labels inferred by an EM algorithm. Experimental results on six real-world datasets and five synthetic datasets consistently show that the proposed BCLDC outperforms other state-of-the-art algorithms in terms of ground truth inference and prediction.

AB - With the rapid development of crowdsourcing learning, amount of labels can be obtained from crowd workers fast and cheaply. However, crowdsourcing learning also faces challenges due to the varied qualities of amateurish crowd workers. To improve the quality of crowd labels, many researchers focus on inferring the ground truth from noisy labels, and take different factors, e.g. the reliability of workers and the difficulty of instances, into consideration to infer the aggregated labels. Nevertheless, to the best of our knowledge, label aggregation for biased crowdsourced labeling scenarios has not been sufficiently studied. Actually, the phenomenon of biased labeling exists in many crowdsourcing annotation tasks and affects the performance of label aggregation. To this end, this paper proposes a novel framework termed Biased Crowdsourcing Learning with Deep Clustering (BCLDC), which involves label aggregation and prediction using deep clustering to improve the quality of aggregated labels and learned models in biased labeling scenarios. BCLDC utilizes a deep clustering method to detect the labeling bias and then eliminates the bias by adjusting the number of labels belonging to the minority class which has fewer labels. Finally, a classifier is trained simultaneously with the aggregated labels inferred by an EM algorithm. Experimental results on six real-world datasets and five synthetic datasets consistently show that the proposed BCLDC outperforms other state-of-the-art algorithms in terms of ground truth inference and prediction.

KW - Biased labeling

KW - Classification

KW - Clustering

KW - Crowdsourcing

KW - Label aggregation

UR - http://www.scopus.com/inward/record.url?scp=85137161801&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2022.118608

DO - 10.1016/j.eswa.2022.118608

M3 - Article

AN - SCOPUS:85137161801

SN - 0957-4174

VL - 211

JO - Expert Systems with Applications

JF - Expert Systems with Applications

M1 - 118608

ER -
