TY - JOUR
T1 - Learning from biased crowdsourced labeling with deep clustering
AU - Wu, Ming
AU - Li, Qianmu
AU - Yang, Fei
AU - Zhang, Jing
AU - Sheng, Victor S.
AU - Hou, Jun
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2023/1
Y1 - 2023/1
AB - With the rapid development of crowdsourcing learning, large numbers of labels can be obtained from crowd workers quickly and cheaply. However, crowdsourcing learning also faces challenges due to the varying quality of amateur crowd workers. To improve the quality of crowd labels, many researchers focus on inferring the ground truth from noisy labels, taking different factors, e.g., the reliability of workers and the difficulty of instances, into consideration when inferring the aggregated labels. Nevertheless, to the best of our knowledge, label aggregation in biased crowdsourced labeling scenarios has not been sufficiently studied. In fact, biased labeling occurs in many crowdsourcing annotation tasks and affects the performance of label aggregation. To this end, this paper proposes a novel framework termed Biased Crowdsourcing Learning with Deep Clustering (BCLDC), which applies deep clustering to both label aggregation and prediction to improve the quality of aggregated labels and learned models in biased labeling scenarios. BCLDC uses a deep clustering method to detect the labeling bias and then eliminates the bias by adjusting the number of labels belonging to the minority class, i.e., the class with fewer labels. Finally, a classifier is trained simultaneously with the aggregated labels inferred by an EM algorithm. Experimental results on six real-world datasets and five synthetic datasets consistently show that the proposed BCLDC outperforms other state-of-the-art algorithms in terms of ground truth inference and prediction.
KW - Biased labeling
KW - Classification
KW - Clustering
KW - Crowdsourcing
KW - Label aggregation
UR - http://www.scopus.com/inward/record.url?scp=85137161801&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2022.118608
DO - 10.1016/j.eswa.2022.118608
M3 - Article
AN - SCOPUS:85137161801
SN - 0957-4174
VL - 211
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 118608
ER -