TY - JOUR
T1 - Crowdsourced Label Aggregation Using Bilayer Collaborative Clustering
AU - Zhang, Jing
AU - Sheng, Victor S.
AU - Wu, Jian
N1 - Publisher Copyright: © 2012 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - With online crowdsourcing platforms, labels can be acquired at relatively low costs from massive nonexpert workers. To improve the quality of labels obtained from these imperfect crowdsourced workers, we usually let different workers provide labels for the same instance. Then, the true labels for all instances are estimated from these multiple noisy labels. This traditional general-purpose label aggregation process, solely relying on the collected noisy labels, cannot significantly improve the accuracy of integrated labels under a low labeling quality circumstance. This paper proposes a novel bilayer collaborative clustering (BLCC) method for the label aggregation in crowdsourcing. BLCC first generates the conceptual-level features for the instances from their multiple noisy labels and infers the initially integrated labels by performing clustering on the conceptual-level features. Then, it performs another clustering on the physical-level features to form the estimations of the true labels on the physical layer. The clustering results on both layers can facilitate in tracking the changes in the uncertainties of the instances. Finally, the initially integrated labels that are likely to be wrongly inferred on the conceptual layer can be addressed using the estimated labels on the physical layer. The clustering processes on both layers can keep providing guidance information for each other in the multiple label remedy rounds. The experimental results on 12 real-world crowdsourcing data sets show that the performance of the proposed method in terms of accuracy is better than that of the state-of-the-art methods.
AB - With online crowdsourcing platforms, labels can be acquired at relatively low costs from massive nonexpert workers. To improve the quality of labels obtained from these imperfect crowdsourced workers, we usually let different workers provide labels for the same instance. Then, the true labels for all instances are estimated from these multiple noisy labels. This traditional general-purpose label aggregation process, solely relying on the collected noisy labels, cannot significantly improve the accuracy of integrated labels under a low labeling quality circumstance. This paper proposes a novel bilayer collaborative clustering (BLCC) method for the label aggregation in crowdsourcing. BLCC first generates the conceptual-level features for the instances from their multiple noisy labels and infers the initially integrated labels by performing clustering on the conceptual-level features. Then, it performs another clustering on the physical-level features to form the estimations of the true labels on the physical layer. The clustering results on both layers can facilitate in tracking the changes in the uncertainties of the instances. Finally, the initially integrated labels that are likely to be wrongly inferred on the conceptual layer can be addressed using the estimated labels on the physical layer. The clustering processes on both layers can keep providing guidance information for each other in the multiple label remedy rounds. The experimental results on 12 real-world crowdsourcing data sets show that the performance of the proposed method in terms of accuracy is better than that of the state-of-the-art methods.
KW - Clustering
KW - crowdsourcing
KW - label aggregation
KW - label noise handling
KW - truth inference
UR - http://www.scopus.com/inward/record.url?scp=85060915560&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2018.2890148
DO - 10.1109/TNNLS.2018.2890148
M3 - Article
C2 - 30703041
AN - SCOPUS:85060915560
SN - 2162-237X
VL - 30
SP - 3172
EP - 3185
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 10
M1 - 8626164
ER -