TY - GEN
T1 - Integrating active learning with supervision for crowdsourcing generalization
AU - Shu, Zhenyu
AU - Sheng, Victor S.
AU - Zhang, Yang
AU - Wang, Dianhong
AU - Zhang, Jing
AU - Chen, Heng
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/3/2
Y1 - 2016/3/2
N2 - Online crowdsourcing platforms make it easy to collect multiple labels for the same examples from the crowd. Consensus integration algorithms can infer estimated ground truths from the multiple label sets of these crowdsourcing datasets. However, these integrated (estimated) labels inevitably still contain noise. To further improve the performance of a model learned from data with these integrated labels, we propose an active learning framework that improves data quality, and thereby model quality, by acquiring a limited number of true labels from experts (the oracle). We further investigate two active learning strategies based on two uncertainty measures (i.e., CLUE and MUE) within this framework. From experimental results on eight simulated crowdsourcing datasets and four real-world crowdsourcing datasets with three popular consensus integration algorithms, we draw the following conclusions. (i) Our active learning framework, with input from the oracle, significantly improves the generalization ability of the model learned from crowdsourcing data. (ii) Our two active learning strategies outperform a random active learning strategy.
AB - Online crowdsourcing platforms make it easy to collect multiple labels for the same examples from the crowd. Consensus integration algorithms can infer estimated ground truths from the multiple label sets of these crowdsourcing datasets. However, these integrated (estimated) labels inevitably still contain noise. To further improve the performance of a model learned from data with these integrated labels, we propose an active learning framework that improves data quality, and thereby model quality, by acquiring a limited number of true labels from experts (the oracle). We further investigate two active learning strategies based on two uncertainty measures (i.e., CLUE and MUE) within this framework. From experimental results on eight simulated crowdsourcing datasets and four real-world crowdsourcing datasets with three popular consensus integration algorithms, we draw the following conclusions. (i) Our active learning framework, with input from the oracle, significantly improves the generalization ability of the model learned from crowdsourcing data. (ii) Our two active learning strategies outperform a random active learning strategy.
KW - Active learning
KW - Crowdsourcing
KW - Supervised classification
UR - http://www.scopus.com/inward/record.url?scp=84969706165&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2015.13
DO - 10.1109/ICMLA.2015.13
M3 - Conference contribution
AN - SCOPUS:84969706165
T3 - Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
SP - 232
EP - 237
BT - Proceedings - 2015 IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE 14th International Conference on Machine Learning and Applications, ICMLA 2015
Y2 - 9 December 2015 through 11 December 2015
ER -