TY - JOUR
T1 - Learning from crowdsourced labeled data
T2 - a survey
AU - Zhang, Jing
AU - Wu, Xindong
AU - Sheng, Victor S.
N1 - Funding Information:
This research has been supported by the China Postdoctoral Science Foundation under grant 2016M590457, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under grant IRT13059, the National 973 Program of China under grant 2013CB329604, and the US National Science Foundation under grant IIS-1115417.
Publisher Copyright:
© 2016, Springer Science+Business Media Dordrecht.
PY - 2016/12/1
Y1 - 2016/12/1
N2 - With the rapid growth of crowdsourcing systems, quite a few applications based on a supervised learning paradigm can easily obtain massive labeled data at a relatively low cost. However, due to the variable uncertainty of crowdsourced labelers, learning procedures face great challenges. Thus, improving the quality of labels and learning models plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of the quality of labels and learning models. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning models, we analyze the connections and distinctions among these techniques and clarify the level of progress of related research. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, as well as open source libraries and tools. Finally, some potential issues for future study are discussed.
AB - With the rapid growth of crowdsourcing systems, quite a few applications based on a supervised learning paradigm can easily obtain massive labeled data at a relatively low cost. However, due to the variable uncertainty of crowdsourced labelers, learning procedures face great challenges. Thus, improving the quality of labels and learning models plays a key role in learning from crowdsourced labeled data. In this survey, we first introduce the basic concepts of the quality of labels and learning models. Then, by reviewing recently proposed models and algorithms for ground truth inference and learning models, we analyze the connections and distinctions among these techniques and clarify the level of progress of related research. To facilitate studies in this field, we also introduce openly accessible real-world data sets collected from crowdsourcing systems, as well as open source libraries and tools. Finally, some potential issues for future study are discussed.
KW - Crowdsourcing
KW - Ground truth inference
KW - Label quality
KW - Learning from crowds
KW - Learning model quality
KW - Multiple noisy labeling
UR - http://www.scopus.com/inward/record.url?scp=84976512003&partnerID=8YFLogxK
U2 - 10.1007/s10462-016-9491-9
DO - 10.1007/s10462-016-9491-9
M3 - Article
AN - SCOPUS:84976512003
VL - 46
SP - 543
EP - 576
JO - Artificial Intelligence Review
JF - Artificial Intelligence Review
SN - 0269-2821
IS - 4
ER -