TY - JOUR
T1 - Majority Voting and Pairing with Multiple Noisy Labeling
AU - Sheng, Victor S.
AU - Zhang, Jing
AU - Gu, Bin
AU - Wu, Xindong
N1 - Funding Information:
This research has been supported by the U.S. National Science Foundation under Grant No. IIS-1115417, the National Natural Science Foundation of China under Grant No. 61603186, 61472267, the Natural Science Foundation of Jiangsu Province, China, under Grant No. BK20160843, the China Postdoctoral Science Foundation under Grant No. 2016M590457, the National 863 Project of China under Grant No. 2006AA12A106, the Project Funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under Grant No. IRT13059, and the National 973 Program of China under Grant No. 2013CB329604.
Publisher Copyright:
© 1989-2012 IEEE.
PY - 2019/7/1
Y1 - 2019/7/1
N2 - With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies for utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. We show several interesting results based on our experiments. (i) The strategies based on the majority voting idea work well when the certainty level is high. (ii) On the contrary, the pairing strategies are preferable when the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce bias and roughness, and performs better than majority voting. (iv) Pairing can completely avoid bias by considering both sides (potentially correct and incorrect/noisy information). Beta estimation is applied to reduce the impact of noise in pairing. Our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling-quality-agnostic strategies for real-world applications, and some of them perform better than, or at least very close to, the gnostic strategies.
AB - With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies for utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. We show several interesting results based on our experiments. (i) The strategies based on the majority voting idea work well when the certainty level is high. (ii) On the contrary, the pairing strategies are preferable when the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce bias and roughness, and performs better than majority voting. (iv) Pairing can completely avoid bias by considering both sides (potentially correct and incorrect/noisy information). Beta estimation is applied to reduce the impact of noise in pairing. Our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling-quality-agnostic strategies for real-world applications, and some of them perform better than, or at least very close to, the gnostic strategies.
KW - Crowdsourcing
KW - classification
KW - data preprocessing
KW - multiple noisy labels
UR - http://www.scopus.com/inward/record.url?scp=85044040579&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2017.2659740
DO - 10.1109/TKDE.2017.2659740
M3 - Article
AN - SCOPUS:85044040579
VL - 31
SP - 1355
EP - 1368
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
SN - 1041-4347
IS - 7
M1 - 7835129
ER -