Majority Voting and Pairing with Multiple Noisy Labeling

Victor S. Sheng, Jing Zhang, Bin Gu, Xindong Wu

Research output: Contribution to journal › Article


Abstract

With the crowdsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper proposes strategies for utilizing these multiple labels for supervised learning, based on two basic ideas: majority voting and pairing. Our experiments show several interesting results. (i) The strategies based on majority voting work well when the certainty level is high. (ii) On the contrary, the pairing strategies are preferable when the certainty level is low. (iii) Among the majority voting strategies, soft majority voting can reduce bias and roughness, and performs better than hard majority voting. (iv) Pairing can completely avoid bias by considering both sides (the potentially correct and the incorrect/noisy information). Beta estimation is applied to reduce the impact of noise in pairing; our experimental results show that pairing with Beta estimation always performs well under different certainty levels. (v) All strategies investigated are labeling-quality-agnostic strategies for real-world applications, and some of them perform better than, or at least very close to, the gnostic strategies.
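The exact formulations are given in the paper; as a minimal sketch of the two basic ideas for a binary-labeled item with multiple crowdsourced labels, the contrast between hard voting, soft voting, and pairing might look like the following (function names and the weighted-example representation are illustrative assumptions, not the paper's implementation):

```python
from collections import Counter

def majority_vote(labels):
    """Hard majority voting: the most frequent label wins outright."""
    return Counter(labels).most_common(1)[0][0]

def soft_majority_vote(labels):
    """Soft majority voting (sketch): keep every observed label as a
    (label, weight) pair, weighted by its frequency, so the training
    set retains the uncertainty instead of a hard 0/1 decision."""
    n = len(labels)
    return [(lab, cnt / n) for lab, cnt in Counter(labels).items()]

def pairing(labels):
    """Pairing (sketch, binary labels): emit both a positive and a
    negative weighted copy of the example, so neither side of the
    evidence is discarded."""
    n = len(labels)
    pos = sum(1 for lab in labels if lab == 1)
    return [(1, pos / n), (0, (n - pos) / n)]

# Five noisy labels collected for one item
noisy = [1, 1, 0, 1, 0]
print(majority_vote(noisy))        # 1
print(soft_majority_vote(noisy))   # weighted copies of the item
print(pairing(noisy))              # both sides kept, with weights
```

Under this sketch, hard voting commits to a single label, while soft voting and pairing feed weighted examples to any learner that accepts instance weights.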

Original language: English
Article number: 7835129
Pages (from-to): 1355-1368
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 31
Issue number: 7
DOIs
State: Published - Jul 1 2019

Keywords

  • Crowdsourcing
  • classification
  • data preprocessing
  • multiple noisy labels
