Simple multiple noisy label utilization strategies

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

22 Scopus citations


As outsourcing small tasks becomes easier, non-expert (imperfect) labels can be obtained at low cost, and with low-cost imperfect labeling it is straightforward to collect multiple labels for the same data item. This paper addresses strategies for utilizing these multiple labels to improve the performance of supervised learning, based on two basic ideas: majority voting and pairwise solutions. We show several interesting results based on our experiments. The soft majority voting strategies can reduce the bias and roughness, and improve on the performance, of the directed hard majority voting strategy. Pairwise strategies can completely avoid the bias (for binary classification) by considering both sides, that is, the potentially correct and the incorrect/noisy information. They perform very well whether only a few or many labels are available. However, they can also retain the noise, so the improved variation that reduces the impact of the noisy information is recommended. All five strategies investigated are agnostic to labeling quality and can be applied directly to real-world applications. The experimental results show that some of them perform better than, or at least very close to, the gnostic strategies.
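
The two basic ideas named in the abstract can be illustrated with a minimal Python sketch. It is one plausible reading only, not the paper's definitions: the function names, the 0/1 label encoding, the tie-break toward the positive class, and the use of raw vote frequencies as example weights are all assumptions made for illustration.

```python
def count_votes(labels):
    """Count positive (1) and negative (0) votes for one data item."""
    pos = sum(1 for y in labels if y == 1)
    return pos, len(labels) - pos


def hard_majority_vote(labels):
    """Keep only the most frequent label (assumed tie-break: positive)."""
    pos, neg = count_votes(labels)
    return 1 if pos >= neg else 0


def soft_majority_vote(labels):
    """Keep the majority label plus a weight equal to its vote share (assumed weighting)."""
    pos, neg = count_votes(labels)
    label = 1 if pos >= neg else 0
    return label, max(pos, neg) / len(labels)


def pairwise_examples(labels):
    """Keep both classes as weighted examples, so the minority side is not discarded."""
    pos, neg = count_votes(labels)
    n = len(labels)
    return [(1, pos / n), (0, neg / n)]


votes = [1, 1, 0, 1, 0]  # five noisy labels collected for one item
hard = hard_majority_vote(votes)
soft = soft_majority_vote(votes)
pair = pairwise_examples(votes)
```

Under these assumptions, the hard vote discards the two dissenting labels, the soft vote records how strong the majority was, and the pairwise form passes both sides to a learner that accepts per-example weights.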

Original language: English
Title of host publication: Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
Number of pages: 10
State: Published - 2011
Event: 11th IEEE International Conference on Data Mining, ICDM 2011 - Vancouver, BC, Canada
Duration: Dec 11 2011 → Dec 14 2011

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786


Conference: 11th IEEE International Conference on Data Mining, ICDM 2011
City: Vancouver, BC


  • Classification
  • Crowdsourcing
  • Multiple noisy labels
  • Outsourcing

