Active learning with imbalanced multiple noisy labeling

Jing Zhang, Xindong Wu, Victor S. Sheng

Research output: Contribution to journal › Article › peer-review

45 Scopus citations


With crowdsourcing systems, it is easy to collect multiple noisy labels for the same object for supervised learning. This dynamic annotation procedure fits the active learning perspective and gives rise to the imbalanced multiple noisy labeling problem. This paper proposes a novel active learning framework that involves multiple imperfect annotators in crowdsourcing systems. The framework contains two core procedures: label integration and instance selection. In the label integration procedure, a positive label threshold (PLAT) algorithm is introduced to infer the class membership of each instance in the training set from its multiple noisy labels. PLAT addresses the imbalanced labeling problem by dynamically adjusting the threshold that determines the class membership of an instance. Furthermore, three novel instance selection strategies are proposed to work with PLAT and improve learning performance. These strategies are based, respectively, on the uncertainty derived from the multiple labels, the uncertainty derived from the learned model, and a combination of the two (CFI). Experimental results on 12 datasets with different underlying class distributions demonstrate that all three strategies significantly improve learning performance, and that CFI performs best when labeling behaviors exhibit different levels of imbalance in crowdsourcing systems. We also apply our methods to a real-world scenario with noisy labels obtained from Amazon Mechanical Turk and show that the proposed strategies achieve very high performance.
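The abstract does not specify how PLAT computes its threshold or how CFI combines the two uncertainties, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes binary labels (1 = positive) and that every instance already has at least one label. As a simple stand-in for PLAT's dynamic threshold, it uses a single pool-wide threshold equal to the overall fraction of positive labels. As a stand-in for the combined strategy, it uses a geometric mean of a label-based and a model-based uncertainty score. All function names and the toy data are made up for illustration.

```python
import math
from typing import Dict, List

# Hypothetical sketch: binary labels, 1 = positive, 0 = negative.
LabelSets = Dict[str, List[int]]


def positive_fraction(labels: List[int]) -> float:
    """Share of an instance's noisy labels that are positive (needs >= 1 label)."""
    return sum(labels) / len(labels)


def global_threshold(label_sets: LabelSets) -> float:
    """Stand-in for a dynamic threshold: the pool-wide fraction of positive labels.

    Recomputed whenever new labels arrive, so it shifts as labeling proceeds.
    """
    all_labels = [y for ys in label_sets.values() for y in ys]
    return sum(all_labels) / len(all_labels)


def integrate_labels(label_sets: LabelSets, threshold: float) -> Dict[str, int]:
    """Integrated label is 1 if the instance's positive fraction reaches the threshold."""
    return {k: int(positive_fraction(ys) >= threshold) for k, ys in label_sets.items()}


def label_uncertainty(frac: float, threshold: float) -> float:
    """1.0 when the positive fraction sits on the threshold, 0.0 at the farthest point."""
    return 1.0 - abs(frac - threshold) / max(threshold, 1.0 - threshold)


def model_uncertainty(p: float) -> float:
    """1.0 when the model's positive-class probability is 0.5, 0.0 when it is 0 or 1."""
    return 1.0 - 2.0 * abs(p - 0.5)


def select_next(label_sets: LabelSets, model_probs: Dict[str, float],
                strategy: str = "combined") -> str:
    """Return the instance to send back to the crowd for one more label."""
    t = global_threshold(label_sets)

    def score(k: str) -> float:
        lu = label_uncertainty(positive_fraction(label_sets[k]), t)
        mu = model_uncertainty(model_probs[k])
        if strategy == "label":
            return lu
        if strategy == "model":
            return mu
        if strategy == "combined":
            # Geometric mean: stays high only if both scores are reasonably high.
            return math.sqrt(lu * mu)
        raise ValueError(f"unknown strategy: {strategy}")

    return max(label_sets, key=score)


if __name__ == "__main__":
    # Toy crowd that rarely says "positive".
    labels = {"a": [1, 0, 0, 0, 0], "b": [0, 0, 0, 0, 0], "c": [1, 1, 0, 0, 0]}
    probs = {"a": 0.65, "b": 0.6, "c": 0.45}  # made-up classifier outputs
    t = global_threshold(labels)               # 3 positives / 15 labels = 0.2
    print(integrate_labels(labels, t))         # {'a': 1, 'b': 0, 'c': 1}
    for s in ("label", "model", "combined"):
        print(s, select_next(labels, probs, s))  # label a, model c, combined a
```

With plain majority voting at 0.5, all three toy instances would be integrated as negative. The lower, data-driven threshold recovers `a` and `c` as positives. In a full active learning loop, one would request another label for the selected instance, recompute the threshold, re-integrate the labels, and retrain the classifier before the next selection.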

Original language: English
Article number: 6878424
Pages (from-to): 1095-1107
Number of pages: 13
Journal: IEEE Transactions on Cybernetics
Issue number: 5
State: Published, May 1, 2015


  • Active learning
  • crowdsourcing
  • imbalanced learning
  • repeated labeling
  • supervised classification

