Learning from crowds with active learning and self-healing

Zhenyu Shu, Victor S. Sheng, Jingjing Li

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

With the development of crowdsourcing, data acquisition for supervised learning from annotators all over the world has become simple and economical. To improve accuracy, it is natural to obtain multiple noisy labels (i.e., a multiple label set) for each example from the crowd. Consensus algorithms can then infer the estimated ground truth from the multiple label set for each example. The estimated ground truth is also called an integrated label, and it may itself be noisy. That is, a dataset constructed by integrating the multiple noisy labels for each example in a crowdsourcing dataset (called an integrated dataset) still contains noise. To further improve the data quality of an integrated dataset, and thereby the performance of a model learned from it, this paper proposes a framework that combines active learning with the self-healing of a model. With active learning, a limited number of examples from the integrated dataset, those most likely to be noisy, are selected for the oracle to correct; with the self-healing of a model, the data quality of the integrated dataset can also be improved automatically. From our experimental results on eight simulated crowdsourcing datasets with three popular consensus algorithms, we draw the following conclusions. (1) Our proposed framework does improve the performance of a model learned from the integrated dataset. (2) The simple active learning selection strategy based on uncertainty estimation can identify noisy labels in the integrated dataset. (3) Self-healing is efficient and effective at improving the data quality of the integrated dataset, and so improves the accuracy of a model learned from the integrated dataset. We further investigate our proposed framework on a real-world crowdsourcing dataset collected from Amazon Mechanical Turk, and the above conclusions hold.
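
The three ingredients named in the abstract (label integration, uncertainty-based selection, and self-healing) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the consensus algorithms, the uncertainty measure, or the self-healing rule, so majority voting, predictive entropy, and a confidence threshold of 0.9 are assumptions made here for illustration, and the function names `majority_vote`, `select_uncertain`, and `self_heal` are hypothetical. Class labels are assumed to be integer indices, and `class_probs` stands for the per-class probabilities produced by any model trained on the integrated dataset.

```python
from collections import Counter
from math import log


def majority_vote(label_sets):
    """Integrate each example's noisy labels by majority voting.

    Assumption: ties are broken in favour of the smallest label.
    """
    integrated = []
    for labels in label_sets:
        counts = Counter(labels)
        top = max(counts.values())
        integrated.append(min(l for l, c in counts.items() if c == top))
    return integrated


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * log(p) for p in probs if p > 0)


def select_uncertain(class_probs, budget):
    """Return the indices of the `budget` most uncertain examples.

    Assumption: uncertainty is measured by predictive entropy; these are
    the examples that would be sent to the oracle for correction.
    """
    ranked = sorted(
        range(len(class_probs)),
        key=lambda i: entropy(class_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


def self_heal(labels, class_probs, threshold=0.9):
    """Relabel examples where the model confidently disagrees.

    Assumption: a label is replaced by the model's prediction only when
    the predicted probability is at least `threshold`.
    """
    healed = list(labels)
    for i, probs in enumerate(class_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != labels[i] and probs[predicted] >= threshold:
            healed[i] = predicted
    return healed


# Toy data: four examples, three crowd labels each, two classes.
label_sets = [[1, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 1]]
integrated = majority_vote(label_sets)

# Hypothetical model outputs for the same four examples.
class_probs = [[0.1, 0.9], [0.55, 0.45], [0.03, 0.97], [0.4, 0.6]]
to_query = select_uncertain(class_probs, budget=1)
healed = self_heal(integrated, class_probs)
```

In a full loop, the oracle's corrections for `to_query` and the relabelled examples in `healed` would be written back into the integrated dataset before the model is retrained; how the two steps are interleaved is left open here because the abstract does not say.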

Original language: English
Pages (from-to): 2883-2894
Number of pages: 12
Journal: Neural Computing and Applications
Volume: 30
Issue number: 9
State: Published - Nov 1 2018

Keywords

  • Active learning
  • Crowdsourcing
  • Machine learning
  • Supervised classification
