Semi-Supervised Multi-Label Learning from Crowds via Deep Sequential Generative Model

Wanli Shi, Victor S. Sheng, Xiang Li, Bin Gu

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Multi-label classification (MLC) is pervasive in real-world applications. Conventional MLC algorithms assume that enough ground truth labels are available for training a classifier. While in reality, obtaining ground truth labels is expensive and time-consuming. In the field of data mining, it is more efficient to use crowdsourcing for label collection. In this setting, an MLC algorithm needs to deal with the noisiness of the crowdsourced labels as well as the remaining massive unlabeled data. In this paper, we propose a deep generative model to describe the label generation process for this semi-supervised multi-label learning problem. Although deep generative models are widely used for MLC problems, no previous work could address the noisy crowdsourced multi-labels and unlabeled data simultaneously. To address this challenging problem, our novel generative model incorporates latent variables to describe the labeled/unlabeled data as well as the labeling process of crowdsourcing. We introduce an efficient sequential inference model to approximate the model posterior and infer the ground truth labels. Our experimental results on various scales of datasets demonstrate the effectiveness of our proposed model. It performs favorably against four state-of-the-art deep generative models.

Original languageEnglish
Title of host publicationKDD 2020 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PublisherAssociation for Computing Machinery
Pages1141-1149
Number of pages9
ISBN (Electronic)9781450379984
DOIs
StatePublished - Aug 23 2020
Event26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020 - Virtual, Online, United States
Duration: Aug 23 2020Aug 27 2020

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2020
Country/TerritoryUnited States
CityVirtual, Online
Period08/23/2008/27/20

Keywords

  • crowdsourcing
  • deep generative model
  • multi-label
  • semi-supervised

Fingerprint

Dive into the research topics of 'Semi-Supervised Multi-Label Learning from Crowds via Deep Sequential Generative Model'. Together they form a unique fingerprint.

Cite this