Virtual Big Data for GAN Based Data Augmentation

Hadi Mansourifar, Lin Chen, Weidong Shi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Class imbalance arises in many real-world applications, and GAN-based data augmentation is considered an efficient way to address it. GANs, however, need a large amount of training data to generate useful augmented data, and in many research areas that much data is not available. In this paper, we introduce a new concept, virtual big data, to address this problem. We prove that virtual big data can provide GANs with sufficient training data to generate effective augmented data while reducing mode collapse and vanishing generator gradients. We show that the curse of dimensionality, usually considered a negative factor in machine learning, can play a positive role in mitigating vanishing generator gradients by making the discriminator less perfect. First, we transform the training data from an $n$-dimensional space into an $m$-dimensional space, where $m = cn$ and $c$ is the concatenation factor. To do so, $c$ different training instances are selected and concatenated to form a $cn$-dimensional instance. Increasing the dimension of the training data from $n$ to $cn$ is the key to increasing the number of training instances from $N$ to $C(N, c)$, the number of ways to choose $c$ of the $N$ instances. The transformed training data are called virtual big data because they differ from the original training instances in both size and dimension. Our experiments show that V-GAN, a GAN trained on virtual big data, can outperform standard GANs when training data are extremely scarce. Furthermore, V-GAN can outperform traditional oversampling techniques in terms of precision, F1 score, and Area Under Curve (AUC) score.
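
The concatenation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `make_virtual_big_data`, the use of NumPy, and the choice to enumerate every unordered combination in index order are all assumptions made here for clarity.

```python
from itertools import combinations
from math import comb

import numpy as np


def make_virtual_big_data(X, c):
    """Concatenate every choice of c distinct rows of X into one row.

    Hypothetical helper (not from the paper). X has shape (N, n);
    the result has shape (C(N, c), c * n).
    """
    X = np.asarray(X)
    N, n = X.shape
    if not 1 <= c <= N:
        raise ValueError("c must satisfy 1 <= c <= N")
    # All unordered selections of c distinct instance indices.
    idx = np.array(list(combinations(range(N), c)))  # shape (C(N, c), c)
    # X[idx] has shape (C(N, c), c, n); flatten the last two axes
    # so each selection becomes a single (c * n)-dimensional instance.
    return X[idx].reshape(len(idx), c * n)


# Toy data: N = 4 instances with n = 2 features each.
X = np.arange(8).reshape(4, 2)
V = make_virtual_big_data(X, c=2)
print(V.shape)     # 6 virtual instances of dimension 4
print(comb(4, 2))  # number of combinations, C(N, c)
```

Enumerating all combinations is only practical for small $N$ and $c$, since $C(N, c)$ grows quickly; for larger datasets one would presumably sample a subset of combinations instead.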

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
Editors: Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1478-1487
Number of pages: 10
ISBN (Electronic): 9781728108582
State: Published - Dec 2019
Event: 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019 to Dec 12 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference: 2019 IEEE International Conference on Big Data, Big Data 2019
Country: United States
City: Los Angeles
Period: 12/9/19 to 12/12/19

Keywords

  • Data Augmentation
  • GAN
  • Imbalanced Data Classification
