TY - GEN
T1 - Virtual Big Data for GAN Based Data Augmentation
AU - Mansourifar, Hadi
AU - Chen, Lin
AU - Shi, Weidong
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - Researchers face the class imbalance problem in many real-world applications, and GAN-based data augmentation is considered an efficient approach to address it. GANs need a large amount of training data to generate effective augmented data, but sufficient training data is not available in many research areas. In this paper, we introduce a new concept called virtual big data to address this problem. We prove that virtual big data can provide GANs with sufficient training data to generate effective augmented data while reducing mode collapse and vanishing generator gradients. We show that the curse of dimensionality, which is usually considered a negative factor in machine learning, can play a positive role in mitigating vanishing generator gradients by making the discriminator less perfect. First, we transform the training data from n-dimensional space into m-dimensional space, where m = cn and c is the concatenation factor. To do so, c different training instances are selected and concatenated to form a cn-dimensional instance. Increasing the dimension of the training data from n to cn is the key to increasing the number of training instances from N to C(N, c). The transformed training data are called virtual big data because they differ from the original training instances in both size and dimension. Our experiments show that V-GAN, a GAN trained on virtual big data, can outperform standard GANs when training data are extremely scarce. Furthermore, V-GAN can outperform traditional oversampling techniques in terms of precision, F1 score, and Area Under Curve (AUC) score.
KW - Data Augmentation
KW - GAN
KW - Imbalanced Data Classification
UR - http://www.scopus.com/inward/record.url?scp=85081329516&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006268
DO - 10.1109/BigData47090.2019.9006268
M3 - Conference contribution
AN - SCOPUS:85081329516
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 1478
EP - 1487
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
Y2 - 9 December 2019 through 12 December 2019
ER -