## Abstract

Researchers face the class imbalance problem in many real-world applications, and GAN-based data augmentation is considered an efficient approach to addressing it. GANs need a large amount of training data to generate efficient augmented data; however, sufficient training data is not available in many research areas. In this paper, we introduce a new concept called virtual big data to address this limitation. We prove that virtual big data can provide GANs with sufficient training data to generate efficient augmented data while suffering less from mode collapse and vanishing generator gradients. We show that the curse of dimensionality, which is usually considered a negative factor in machine learning, can play a positive role in solving the vanishing generator gradients problem by making the discriminator less perfect. First, we transform the training data from an $n$-dimensional space into an $m$-dimensional space, where $m = cn$ and $c$ is the concatenation factor. To do so, $c$ different training instances are selected and concatenated to form a $cn$-dimensional instance. Increasing the dimension of the training data from $n$ to $cn$ is the key to increasing the number of training instances from $N$ to $C(N, c)$. The transformed training data are called virtual big data because they differ from the original training instances in both size and dimension. Our experiments show that V-GAN, a GAN trained on virtual big data, can outperform standard GANs when training data are extremely scarce. Furthermore, V-GAN can outperform traditional oversampling techniques in terms of precision, F1 score, and Area Under the Curve (AUC) score.
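The concatenation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_virtual_data` and the `max_instances` cap are hypothetical, and the sketch assumes that the count $C(N, c)$ refers to unordered selections of $c$ distinct instances, concatenated in index order. For scale, $N = 100$ instances with $c = 3$ would already give $C(100, 3) = 161{,}700$ virtual instances.

```python
from itertools import combinations, islice
from math import comb

import numpy as np


def make_virtual_data(X, c, max_instances=None):
    """Concatenate each size-c subset of the rows of X into one row of length c*n.

    Assumption (not confirmed by the abstract): subsets are unordered and
    rows are concatenated in increasing index order.
    """
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array of shape (N, n)")
    N, n = X.shape
    if not 1 <= c <= N:
        raise ValueError("c must satisfy 1 <= c <= N")

    # All C(N, c) index subsets; optionally keep only the first few,
    # since the full set grows very quickly with N and c.
    index_sets = combinations(range(N), c)
    if max_instances is not None:
        index_sets = islice(index_sets, max_instances)

    # X[list(idx)] has shape (c, n); flattening it concatenates the c rows.
    rows = [X[list(idx)].reshape(-1) for idx in index_sets]
    return np.stack(rows)


# Toy data: N = 4 instances in n = 3 dimensions, concatenation factor c = 2.
X = np.arange(12, dtype=float).reshape(4, 3)
V = make_virtual_data(X, c=2)
print(V.shape)               # (6, 6): C(4, 2) = 6 instances of dimension 2 * 3
print(comb(X.shape[0], 2))   # 6
```

Enumerating every subset is only practical for small $N$ and $c$; the hypothetical `max_instances` argument is one simple way to bound the output, and random sampling of subsets would be another.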

Original language | English |
---|---|
Title of host publication | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
Editors | Chaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1478-1487 |
Number of pages | 10 |
ISBN (Electronic) | 9781728108582 |
DOIs | |
State | Published - Dec 2019 |
Event | 2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States; Duration: Dec 9 2019 → Dec 12 2019 |

### Publication series

Name | Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|

### Conference

Conference | 2019 IEEE International Conference on Big Data, Big Data 2019 |
---|---|
Country | United States |
City | Los Angeles |
Period | 12/9/19 → 12/12/19 |

## Keywords

- Data Augmentation
- GAN
- Imbalanced Data Classification