CSRDA: Cost-sensitive Regularized Dual Averaging for Handling Imbalanced and High-dimensional Streaming Data

Zhong Chen, Zhide Fang, Victor Sheng, Andrea Edwards, Kun Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Class imbalance is one of the most challenging problems in online learning due to its impact on the prediction capability of data stream mining models. Most existing approaches for online learning lack an effective mechanism to handle high-dimensional streaming data with skewed class distributions, resulting in insufficient model interpretation and deterioration of online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our proposed method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization function. Specifically, two $\ell_1$-norm regularized cost-sensitive objective functions are directly optimized. We then theoretically analyze CSRDA's regret bounds and the bounds of primal variables. Thus, CSRDA achieves theoretical convergence of balanced cost and sparsity for severely imbalanced and high-dimensional streaming data mining. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared to other baseline methods, CSRDA not only improves classification performance but also captures sparse features more effectively, and hence offers better interpretability.
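
The abstract does not state the update rule, so the following is only a rough sketch of the general idea it describes: the standard closed-form $\ell_1$-regularized dual averaging step (with the auxiliary function taken as half the squared Euclidean norm) combined with a class-weighted hinge loss. The function names, the cost values `cost_pos` and `cost_neg`, and the hyperparameters `lam` and `gamma` are illustrative assumptions, not the authors' formulation or their two objective functions.

```python
import math


def l1_rda_step(avg_grad, t, lam, gamma):
    """Closed-form l1-RDA update: coordinates whose average
    gradient magnitude is at most lam are set exactly to zero."""
    scale = math.sqrt(t) / gamma
    w = []
    for g in avg_grad:
        if abs(g) <= lam:
            w.append(0.0)  # truncated coordinate, this is the source of sparsity
        else:
            w.append(-scale * (g - lam * math.copysign(1.0, g)))
    return w


def cost_sensitive_rda(stream, dim, lam=0.05, gamma=1.0,
                       cost_pos=5.0, cost_neg=1.0):
    """One pass over (x, y) pairs with y in {+1, -1}.

    The hinge-loss subgradient is scaled by a per-class cost
    (hypothetical values), so errors on the costlier class pull
    the average gradient harder.
    """
    w = [0.0] * dim
    avg_grad = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        cost = cost_pos if y > 0 else cost_neg
        if margin < 1.0:
            grad = [-cost * y * xi for xi in x]
        else:
            grad = [0.0] * dim
        # Running average of all subgradients seen so far.
        avg_grad = [a + (g - a) / t for a, g in zip(avg_grad, grad)]
        w = l1_rda_step(avg_grad, t, lam, gamma)
    return w


if __name__ == "__main__":
    # Toy imbalanced stream: one positive example, nine negatives;
    # only the first feature carries signal.
    toy = [([1.0, 0.0, 0.0], 1)] + [([-1.0, 0.0, 0.0], -1)] * 9
    print(cost_sensitive_rda(toy, dim=3))
```

In this sketch, features that never contribute a gradient keep a weight of exactly zero, which illustrates the kind of sparsity the abstract ties to interpretability. How the class costs should be chosen relative to the imbalance ratio is left open here.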

Original language: English
Title of host publication: Proceedings - 12th IEEE International Conference on Big Knowledge, ICBK 2021
Editors: Zhiguo Gong, Xue Li, Sule Gunduz Oguducu, Lei Chen, Baltasar Fernandez Manjon, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 164-173
Number of pages: 10
ISBN (Electronic): 9781665438582
State: Published - 2021
Event: 12th IEEE International Conference on Big Knowledge, ICBK 2021 - Virtual, Auckland, New Zealand
Duration: Dec 7 2021 → Dec 8 2021

Publication series

Name: Proceedings - 12th IEEE International Conference on Big Knowledge, ICBK 2021

Conference

Conference: 12th IEEE International Conference on Big Knowledge, ICBK 2021
Country/Territory: New Zealand
City: Virtual, Auckland
Period: 12/7/21 → 12/8/21

Keywords

  • Cost-sensitive metrics
  • Imbalance ratio
  • Online learning
  • Sparsity
  • Streaming data
