TY - GEN
T1 - CSRDA
T2 - 12th IEEE International Conference on Big Knowledge, ICBK 2021
AU - Chen, Zhong
AU - Fang, Zhide
AU - Sheng, Victor
AU - Edwards, Andrea
AU - Zhang, Kun
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Class imbalance is one of the most challenging problems in online learning because it degrades the predictive capability of data stream mining models. Most existing online learning approaches lack an effective mechanism for handling high-dimensional streaming data with skewed class distributions, which leads to poor model interpretability and deteriorating online performance. In this paper, we develop a cost-sensitive regularized dual averaging (CSRDA) method to tackle this problem. Our proposed method substantially extends the influential regularized dual averaging (RDA) method by formulating a new convex optimization function. Specifically, two $\ell_1$-norm regularized cost-sensitive objective functions are each optimized directly. We then theoretically analyze CSRDA's regret bounds and the bounds of its primal variables. CSRDA thus achieves theoretical convergence with balanced cost and sparsity for severely imbalanced, high-dimensional streaming data mining. To validate our method, we conduct extensive experiments on six benchmark streaming datasets with varied imbalance ratios. The experimental results demonstrate that, compared with baseline methods, CSRDA not only improves classification performance but also captures sparse features more effectively, and hence offers better interpretability.
KW - Cost-sensitive metrics
KW - Imbalance ratio
KW - Online learning
KW - Sparsity
KW - Streaming data
UR - http://www.scopus.com/inward/record.url?scp=85125104575&partnerID=8YFLogxK
U2 - 10.1109/ICKG52313.2021.00031
DO - 10.1109/ICKG52313.2021.00031
M3 - Conference contribution
AN - SCOPUS:85125104575
T3 - Proceedings - 12th IEEE International Conference on Big Knowledge, ICBK 2021
SP - 164
EP - 173
BT - Proceedings - 12th IEEE International Conference on Big Knowledge, ICBK 2021
A2 - Gong, Zhiguo
A2 - Li, Xue
A2 - Oguducu, Sule Gunduz
A2 - Chen, Lei
A2 - Manjon, Baltasar Fernandez
A2 - Wu, Xindong
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 December 2021 through 8 December 2021
ER -