TY - JOUR
T1 - Adaptive robust local online density estimation for streaming data
AU - Chen, Zhong
AU - Fang, Zhide
AU - Sheng, Victor
AU - Zhao, Jiabin
AU - Fan, Wei
AU - Edwards, Andrea
AU - Zhang, Kun
N1 - Funding Information:
This publication was made possible by funding from the DOD ARO Grant #W911NF-20-1-0249, and the NIH grants 5U54MD007595, 5P20GM103424, 5U19AG055373 and U54GM104940.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature.
PY - 2021/6
Y1 - 2021/6
N2 - Accurate online density estimation is crucial to numerous applications that are prevalent with streaming data. Existing online approaches for density estimation somewhat lack prompt adaptability and robustness when facing concept-drifting and noisy streaming data, resulting in delayed or even deteriorated approximations. To alleviate this issue, in this work, we first propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimation by factoring in the statistical hallmarks of the latest learned distribution and any potential distributional changes that could be introduced by each incoming instance. A robust variant of ALoKDE, i.e., R-ALoKDE, is further developed to effectively handle data streams with varied types/levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE, and also derive their theoretical error bounds regarding bias, variance, MSE and MISE. Extensive comparative studies on various artificial and real-world (noisy) streaming data demonstrate the efficacies of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).
AB - Accurate online density estimation is crucial to numerous applications that are prevalent with streaming data. Existing online approaches for density estimation somewhat lack prompt adaptability and robustness when facing concept-drifting and noisy streaming data, resulting in delayed or even deteriorated approximations. To alleviate this issue, in this work, we first propose an adaptive local online kernel density estimator (ALoKDE) for real-time density estimation on data streams. ALoKDE consists of two tightly integrated strategies: (1) a statistical test for concept drift detection and (2) an adaptive weighted local online density estimation when a drift does occur. Specifically, using a weighted form, ALoKDE seeks to provide an unbiased estimation by factoring in the statistical hallmarks of the latest learned distribution and any potential distributional changes that could be introduced by each incoming instance. A robust variant of ALoKDE, i.e., R-ALoKDE, is further developed to effectively handle data streams with varied types/levels of noise. Moreover, we analyze the asymptotic properties of ALoKDE and R-ALoKDE, and also derive their theoretical error bounds regarding bias, variance, MSE and MISE. Extensive comparative studies on various artificial and real-world (noisy) streaming data demonstrate the efficacies of ALoKDE and R-ALoKDE in online density estimation and real-time classification (with noise).
KW - Adaptive bandwidth selection
KW - Adaptive weighting factor optimization
KW - Ensemble learning
KW - Local sampling
KW - Online density estimation
KW - Streaming data
UR - http://www.scopus.com/inward/record.url?scp=85100507564&partnerID=8YFLogxK
U2 - 10.1007/s13042-021-01275-y
DO - 10.1007/s13042-021-01275-y
M3 - Article
AN - SCOPUS:85100507564
VL - 12
SP - 1803
EP - 1824
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
SN - 1868-8071
IS - 6
ER -