TY - JOUR
T1 - Stratification-Based Outlier Detection over the Deep Web
AU - Xian, Xuefeng
AU - Zhao, Pengpeng
AU - Sheng, Victor S.
AU - Fang, Ligang
AU - Gu, Caidong
AU - Yang, Yuanfeng
AU - Cui, Zhiming
N1 - Publisher Copyright: © 2016 Xuefeng Xian et al.
PY - 2016
Y1 - 2016
N2 - For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
AB - For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
UR - http://www.scopus.com/inward/record.url?scp=84975087391&partnerID=8YFLogxK
U2 - 10.1155/2016/7386517
DO - 10.1155/2016/7386517
M3 - Article
C2 - 27313603
AN - SCOPUS:84975087391
SN - 1687-5265
VL - 2016
JO - Computational Intelligence and Neuroscience
JF - Computational Intelligence and Neuroscience
M1 - 7386517
ER -