Stratification-Based Outlier Detection over the Deep Web

Xuefeng Xian; Pengpeng Zhao; Victor S. Sheng; Ligang Fang; Caidong Gu; Yuanfeng Yang; Zhiming Cui

doi:10.1155/2016/7386517

Computer Science

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve the corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.
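
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of stratifying a query space from a pilot sample. It is not the authors' method. The function name `allocate_budget`, the add-one smoothing, and the proportional allocation rule are all assumptions made for this example.

```python
def allocate_budget(pilot, budget):
    """Split a query budget across strata in proportion to pilot outlier rates.

    `pilot` maps a stratum label to a list of booleans, one per pilot record,
    True where that record was flagged as an outlier. (Hypothetical layout.)
    """
    # Add-one smoothing (an assumption) keeps strata with no pilot outliers
    # from receiving a zero rate.
    rates = {s: (sum(flags) + 1) / (len(flags) + 2) for s, flags in pilot.items()}
    total = sum(rates.values())

    # Proportional allocation, rounded down to whole queries.
    alloc = {s: int(budget * r / total) for s, r in rates.items()}

    # Give any rounding remainder to the stratum with the highest rate.
    top = max(rates, key=rates.get)
    alloc[top] += budget - sum(alloc.values())
    return alloc


# Example: stratum "a" showed outliers in the pilot sample, stratum "b" did not.
pilot_sample = {
    "a": [True, True, False, False],
    "b": [False, False, False, False],
}
allocation = allocate_budget(pilot_sample, budget=12)
```

In this sketch, strata whose pilot records contained more outliers receive more of the remaining queries, which is one simple way to spend a limited query budget where outliers appear more likely.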

Original language	English
Article number	7386517
Journal	Computational Intelligence and Neuroscience
Volume	2016
DOIs	https://doi.org/10.1155/2016/7386517
State	Published - 2016

Cite this

@article{0f0a029bef7443aa8a38426313b123b7,

title = "Stratification-Based Outlier Detection over the Deep Web",

abstract = "For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.",

author = "Xuefeng Xian and Pengpeng Zhao and Sheng, {Victor S.} and Ligang Fang and Caidong Gu and Yuanfeng Yang and Zhiming Cui",

note = "Publisher Copyright: {\textcopyright} 2016 Xuefeng Xian et al.",

year = "2016",

doi = "10.1155/2016/7386517",

language = "English",

volume = "2016",

journal = "Computational Intelligence and Neuroscience",

issn = "1687-5265",

}

TY - JOUR

T1 - Stratification-Based Outlier Detection over the Deep Web

AU - Xian, Xuefeng

AU - Zhao, Pengpeng

AU - Sheng, Victor S.

AU - Fang, Ligang

AU - Gu, Caidong

AU - Yang, Yuanfeng

AU - Cui, Zhiming

PY - 2016

Y1 - 2016

N2 - For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.

AB - For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.

UR - http://www.scopus.com/inward/record.url?scp=84975087391&partnerID=8YFLogxK

U2 - 10.1155/2016/7386517

DO - 10.1155/2016/7386517

M3 - Article

C2 - 27313603

AN - SCOPUS:84975087391

SN - 1687-5265

VL - 2016

JO - Computational Intelligence and Neuroscience

JF - Computational Intelligence and Neuroscience

M1 - 7386517

ER -
