Phishing URL Detection Using URL Ranking

Mohammed Nazim Feroz; Susan Mengel

doi:10.1109/BigDataCongress.2015.97

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

62 Scopus citations

Abstract

The openness of the Web creates opportunities for criminals to upload malicious content. Despite extensive research, email-based spam filtering techniques are unable to protect other web services. A countermeasure is therefore needed that generalizes across web services to protect users from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used to categorize URLs, and the categories returned serve as a supplemental source of information that enables the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give each URL a rank.
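The cluster-ID-as-feature step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not list the paper's actual lexical features, clustering algorithm, or number of clusters, so the six features in `lexical_features`, the plain k-means in `kmeans`, and the helper `add_cluster_feature` are hypothetical. Host-based features, reputation-service categorization, and the final ranking are not shown.

```python
import math
from urllib.parse import urlparse


def lexical_features(url: str) -> list[float]:
    """Turn a URL into a small numeric vector (hypothetical feature set)."""
    # Add a scheme when missing so urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    path = parsed.path or ""
    return [
        float(len(url)),                          # total URL length
        float(len(host)),                         # host name length
        float(host.count(".")),                   # number of dots in the host
        float(sum(ch.isdigit() for ch in host)),  # digits in the host
        float(len(path)),                         # path length
        float("@" in url or "-" in host),         # suspicious-character flag
    ]


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_cluster_feature(urls: list[str], k: int = 2) -> list[list[float]]:
    """Cluster the whole dataset, then append each URL's cluster ID
    to its feature vector so a downstream classifier can use it."""
    vectors = [lexical_features(u) for u in urls]
    labels = kmeans(vectors, k)
    return [vec + [float(lab)] for vec, lab in zip(vectors, labels)]


if __name__ == "__main__":
    sample = [
        "http://example.com/a",
        "http://example.org/b",
        "http://login-secure.example.net/account/verify/session/update",
        "http://192.0.2.10/paypal/confirm/index.html",
    ]
    for row in add_cluster_feature(sample, k=2):
        print(row)
```

Each returned row is the original lexical vector with one extra column holding the cluster ID. A classifier trained on these rows would then see the cluster membership alongside the raw features, which is the role the abstract assigns to the cluster label.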

Original language	English
Title of host publication	Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
Editors	Latifur Khan, Carminati Barbara
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	635-638
Number of pages	4
ISBN (Electronic)	9781467372787
DOIs	https://doi.org/10.1109/BigDataCongress.2015.97
State	Published - Aug 17 2015
Event	4th IEEE International Congress on Big Data, BigData Congress 2015 - New York City, United States Duration: Jun 27 2015 → Jul 2 2015

Publication series

Name	Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015

Conference

Conference	4th IEEE International Congress on Big Data, BigData Congress 2015
Country/Territory	United States
City	New York City
Period	06/27/15 → 07/02/15

Keywords

Classification
Clustering
Feature Vector
URL Ranking
Web Categorization

Cite this

Feroz, M. N., & Mengel, S. (2015). Phishing URL Detection Using URL Ranking. In L. Khan, & C. Barbara (Eds.), Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015 (pp. 635-638). Article 7207281 (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigDataCongress.2015.97

@inproceedings{70fca85d9a5e4bb8842662c71b94343a,
title = "Phishing URL Detection Using URL Ranking",
abstract = "The openness of the Web exposes opportunities for criminals to upload malicious content. In fact, despite extensive research, email based spam filtering techniques are unable to protect other web services. Therefore, a counter measure must be taken that generalizes across web services to protect the user from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used in order to categorize URLs and the categories returned are used as a supplemental source of information that would enable the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give URLs a rank.",
keywords = "Classification, Clustering, Feature Vector, URL Ranking, Web Categorization",
author = "Feroz, {Mohammed Nazim} and Susan Mengel",
note = "Publisher Copyright: {\textcopyright} 2015 IEEE.; 4th IEEE International Congress on Big Data, BigData Congress 2015 ; Conference date: 27-06-2015 Through 02-07-2015",
year = "2015",
month = aug,
day = "17",
doi = "10.1109/BigDataCongress.2015.97",
language = "English",
series = "Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "635--638",
editor = "Latifur Khan and Carminati Barbara",
booktitle = "Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015",
}

Feroz, MN & Mengel, S 2015, Phishing URL Detection Using URL Ranking. in L Khan & C Barbara (eds), Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015., 7207281, Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015, Institute of Electrical and Electronics Engineers Inc., pp. 635-638, 4th IEEE International Congress on Big Data, BigData Congress 2015, New York City, United States, 06/27/15. https://doi.org/10.1109/BigDataCongress.2015.97

Phishing URL Detection Using URL Ranking. / Feroz, Mohammed Nazim; Mengel, Susan.
Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015. ed. / Latifur Khan; Carminati Barbara. Institute of Electrical and Electronics Engineers Inc., 2015. p. 635-638 7207281 (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015).

TY - GEN
T1 - Phishing URL Detection Using URL Ranking
AU - Feroz, Mohammed Nazim
AU - Mengel, Susan
PY - 2015/8/17
Y1 - 2015/8/17
AB - The openness of the Web exposes opportunities for criminals to upload malicious content. In fact, despite extensive research, email based spam filtering techniques are unable to protect other web services. Therefore, a counter measure must be taken that generalizes across web services to protect the user from phishing host URLs. This paper describes an approach that classifies URLs automatically based on their lexical and host-based features. Clustering is performed on the entire dataset and a cluster ID (or label) is derived for each URL, which in turn is used as a predictive feature by the classification system. Online URL reputation services are used in order to categorize URLs and the categories returned are used as a supplemental source of information that would enable the system to rank URLs. The classifier achieves 93-98% accuracy by detecting a large number of phishing hosts, while maintaining a modest false positive rate. URL clustering, URL classification, and URL categorization mechanisms work in conjunction to give URLs a rank.
KW - Classification
KW - Clustering
KW - Feature Vector
KW - URL Ranking
KW - Web Categorization
UR - http://www.scopus.com/inward/record.url?scp=84959487048&partnerID=8YFLogxK
U2 - 10.1109/BigDataCongress.2015.97
DO - 10.1109/BigDataCongress.2015.97
M3 - Conference contribution
AN - SCOPUS:84959487048
T3 - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
SP - 635
EP - 638
BT - Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015
A2 - Khan, Latifur
A2 - Barbara, Carminati
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Congress on Big Data, BigData Congress 2015
Y2 - 27 June 2015 through 2 July 2015
ER -

Feroz MN, Mengel S. Phishing URL Detection Using URL Ranking. In Khan L, Barbara C, editors, Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 635-638. 7207281. (Proceedings - 2015 IEEE International Congress on Big Data, BigData Congress 2015). doi: 10.1109/BigDataCongress.2015.97
