Examination of data, rule generation and detection of phishing URLs using online logistic regression

Mohammed Nazim Feroz, Susan Mengel

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

31 Scopus citations

Abstract

Web services such as online banking, gaming, and social networking have evolved rapidly, as has people's reliance on them for everyday tasks. As a result, a large amount of information is uploaded to the Web every day. The openness of the Web gives criminals opportunities to upload malicious content. Despite extensive research, email-based spam filtering techniques cannot protect other web services. A countermeasure is therefore needed that generalizes across web services to protect users from phishing hosts. This paper describes an approach that automatically classifies URLs based on their lexical and host-based features. The usability of Mahout is demonstrated for such scalable machine learning problems, and online learning is considered in preference to batch learning. The classifier achieves 93-97% accuracy, detecting a large number of phishing hosts while maintaining a modest false positive rate. The raw data is examined, and the effectiveness of various feature subsets is assessed. The relevance of bigrams is also assessed and supported using the chi-squared and information gain attribute evaluation methods.
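To make the idea of online logistic regression over lexical URL features concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use Mahout. The hashed character-bigram features, the feature space size `DIM`, the learning rate, and the example URLs are all assumptions chosen for illustration. Host-based features are left out.

```python
import math
import zlib

DIM = 2 ** 12  # assumed size of the hashed feature space (illustrative)


def bigram_features(url):
    """Map a URL to sparse binary features: hashed character bigrams."""
    text = url.lower()
    feats = {}
    for i in range(len(text) - 1):
        # crc32 gives a hash that is stable across runs, unlike built-in hash()
        idx = zlib.crc32(text[i:i + 2].encode("utf-8")) % DIM
        feats[idx] = 1.0  # presence of the bigram, not its count
    return feats


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with SGD."""

    def __init__(self, dim=DIM, learning_rate=0.05):
        self.weights = [0.0] * dim
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, feats):
        """Return the estimated probability that the URL is phishing."""
        z = self.bias + sum(self.weights[i] * v for i, v in feats.items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, label):
        """One SGD step on the log loss for a single (features, label) pair."""
        error = self.predict_proba(feats) - label
        self.bias -= self.learning_rate * error
        for i, v in feats.items():
            self.weights[i] -= self.learning_rate * error * v


# Hypothetical labeled stream: 1 = phishing, 0 = benign.
stream = [
    ("http://secure-login.example.test/verify/account.php", 1),
    ("http://www.example.com/about", 0),
]

model = OnlineLogisticRegression()
for url, label in stream:
    model.update(bigram_features(url), label)  # learn from each URL as it arrives
```

Because each example is used for one update and then discarded, the model can be trained on a stream of URLs without holding the full data set in memory, which is the property that distinguishes online learning from batch learning.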

Original language: English
Title of host publication: Proceedings - 2014 IEEE International Conference on Big Data, IEEE Big Data 2014
Editors: Wo Chang, Jun Huan, Nick Cercone, Saumyadipta Pyne, Vasant Honavar, Jimmy Lin, Xiaohua Tony Hu, Charu Aggarwal, Bamshad Mobasher, Jian Pei, Raghunath Nambiar
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 241-250
Number of pages: 10
ISBN (Electronic): 9781479956654
State: Published - 2014
Event: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014 - Washington, United States
Duration: Oct 27 2014 to Oct 30 2014

Conference

Conference: 2nd IEEE International Conference on Big Data, IEEE Big Data 2014
Country/Territory: United States
City: Washington
Period: 10/27/14 to 10/30/14

Keywords

  • Attribute Evaluation
  • Decision Tree
  • Feature Vector
  • Rule Generation
  • Stochastic Gradient Descent
