Email Embeddings for Phishing Detection

Luis Felipe Gutierrez, Faranak Abri, Miriam Armstrong, Akbar Siami Namin, Keith S. Jones

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations


The problem of detecting phishing emails with machine learning techniques has been discussed extensively in the literature. Conventional and state-of-the-art machine learning algorithms have demonstrated that classifiers with high accuracy can be built. Existing studies, however, treat phishing and genuine emails through general indicators, so it is not clear which phishing features contribute to variations in the classifiers' performance. In this paper, we crafted a set of phishing and legitimate emails with similar indicators in order to investigate whether these cues are captured or disregarded by email embeddings, i.e., vectorizations. We then fed the carefully crafted emails to machine learning classifiers to evaluate the performance of the resulting email embeddings. Our results show that, using these indicators, email embedding techniques are effective for classifying emails as phishing or legitimate.
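The general pipeline described in the abstract (vectorize each email, then hand the vectors to a classifier) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name the embedding technique or the classifiers used, so the sketch substitutes a plain bag-of-words vector and a nearest-centroid rule. The training emails and the names `embed`, `centroid`, `cosine`, and `classify` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def embed(text, vocab):
    """Map an email to an L2-normalized bag-of-words vector over vocab.

    Assumption: a stand-in for whatever embedding the paper actually uses.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / norm for v in vec]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up toy emails, not data from the paper.
train = [
    ("verify your account password now or it will be suspended", "phishing"),
    ("urgent click this link to confirm your bank password", "phishing"),
    ("the project meeting is moved to friday afternoon", "legitimate"),
    ("please find the meeting agenda and project notes attached", "legitimate"),
]

# Vocabulary: every distinct word seen in the toy training emails.
vocab = sorted(
    {word for text, _ in train for word in re.findall(r"[a-z]+", text.lower())}
)

# One centroid per class, computed from the embedded training emails.
centroids = {
    label: centroid([embed(text, vocab) for text, y in train if y == label])
    for label in ("phishing", "legitimate")
}


def classify(text):
    """Return the label whose centroid is most similar to the email vector."""
    vector = embed(text, vocab)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))
```

Calling `classify("confirm your password now")` assigns the email to the class whose centroid shares the most vocabulary with it. A real study would replace both the toy data and the bag-of-words step with a proper corpus and a learned embedding.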

Original language: English
Title of host publication: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020
Editors: Xintao Wu, Chris Jermaine, Li Xiong, Xiaohua Tony Hu, Olivera Kotevska, Siyuan Lu, Weijia Xu, Srinivas Aluru, Chengxiang Zhai, Eyhab Al-Masri, Zhiyuan Chen, Jeff Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781728162515
State: Published - Dec 10 2020
Event: 8th IEEE International Conference on Big Data, Big Data 2020 - Virtual, Atlanta, United States
Duration: Dec 10 2020 to Dec 13 2020

Publication series

Name: Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020


Conference: 8th IEEE International Conference on Big Data, Big Data 2020
Country/Territory: United States
City: Virtual, Atlanta


  • Email Embeddings
  • Natural Language Processing
  • Phishing Emails

