Spam Emails Detection Based on Distributed Word Embedding with Deep Learning

Sriram Srinivasan, Vinayakumar Ravi, Mamoun Alazab, Simran Ketha, Ala’ M. Al-Zoubi, Soman Kotti Padannayil

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    61 Citations (Scopus)

    Abstract

    In recent years, attacks have shifted rapidly from general and random ones to more sophisticated and advanced ones. Unsolicited email, or spam, is one of the sources of many types of cybercrime techniques that use complicated methods to trick specific victims. Spam detection has been one of the leading machine learning-oriented applications of the last decade. In this work, we present a new methodology for detecting spam emails based on deep learning architectures in the context of natural language processing (NLP). Past work on classical machine learning based spam email detection has relied on various feature engineering methods. Identifying a proper feature engineering method is a difficult task, and such methods are moreover vulnerable in an adversarial environment. Our proposed method leverages the text representation methods of NLP and maps them to the spam email detection task. Various email representation methods are used to transform emails into email word vectors, an essential step for machine learning algorithms. Moreover, optimal parameters are identified for many deep learning architectures and email representations through hyper-parameter tuning. The performance of many classical machine learning classifiers and deep learning architectures with various text representations is evaluated on three publicly available email corpora. The experimental results show that the deep learning architectures performed better than the standard machine learning classifiers in terms of accuracy, precision, recall, and F1-score. This is essentially because the deep learning architectures can learn hierarchical, abstract and sequential feature representations of emails. Furthermore, word embedding with deep learning performed well in comparison to the other classical email representation methods. Word embedding makes it easier to learn the syntactic, semantic and contextual similarity of emails. This supports the use of word embedding with deep learning methods for spam email filtering in real environments.
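The four evaluation measures named in the abstract (accuracy, precision, recall, and F1-score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `binary_metrics`, the label strings "spam" and "ham", and the toy predictions are all assumptions made for illustration, and the values are not results from the chapter.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; labels are plain strings.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with `positive` as the class of interest.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for this example only.
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Precision penalizes legitimate mail flagged as spam, while recall penalizes spam that slips through, which is why a spam filter is usually judged on both rather than on accuracy alone.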

    Original language: English
    Title of host publication: Machine Intelligence and Big Data Analytics for Cybersecurity Applications
    Editors: Yassine Maleh, Mohammad Shojafar, Mamoun Alazab, Youssef Baddi
    Place of Publication: Switzerland
    Publisher: Springer Science and Business Media Deutschland GmbH
    Chapter: 7
    Pages: 161-189
    Number of pages: 29
    Edition: 1
    ISBN (Electronic): 978-3-030-57024-8
    ISBN (Print): 978-3-030-57023-1
    Publication status: Published - 2021

    Publication series

    Name: Studies in Computational Intelligence
    Volume: 919
    ISSN (Print): 1860-949X
    ISSN (Electronic): 1860-9503
