DURLD: Malicious URL detection using deep learning-based character level representations

Sriram Srinivasan, R. Vinayakumar, Ajay Arunachalam, Mamoun Alazab, K. P. Soman

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

    Abstract

    Cybercriminals widely use malicious URLs (a.k.a. malicious websites) as a primary mechanism to host unsolicited content such as spam, malicious advertisements, phishing, and drive-by exploits. Previous studies used blacklisting, regular expression, and signature matching approaches to detect malicious URLs. However, these approaches are limited in their ability to detect variants of existing malicious URLs or newly generated ones. Over the last decade, classic machine learning techniques have been used to detect malicious URLs. In this work, we evaluate various state-of-the-art deep learning-based character level embedding methods for malicious URL detection. To build on the performance improvements these methods offer, we propose DeepURLDetect (DURLD), in which raw URLs are encoded using character level embedding. To capture several types of information in a URL, we use the hidden layers of deep learning architectures to extract features from the character level embedding and then apply a non-linear activation function to estimate the probability that the URL is malicious. Experimental evaluation demonstrates that DURLD can detect variants of malicious URLs and that it is computationally inexpensive compared with other relevant deep learning-based character level embedding methods.
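
The pipeline the abstract describes (character level encoding, an embedding, feature extraction, and a non-linear activation that yields a probability) can be illustrated with a minimal sketch. This is not the DURLD architecture: the vocabulary, `MAX_LEN`, `EMBED_DIM`, the mean-pooling step that stands in for the hidden layers, and every function name below are assumptions made for illustration, and the weights are random and untrained, so the output carries no meaning as a detection score.

```python
import math
import random

# Assumed vocabulary: printable ASCII characters, indexed from 1.
# Index 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = {chr(code): i + 1 for i, code in enumerate(range(32, 127))}
MAX_LEN = 64    # assumed fixed input length
EMBED_DIM = 8   # assumed embedding size


def encode_url(url, max_len=MAX_LEN):
    """Map each character to an integer index, truncating or zero-padding to max_len."""
    ids = [VOCAB.get(ch, 0) for ch in url[:max_len]]
    return ids + [0] * (max_len - len(ids))


def init_params(seed=0):
    """Create untrained, randomly initialised parameters (illustration only)."""
    rng = random.Random(seed)
    embedding = [
        [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
        for _ in range(len(VOCAB) + 1)
    ]
    embedding[0] = [0.0] * EMBED_DIM  # padding row stays at zero
    weights = [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    bias = 0.0
    return embedding, weights, bias


def predict_proba(url, params):
    """Embed the characters, mean-pool them, and squash the result with a sigmoid."""
    embedding, weights, bias = params
    ids = [i for i in encode_url(url) if i != 0]  # skip padding/unknown positions
    if ids:
        pooled = [
            sum(embedding[i][d] for i in ids) / len(ids) for d in range(EMBED_DIM)
        ]
    else:
        pooled = [0.0] * EMBED_DIM
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    params = init_params()
    print(predict_proba("http://example.com/login", params))
```

In a trained model the embedding table and the output weights would be learned from labelled URLs, and the pooling step would be replaced by the hidden layers of whichever deep learning architecture is being evaluated.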

    Original language: English
    Title of host publication: Malware Analysis Using Artificial Intelligence and Deep Learning
    Editors: Mark Stamp, Mamoun Alazab, Andrii Shalaginov
    Publisher: Springer
    Pages: 535-554
    Number of pages: 20
    ISBN (Electronic): 9783030625825
    ISBN (Print): 9783030625818
    Publication status: Published - 20 Dec 2020
