AmritaDGA: A comprehensive data set for domain generation algorithms (DGAs) based domain name detection systems and application of deep learning

R. Vinayakumar, K. P. Soman, Prabaharan Poornachandran, Mamoun Alazab, Sabu M. Thampi

    Research output: Chapter in Book/Report/Conference proceeding > Chapter > peer-review

    Abstract

    Botnets now play an important role in malware distribution, and attackers use them as a primary means of spreading malicious activity over the internet. To evade blacklisting, recent botnets use domain flux or internet protocol (IP) flux. This work focuses on domain flux. Domain flux uses domain generation algorithms (DGAs) to generate a list of domain names from a seed, and these domain names are used to contact the command and control (C&C) server until access to the system is granted. This work presents AmritaDGA, a fully labeled domain name data set that can be used for research on detecting domain names generated by DGAs. We evaluate the efficacy of deep learning architectures on AmritaDGA, using Keras embedding as the domain name representation method. AmritaDGA is composed of two data sets. The first is collected from publicly available sources. The second is collected from an internal real-time network. Models trained on the public data set are evaluated on unseen samples from the public data set and on the private corpora. Deep learning architectures performed well in most of the test experiments. The baseline system has been made publicly available, and the data set is distributed for the Detecting Malicious Domain names (DMD 2018) shared task.
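The two ideas the abstract names, seed-based domain generation and an integer representation of domain names, can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce any real malware family's DGA: the function names `toy_dga` and `encode_domain`, the character vocabulary, and the padding length are all assumptions made for illustration. The integer sequences it produces are the kind of input an embedding layer such as Keras `Embedding` consumes.

```python
import hashlib
import string
from typing import List

# Hypothetical vocabulary: lowercase letters, digits, hyphen and dot.
# Index 0 is reserved for padding (and, in this sketch, unknown characters).
VOCAB = string.ascii_lowercase + string.digits + "-."
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(VOCAB)}


def toy_dga(seed: str, count: int, length: int = 12, tld: str = ".com") -> List[str]:
    """Derive `count` domain names deterministically from `seed`.

    Toy generator for illustration only; `length` must be at most 32
    because a SHA-256 digest has 32 bytes.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode("utf-8")).digest()
        label = "".join(string.ascii_lowercase[b % 26] for b in digest[:length])
        domains.append(label + tld)
    return domains


def encode_domain(domain: str, max_len: int = 32) -> List[int]:
    """Map each character to its vocabulary index and right-pad with 0."""
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))


if __name__ == "__main__":
    for name in toy_dga("2018-01-01", count=3):
        print(name, encode_domain(name, max_len=20))
```

Because the generator is deterministic, anyone who knows the seed can regenerate the same candidate list, which is what lets a bot and its operator agree on domain names without communicating. The fixed-length index sequences are an assumed preprocessing step; the chapter itself should be consulted for the representation actually used.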

    Original language: English
    Title of host publication: Big Data Recommender Systems
    Subtitle of host publication: Application Paradigms
    Editors: Osman Khalid, Samee U. Khan, Albert Y. Zomaya
    Place of Publication: Stevenage
    Publisher: Institution of Engineering and Technology
    Chapter: 22
    Pages: 455-485
    Number of pages: 31
    Volume: 2
    Edition: 1
    ISBN (Electronic): 9781785619779
    ISBN (Print): 9781785619779
    Publication status: Published - 1 Jan 2019
