Robust Intelligent Malware Detection Using Deep Learning

R. Vinayakumar, Mamoun Alazab, K. P. Soman, Prabaharan Poornachandran, Sitalakshmi Venkatraman

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Security breaches caused by malicious software (malware) continue to escalate, posing a major security concern in the digital age. With many computer users, corporations, and governments affected by the exponential growth in malware attacks, malware detection remains an active research topic. Current malware detection solutions that rely on static and dynamic analysis of malware signatures and behavior patterns are time-consuming and have proven ineffective at identifying unknown malware in real time. Recent malware uses polymorphic, metamorphic, and other evasive techniques to change its behavior quickly and to generate large numbers of new samples. Such new samples are predominantly variants of existing malware, and machine learning algorithms (MLAs) have recently been employed for effective malware analysis. However, these approaches are time-consuming because they require extensive feature engineering, feature learning, and feature representation. With advanced MLAs such as deep learning, the feature engineering phase can be avoided entirely. Recently reported studies in this direction evaluate their algorithms on biased training data, which limits their practical use in real-time situations. There is a compelling need to mitigate bias and evaluate these methods independently in order to arrive at a new, enhanced method for effective zero-day malware detection. To fill this gap in the literature, this paper first evaluates classical MLAs and deep learning architectures for malware detection, classification, and categorization using different public and private datasets. Second, we remove all dataset bias from the experimental analysis by using different splits of the public and private datasets to train and test the model in a disjoint way across different timescales. Third, our major contribution is a novel image processing technique with optimal parameters for MLAs and deep learning architectures, yielding an effective zero-day malware detection model. A comprehensive comparative study demonstrates that our proposed deep learning architectures outperform classical MLAs. Our combination of visualization and deep learning architectures in a static, dynamic, and image processing-based hybrid approach, applied in a big data environment, is the first of its kind toward achieving robust intelligent zero-day malware detection. Overall, this paper paves the way for effective visual detection of malware using a scalable and hybrid deep learning framework for real-time deployments.
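
Two ideas in the abstract can be illustrated with a minimal sketch: representing a binary as a grayscale image, and splitting samples into disjoint training and test sets by time. The abstract does not give the authors' image parameters or split procedure, so everything below is an assumption for illustration only: the function names `bytes_to_grayscale` and `temporal_split`, the default image width of 16, the zero-padding of the last row, and the use of a single cutoff date are hypothetical choices, not the paper's method.

```python
import math
from datetime import date
from typing import List, Sequence, Tuple


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Map each byte to one pixel intensity (0-255), laid out in rows of `width`.

    The width and the zero-padding of the final row are illustrative
    assumptions, not parameters taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = math.ceil(len(data) / width)
    # Pad with zero bytes so the last row is full.
    padded = data + bytes(rows * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(rows)]


def temporal_split(
    samples: Sequence[Tuple[date, str]], cutoff: date
) -> Tuple[List[str], List[str]]:
    """Split (first_seen, sample_id) pairs into disjoint train and test sets.

    Samples first seen strictly before `cutoff` go to training; samples seen
    on or after `cutoff` go to testing, so no sample appears in both sets.
    """
    train = [sample_id for seen, sample_id in samples if seen < cutoff]
    test = [sample_id for seen, sample_id in samples if seen >= cutoff]
    return train, test


if __name__ == "__main__":
    image = bytes_to_grayscale(b"\x00\x7f\xff\x10\x20", width=2)
    print(image)  # [[0, 127], [255, 16], [32, 0]]

    corpus = [
        (date(2017, 5, 1), "sample_a"),
        (date(2017, 11, 20), "sample_b"),
        (date(2018, 2, 3), "sample_c"),
    ]
    train_ids, test_ids = temporal_split(corpus, cutoff=date(2018, 1, 1))
    print(train_ids, test_ids)  # ['sample_a', 'sample_b'] ['sample_c']
```

Splitting by date rather than at random means the test set contains only samples newer than anything seen in training, which approximates the zero-day setting the abstract describes.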

Original language: English
Article number: 18616035
Pages (from-to): 46717-46738
Number of pages: 22
Journal: IEEE Access
Volume: 7
Early online date: 3 Apr 2019
DOIs: https://doi.org/10.1109/ACCESS.2019.2906934
Publication status: Published - 18 Apr 2019

Fingerprint

Learning algorithms
Learning systems
Deep learning
Malware
Image processing
Static analysis
Dynamic analysis
Visualization
Industry

Cite this

Vinayakumar, R., Alazab, M., Soman, K. P., Poornachandran, P., & Venkatraman, S. (2019). Robust Intelligent Malware Detection Using Deep Learning. IEEE Access, 7, 46717-46738. [18616035]. https://doi.org/10.1109/ACCESS.2019.2906934
Vinayakumar, R. ; Alazab, Mamoun ; Soman, K. P. ; Poornachandran, Prabaharan ; Venkatraman, Sitalakshmi. / Robust Intelligent Malware Detection Using Deep Learning. In: IEEE Access. 2019 ; Vol. 7. pp. 46717-46738.
@article{e4f73a1dfbd1462d822ab50254aa0fe0,
title = "Robust Intelligent Malware Detection Using Deep Learning",
keywords = "Artificial intelligence, Cyber security, Cybercrime, Deep learning, Image processing, Machine learning, Malware detection, Scalable and hybrid framework, Static and dynamic analysis",
author = "R. Vinayakumar and Mamoun Alazab and Soman, {K. P.} and Prabaharan Poornachandran and Sitalakshmi Venkatraman",
year = "2019",
month = "4",
day = "18",
doi = "10.1109/ACCESS.2019.2906934",
language = "English",
volume = "7",
pages = "46717--46738",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",

}

Vinayakumar, R, Alazab, M, Soman, KP, Poornachandran, P & Venkatraman, S 2019, 'Robust Intelligent Malware Detection Using Deep Learning', IEEE Access, vol. 7, 18616035, pp. 46717-46738. https://doi.org/10.1109/ACCESS.2019.2906934

Robust Intelligent Malware Detection Using Deep Learning. / Vinayakumar, R.; Alazab, Mamoun; Soman, K. P.; Poornachandran, Prabaharan; Venkatraman, Sitalakshmi.

In: IEEE Access, Vol. 7, 18616035, 18.04.2019, p. 46717-46738.

TY - JOUR
T1 - Robust Intelligent Malware Detection Using Deep Learning
AU - Vinayakumar, R.
AU - Alazab, Mamoun
AU - Soman, K. P.
AU - Poornachandran, Prabaharan
AU - Venkatraman, Sitalakshmi
PY - 2019/4/18
Y1 - 2019/4/18
KW - Artificial intelligence
KW - Cyber security
KW - Cybercrime
KW - Deep learning
KW - Image processing
KW - Machine learning
KW - Malware detection
KW - Scalable and hybrid framework
KW - Static and dynamic analysis
UR - http://www.scopus.com/inward/record.url?scp=85065084940&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2906934
DO - 10.1109/ACCESS.2019.2906934
M3 - Article
VL - 7
SP - 46717
EP - 46738
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 18616035
ER -

Vinayakumar R, Alazab M, Soman KP, Poornachandran P, Venkatraman S. Robust Intelligent Malware Detection Using Deep Learning. IEEE Access. 2019 Apr 18;7:46717-46738. 18616035. https://doi.org/10.1109/ACCESS.2019.2906934