TY - JOUR
T1 - A Pseudo Feedback-Based Annotated TF-IDF Technique for Dynamic Crypto-Ransomware Pre-Encryption Boundary Delineation and Features Extraction
AU - Al-Rimy, Bander Ali Saleh
AU - Maarof, Mohd Aiziani
AU - Alazab, Mamoun
AU - Alsolami, Fawaz
AU - Shaid, Syed Zainudeen Mohd
AU - Ghaleb, Fuad A.
AU - Al-Hadhrami, Tawfik
AU - Ali, Abdullah Marish
PY - 2020/7/29
Y1 - 2020/7/29
N2 - The cryptography employed against user files makes the effect of crypto-ransomware attacks irreversible even after detection and removal. Thus, detecting such attacks early, i.e. during pre-encryption phase before the encryption takes place is necessary. Existing crypto-ransomware early detection solutions use a fixed time-based thresholding approach to determine the pre-encryption phase boundaries. However, the fixed time thresholding approach implies that all samples start the encryption at the same time. Such assumption does not necessarily hold for all samples as the time for the main sabotage to start varies among different crypto-ransomware families due to the obfuscation techniques employed by the malware to change its attack strategies and evade detection, which generates different attack behaviors. Additionally, the lack of sufficient data at the early phases of the attack adversely affects the ability of feature extraction techniques in early detection models to perceive the characteristics of the attacks, which, consequently, decreases the detection accuracy. Therefore, this paper proposes a Dynamic Pre-encryption Boundary Delineation and Feature Extraction (DPBD-FE) scheme that determines the boundary of the pre-encryption phase, from which the features are extracted and selected more accurately. Unlike the fixed thresholding employed by the extant works, DPBD-FE tracks the pre-encryption phase for each instance individually based on the first occurrence of any cryptography-related APIs. Then, an annotated Term Frequency-Inverse Document Frequency (aTF-IDF) technique was utilized to extract the features from runtime data generated during the pre-encryption phase of crypto-ransomware attacks. The aTF-IDF overcomes the challenge of insufficient attack patterns during the early phases of the attack lifecycle. The experimental evaluation shows that DPBD-FE was able to determine the pre-encryption boundaries and extract the features related to this phase more accurately compared to related works.
AB - The cryptography employed against user files makes the effect of crypto-ransomware attacks irreversible even after detection and removal. Thus, detecting such attacks early, i.e. during pre-encryption phase before the encryption takes place is necessary. Existing crypto-ransomware early detection solutions use a fixed time-based thresholding approach to determine the pre-encryption phase boundaries. However, the fixed time thresholding approach implies that all samples start the encryption at the same time. Such assumption does not necessarily hold for all samples as the time for the main sabotage to start varies among different crypto-ransomware families due to the obfuscation techniques employed by the malware to change its attack strategies and evade detection, which generates different attack behaviors. Additionally, the lack of sufficient data at the early phases of the attack adversely affects the ability of feature extraction techniques in early detection models to perceive the characteristics of the attacks, which, consequently, decreases the detection accuracy. Therefore, this paper proposes a Dynamic Pre-encryption Boundary Delineation and Feature Extraction (DPBD-FE) scheme that determines the boundary of the pre-encryption phase, from which the features are extracted and selected more accurately. Unlike the fixed thresholding employed by the extant works, DPBD-FE tracks the pre-encryption phase for each instance individually based on the first occurrence of any cryptography-related APIs. Then, an annotated Term Frequency-Inverse Document Frequency (aTF-IDF) technique was utilized to extract the features from runtime data generated during the pre-encryption phase of crypto-ransomware attacks. The aTF-IDF overcomes the challenge of insufficient attack patterns during the early phases of the attack lifecycle. The experimental evaluation shows that DPBD-FE was able to determine the pre-encryption boundaries and extract the features related to this phase more accurately compared to related works.
KW - Cybersecurity
KW - early detection
KW - malware
KW - ransomware
KW - TF-IDF
UR - http://www.scopus.com/inward/record.url?scp=85089495556&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.3012674
DO - 10.1109/ACCESS.2020.3012674
M3 - Article
AN - SCOPUS:85089495556
SN - 2169-3536
VL - 8
SP - 140586
EP - 140598
JO - IEEE Access
JF - IEEE Access
M1 - 9151929
ER -