A robust and dynamic malware detection and classification model using behavioral-based analysis and BERT technique
A. H. Alhazmi
Abstract
Malware classification is challenging because malicious software constantly evolves. Traditional signature-based methods and static analysis often fail to detect sophisticated threats, which makes behavior-based analysis crucial. This study proposes a malware detection model that analyzes the behavior of executable (.exe) files to identify malware and assign it to a malware family. The model submits each file to VirusTotal, where it is executed in a secure sandbox that monitors actions such as file modifications, registry changes, and network connections. To improve detection accuracy, BERT is applied to extract key features from the resulting behavior logs. After 100 training epochs, the model achieved 92.25% accuracy and an F1-score of 91.22%, demonstrating strong overall performance. A class-wise evaluation was also conducted, treating each malware family as a distinct class to assess per-family detection accuracy. Furthermore, a correlation matrix was analyzed to explore inter-class relationships and identify overlapping behaviors. Experimental results show that SVM achieved the highest F1-scores for Adware (0.98) and BackDoor (0.91), while Random Forest performed comparably. Naïve Bayes, however, performed poorly on FakeAlert (F1-score: 0.64). These findings confirm the effectiveness of the proposed behavior-based approach using BERT features, with SVM and Random Forest proving to be the most reliable classifiers.
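The class-wise evaluation described above reports a separate F1-score for each malware family. As an illustration only, the sketch below computes per-class precision, recall, and F1 with the Python standard library, plus a macro-averaged F1 and overall accuracy. The labels and predictions are hypothetical toy data, not results from the study, and the abstract does not say whether its overall 91.22% F1-score is macro- or weighted-averaged; macro averaging here is an assumption.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that appears in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # true class t was missed
    scores = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores[c] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted family matches the true family."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels using family names from the abstract.
y_true = ["Adware", "Adware", "BackDoor", "BackDoor", "FakeAlert", "FakeAlert"]
y_pred = ["Adware", "Adware", "BackDoor", "FakeAlert", "FakeAlert", "BackDoor"]

print(per_class_f1(y_true, y_pred))  # {'Adware': 1.0, 'BackDoor': 0.5, 'FakeAlert': 0.5}
print(macro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

Reporting per-class scores alongside a single averaged figure makes weak families visible: in this toy run, two families that are confused with each other (BackDoor and FakeAlert) each score 0.5 even though Adware is perfect.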
