A Deep Learning Framework for Malware Classification using NLP Techniques
Bishwajit Prasad Gond, A. Singh, D. Mohapatra
TLDR
This study proposes the use of n-grams of API call sequences, including both API names and their arguments, as a means to characterize malware behavior, and develops a Deep Learning-based classification model, demonstrating significant improvements in classification accuracy and robustness compared to traditional methods.
Abstract
Malware classification is a crucial aspect of cybersecurity, vital for recognizing and addressing potential threats. This study introduces a fresh perspective on malware classification utilizing Natural Language Processing (NLP) methods in conjunction with Deep Learning. We propose the use of n-grams of API call sequences, including both API names and their arguments, as a means to characterize malware behavior. By employing Deep Learning techniques, we can effectively capture the nuanced patterns and distinctions in malware behaviors, enabling a more comprehensive understanding of malware characteristics and facilitating accurate identification and classification of diverse malware strains. The key contributions of this study include leveraging n-grams of API call sequences as a novel feature representation, developing a Deep Learning-based classification model, and evaluating the proposed approach on a large-scale malware dataset, demonstrating significant improvements in classification accuracy and robustness compared to traditional methods. The findings of this research present a promising avenue for improving malware classification, ultimately strengthening the overall cybersecurity landscape by combining the strengths of NLP and Deep Learning to combat evolving cyber threats.
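As a rough illustration of the feature representation described above, the following sketch builds n-grams over an API call trace in which each token combines the API name with its arguments. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `api_ngrams`, the `name(arg1,arg2)` token format, and the example trace are all hypothetical, and the abstract does not specify how arguments are normalized or which value of n is used.

```python
from collections import Counter


def api_ngrams(calls, n=2):
    """Count n-grams over a sequence of (api_name, args) pairs.

    Each call is first flattened into one token, "name(arg1,arg2)", so that
    the arguments take part in the n-gram alongside the API name.
    The token format is an assumption made for this illustration.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = [f"{name}({','.join(args)})" for name, args in calls]
    # Slide a window of width n over the token sequence.
    windows = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(windows)


# Hypothetical trace; real traces would come from a sandbox or API monitor.
trace = [
    ("CreateFileW", ["a.txt", "GENERIC_WRITE"]),
    ("WriteFile", ["0x1c4"]),
    ("CloseHandle", ["0x1c4"]),
    ("CreateFileW", ["a.txt", "GENERIC_WRITE"]),
    ("WriteFile", ["0x1c4"]),
]

bigrams = api_ngrams(trace, n=2)
for gram, count in bigrams.most_common():
    print(count, " -> ".join(gram))
```

The resulting counts could then be vectorized (for example as a bag of n-grams) and passed to a neural classifier. That step is omitted here because the abstract does not describe the model architecture.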
