Identifying Useful Features for Malware Detection in the Ember Dataset

Y. Oyama, T. Miyashita, Hirotaka Kokubo

2019 · DOI: 10.1109/CANDARW.2019.00069
18 Citations

TLDR

This paper presents a case study of feature selection for malware detection based on supervised machine learning, using the Ember dataset, and identifies combinations of features that are useful in terms of accuracy, learning time, and data size.

Abstract

Many studies have been conducted to detect malware based on machine learning of program features extracted using static analysis. In this study, we consider the task of distinguishing between malware and benign programs by learning their surface features, such as general file information and imported functions. To make such attempts practical, a good balance among accuracy, learning time, and feature-data sizes is required. Although using only a subset of features can reduce the required time and data sizes, it is not trivial to select an appropriate subset of features. In this paper, we present a case study of feature selection in malware detection based on supervised machine learning. We used the Ember dataset as the target data and identified useful combinations of features in terms of accuracy, learning time, and data size.
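
As a rough illustration of the kind of comparison the abstract describes, the sketch below enumerates every combination of feature groups and records accuracy, training time, and the number of stored feature values for each. It is not the authors' method. The group names, column layout, and class separations are hypothetical, the data is synthetic rather than drawn from Ember, and a nearest-centroid classifier stands in for whatever learner the study used.

```python
import itertools
import random
import time

# Hypothetical feature groups (name -> column indices); not Ember's real layout.
FEATURE_GROUPS = {
    "general": [0, 1],
    "imports": [2, 3, 4],
    "strings": [5, 6, 7, 8],
}
# Assumed class separation per group; 0.0 means the group is pure noise.
GROUP_SHIFT = {"general": 2.0, "imports": 1.0, "strings": 0.0}


def make_synthetic_samples(n, seed=0):
    """Generate n labelled rows of synthetic features (0 = benign, 1 = malware)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [0.0] * 9
        for name, cols in FEATURE_GROUPS.items():
            for c in cols:
                row[c] = rng.gauss(label * GROUP_SHIFT[name], 1.0)
        rows.append(row)
        labels.append(label)
    return rows, labels


def train_centroids(rows, labels, cols):
    """Compute the per-class mean vector over the selected columns."""
    sums = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    counts = {0: 0, 1: 0}
    for row, y in zip(rows, labels):
        counts[y] += 1
        for j, c in enumerate(cols):
            sums[y][j] += row[c]
    return {y: [s / max(counts[y], 1) for s in sums[y]] for y in sums}


def predict(centroids, row, cols):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(y):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[y]))
    return min(centroids, key=dist)


def evaluate_subsets(train_rows, train_labels, test_rows, test_labels):
    """Score every non-empty combination of feature groups."""
    results = []
    names = sorted(FEATURE_GROUPS)
    for k in range(1, len(names) + 1):
        for combo in itertools.combinations(names, k):
            cols = [c for name in combo for c in FEATURE_GROUPS[name]]
            start = time.perf_counter()
            centroids = train_centroids(train_rows, train_labels, cols)
            train_seconds = time.perf_counter() - start
            correct = sum(
                predict(centroids, row, cols) == y
                for row, y in zip(test_rows, test_labels)
            )
            results.append({
                "groups": combo,
                "accuracy": correct / len(test_rows),
                "train_seconds": train_seconds,
                # Stored feature values, used here as a proxy for data size.
                "n_values": len(cols) * len(train_rows),
            })
    return results


if __name__ == "__main__":
    train_rows, train_labels = make_synthetic_samples(400, seed=1)
    test_rows, test_labels = make_synthetic_samples(200, seed=2)
    for r in evaluate_subsets(train_rows, train_labels, test_rows, test_labels):
        print(r["groups"], round(r["accuracy"], 3), r["n_values"])
```

Exhaustive enumeration is only feasible because there are few groups: with g groups there are 2^g - 1 non-empty subsets, so a larger feature inventory would need a greedy or heuristic search instead.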
