Fusing multi-model features for malware classification

Zijian Yang,Dan Wei,Kainan Hong

2025 · DOI: 10.1109/ITOEC63606.2025.10969063

Abstract

Many malware classification methods have emerged over the past few decades. Machine learning methods rely on manually extracted features, while deep learning methods reduce manual intervention and improve model generalization. However, most of these methods characterize malicious code with a single type of feature, drawn from a single source and in a single form, and therefore perform worse than multimodal methods. In this work, we propose FuMalk, which combines three data modalities: (1) opcode call statistics derived from malicious code assembly files, (2) grayscale images representing malicious code, and (3) control flow graphs extracted from malicious code assembly files. FuMalk thus characterizes malicious code with multimodal features and fully exploits the complementary characteristics of the different feature forms. We evaluate the proposed method using ten-fold cross-validation, and the experimental results show that FuMalk achieves an average accuracy of 99.77% on Microsoft’s public dataset.
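
As a rough illustration of two of the three modalities named above, the sketch below counts opcodes in assembly text and reshapes raw bytes into a grayscale pixel grid. It is not the authors' implementation: the opcode vocabulary, the image width, the function names, and the assumption that each assembly line starts with its mnemonic are all hypothetical choices made for the example, and the abstract does not say how the features are fused.

```python
from collections import Counter

# Hypothetical opcode vocabulary; the abstract does not list the one actually used.
OPCODES = ["mov", "push", "pop", "call", "jmp", "add", "ret"]


def opcode_counts(asm_text, vocab=OPCODES):
    """Count how often each vocabulary opcode appears.

    Assumes a simplified listing in which every non-empty line
    starts with the instruction mnemonic.
    """
    mnemonics = []
    for line in asm_text.splitlines():
        parts = line.split()
        if parts:
            mnemonics.append(parts[0].lower())
    counts = Counter(m for m in mnemonics if m in vocab)
    return [counts[op] for op in vocab]


def bytes_to_grayscale(data, width=16):
    """Treat each byte as one pixel (0-255) and split into rows of `width`.

    The last row is zero-padded so that every row has the same length.
    """
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    asm = "mov eax, ebx\npush eax\nmov ecx, 1\nret"
    print(opcode_counts(asm))
    print(bytes_to_grayscale(b"\x01\x02\x03\x04\x05", width=4))
```

In a multimodal pipeline of the kind described, vectors such as these would be passed to modality-specific models alongside a control flow graph representation before their outputs are combined; that combination step is left out here because the abstract gives no detail on it.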