
Behavior Speaks Louder: Rethinking Malware Analysis Beyond Family Classification

Fei Zhang, Xiaohong Li, Sen Chen, Ruitao Feng

2024 · DOI: 10.1109/TrustCom63139.2024.00048
International Conference on Trust, Security and Privacy in Computing and Communications

TLDR

This study compared family definitions from various antivirus companies, found significant inconsistencies in their level of detail and in their descriptions of malicious behaviors, and designed the AMBL framework, which automates the generation of behavior labels for malware.

Abstract

The classification of malicious families is essential in Android malware analysis. However, inconsistent naming standards across antivirus companies hinder the accurate identification and understanding of malicious behaviors. This study conducts an extensive analysis of Android malware families to address these challenges. First, we compared family definitions from various antivirus companies and found significant inconsistencies in their level of detail and in their descriptions of malicious behaviors. These inconsistencies undermine effective malware classification and analysis. Second, we assessed the alignment between described and exhibited malicious behaviors, revealing that family definitions often provide only a broad outline and omit critical details. Moreover, malware behaviors evolve and often outgrow existing family definitions. To address these issues, we propose using specific behavior labels that directly indicate the malicious behaviors in malware attack chains. Leveraging large language models (LLMs) and a detailed analysis of Android malicious behaviors, we identified six key behavior labels. To streamline the labeling process, we designed the AMBL framework, which automates the generation of behavior labels for malware. Our novel LLM analysis method, built on a feedback mechanism, establishes relationships between APIs and behavior labels, which is crucial for accurate label updating. Using AMBL, we produced and open-sourced a dataset of behavior analysis reports. An online survey and manual analysis were also conducted to validate the effectiveness of the AMBL framework and the reliability of the dataset.
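The abstract does not name the six behavior labels or specify how the API-to-label relationships are stored. As a rough illustration only, the Python sketch below assumes those relationships are kept as a plain mapping from fully qualified Android API names to label names. It shows how such a mapping could turn an app's invoked APIs into a per-label evidence report, and how a newly confirmed relationship could be added to update the labels. The label names (`premium_sms`, `privacy_theft`), the mapping `API_TO_LABELS`, and the functions `label_app` and `add_relationship` are hypothetical and are not part of AMBL. The LLM-based feedback step that would propose new relationships is not modeled.

```python
from collections import defaultdict

# Hypothetical API-to-label mapping. The label names are illustrative
# placeholders, NOT the six behavior labels defined by AMBL.
API_TO_LABELS: dict[str, set[str]] = {
    "android.telephony.SmsManager.sendTextMessage": {"premium_sms"},
    "android.telephony.TelephonyManager.getDeviceId": {"privacy_theft"},
    "android.location.LocationManager.getLastKnownLocation": {"privacy_theft"},
}


def label_app(invoked_apis, mapping=API_TO_LABELS):
    """Return {label: sorted list of invoked APIs that support that label}.

    APIs absent from the mapping are ignored.
    """
    evidence = defaultdict(set)
    for api in invoked_apis:
        for label in mapping.get(api, ()):
            evidence[label].add(api)
    return {label: sorted(apis) for label, apis in sorted(evidence.items())}


def add_relationship(mapping, api, label):
    """Record a newly confirmed API-to-label relationship (hypothetical update step)."""
    mapping.setdefault(api, set()).add(label)


if __name__ == "__main__":
    apis = [
        "android.telephony.SmsManager.sendTextMessage",
        "android.location.LocationManager.getLastKnownLocation",
        "android.util.Log.d",  # not mapped, so it contributes no label
    ]
    print(label_app(apis))
```

Keeping the triggering APIs next to each label keeps a report traceable: a reviewer can check why a label was assigned, and adding one relationship to the mapping changes the labels of every app that calls that API.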
