ViTDroid: Vision Transformers for Efficient, Explainable Attention to Malicious Behavior in Android Binaries

Cited: 0
Authors
Syed, Toqeer Ali [1]
Nauman, Mohammad [2]
Khan, Sohail [2]
Jan, Salman [3,4]
Zuhairi, Megat F. [5]
Affiliations
[1] Islamic Univ Madinah, Fac Comp & Informat Syst, Madinah 42351, Saudi Arabia
[2] Effat Univ, Effat Coll Engn, Dept Comp Sci, Jeddah 22332, Saudi Arabia
[3] Alburaimi Univ Coll, Dept Informat Technol, Alburaimi 512, Oman
[4] Univ Technol Bahrain, Coll Comp Studies, Salmabad 18041, Bahrain
[5] Univ Kuala Lumpur, Malaysian Inst Informat Technol, Kuala Lumpur 50250, Malaysia
Keywords
malware; vision transformers; Android; security
DOI
10.3390/s24206690
CLC Classification
O65 [Analytical Chemistry]
Subject Classification Codes
070302; 081704
Abstract
Smartphones are deeply woven into modern society. The two dominant mobile operating systems, iOS and Android, affect the lives of millions of people, and Android currently holds a market share of close to 71% between the two. As a result, personal information that is not securely protected is at tremendous risk. At the same time, mobile malware saw a year-on-year increase of more than 42% globally by mid-2022. No team of human analysts could realistically detect and remove all of this malware, which is why deep learning in particular has recently been applied to the problem. Deep learning models, however, were designed primarily for image analysis. Although these models have shown promising results in vision tasks, it has been difficult to fully understand what features they extract in the malware domain. Moreover, the translation-invariance property of well-known CNN-based models has kept the full potential of deep learning for malware analysis from being realized. In this paper, we present ViTDroid, a novel model based on vision transformers for the deep learning-based analysis of opcode sequences of Android malware samples from large real-world datasets. We achieve a false positive rate of 0.0019, compared to the previous best of 0.0021. This incremental improvement, however, is not the major contribution of our work. Our model is designed to make explainable predictions: it not only classifies malware with high accuracy, but also provides insight into the reasons for each classification. The model can pinpoint the instructions in a malware sample that cause its malicious behavior. It can therefore aid the field of malware analysis itself by providing actionable insights to human experts, leading to further improvements in this field.
Pages: 18
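
The abstract describes a vision-transformer-style model that operates on opcode sequences and uses attention to pinpoint the instructions behind a malicious prediction. As a rough illustration of that idea only, the following is a minimal PyTorch sketch, not the authors' implementation: a small transformer encoder over embedded opcode IDs with a classification head, where the attention of a [CLS]-style summary token over opcode positions serves as a per-instruction relevance score. All names, dimensions, and the attention-scoring scheme are illustrative assumptions.

import torch
import torch.nn as nn

class OpcodeTransformerClassifier(nn.Module):
    """Hypothetical ViTDroid-style sketch: a transformer over opcode sequences."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4,
                 n_layers=2, max_len=512, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)           # opcode ID -> vector
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))      # [CLS]-style summary token
        self.pos = nn.Parameter(torch.zeros(1, max_len + 1, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Extra attention layer used only to score opcode relevance for explanation.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)                # benign vs. malicious

    def forward(self, opcodes):
        # opcodes: (batch, seq_len) integer opcode IDs
        b, t = opcodes.shape
        x = self.embed(opcodes)
        x = torch.cat([self.cls.expand(b, -1, -1), x], dim=1)    # prepend summary token
        x = x + self.pos[:, : t + 1]
        x = self.encoder(x)
        # Attention of the summary token over opcode positions, averaged over
        # heads, acts as a per-instruction relevance score.
        _, weights = self.attn(x[:, :1], x[:, 1:], x[:, 1:],
                               need_weights=True, average_attn_weights=True)
        return self.head(x[:, 0]), weights.squeeze(1)            # logits, relevance

model = OpcodeTransformerClassifier()
sample = torch.randint(0, 256, (1, 128))   # one synthetic opcode sequence
logits, relevance = model(sample)
top5 = relevance[0].topk(5).indices        # positions attended to most strongly

In an analysis setting, the top-scoring positions would be mapped back to the disassembled instructions and surfaced to a human analyst, which is the explainability workflow the abstract claims.
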
Related Papers (8 records)
  • [1] Efficient Vision Transformers with Partial Attention
    Vo, Xuan-Thuy
    Nguyen, Duy-Linh
    Priadana, Adri
    Jo, Kang-Hyun
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141: 298-317
  • [2] AttnZero: Efficient Attention Discovery for Vision Transformers
    Li, Lujun
    Wei, Zimian
    Dong, Peijie
    Luo, Wenhan
    Xue, Wei
    Liu, Qifeng
    Guo, Yike
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063: 20-37
  • [3] Multimodal Vision Transformers with Forced Attention for Behavior Analysis
    Agrawal, Tanay
    Balazia, Michal
    Muller, Philipp
    Bremond, Francois
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 3381-3391
  • [4] Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
    Wei, Cong
    Duke, Brendan
    Jiang, Ruowei
    Aarabi, Parham
    Taylor, Graham W.
    Shkurti, Florian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 22680-22689
  • [5] Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
    Lee, Sanghyeok
    Choi, Joonmyung
    Kim, Hyunwoo J.
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 15741-15750
  • [6] Efficient data-driven behavior identification based on vision transformers for human activity understanding
    Yang, Jiachen
    Zhang, Zhuo
    Xiao, Shuai
    Ma, Shukun
    Li, Yang
    Lu, Wen
    Gao, Xinbo
    NEUROCOMPUTING, 2023, 530: 104-115
  • [7] RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference
    Chen, Dake
    Zhang, Yuke
    Kundu, Souvik
    Li, Chenghao
    Beerel, Peter A.
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023
  • [8] COVID-Attention: Efficient COVID19 Detection Using Pre-trained Deep Models Based on Vision Transformers and X-ray Images
    Haouli, Imed-Eddine
    Hariri, Walid
    Seridi-Bouchelaghem, Hassina
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2023, 32 (08)