ViTDroid: Vision Transformers for Efficient, Explainable Attention to Malicious Behavior in Android Binaries

Cited: 0
Authors
Syed, Toqeer Ali [1]
Nauman, Mohammad [2]
Khan, Sohail [2]
Jan, Salman [3,4]
Zuhairi, Megat F. [5]
Affiliations
[1] Islamic Univ Madinah, Fac Comp & Informat Syst, Madinah 42351, Saudi Arabia
[2] Effat Univ, Effat Coll Engn, Dept Comp Sci, Jeddah 22332, Saudi Arabia
[3] Alburaimi Univ Coll, Dept Informat Technol, Alburaimi 512, Oman
[4] Univ Technol Bahrain, Coll Comp Studies, Salmabad 18041, Bahrain
[5] Univ Kuala Lumpur, Malaysian Inst Informat Technol, Kuala Lumpur 50250, Malaysia
Keywords
malware; vision transformers; Android; security
DOI
10.3390/s24206690
CLC Classification
O65 [Analytical Chemistry]
Subject Classification Codes
070302; 081704
Abstract
Smartphones are deeply woven into modern society. The two dominant mobile operating systems, iOS and Android, affect the lives of millions of people, and Android currently holds a market share of close to 71% between the two. As a result, personal information that is not securely protected is at tremendous risk. At the same time, mobile malware saw a year-on-year increase of more than 42% globally by mid-2022. No team of human analysts could realistically detect and remove all of this malware, which is why deep learning in particular has recently been applied to the problem. Deep learning models, however, were designed primarily for image analysis. Although these models have shown promising results in vision tasks, it has been difficult to fully understand what features they extract in the malware domain. Moreover, the translation-invariance property of well-known CNN-based models has kept the full potential of deep learning for malware analysis from being realized. In this paper, we present ViTDroid, a novel model based on vision transformers for the deep learning-based analysis of opcode sequences of Android malware samples from large real-world datasets. We achieve a false positive rate of 0.0019, compared to the previous best of 0.0021. This incremental improvement, however, is not the major contribution of our work. Our model is designed to make explainable predictions: it not only classifies malware with high accuracy, but also provides insight into the reasons for each classification. The model can pinpoint the instructions in a malware sample that cause its malicious behavior. It can therefore aid the field of malware analysis itself by providing actionable insights to human experts, leading to further improvements in this field.
Pages: 18
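
The abstract describes a vision-transformer-style model that operates on opcode sequences and uses attention to pinpoint the instructions behind a malicious prediction. As a rough illustration of that idea only, the following is a minimal PyTorch sketch, not the authors' implementation: a small transformer encoder over embedded opcode IDs with a classification head, where the attention of a [CLS]-style summary token over opcode positions serves as a per-instruction relevance score. All names, dimensions, and the attention-scoring scheme are illustrative assumptions.

import torch
import torch.nn as nn

class OpcodeTransformerClassifier(nn.Module):
    """Hypothetical ViTDroid-style sketch: a transformer over opcode sequences."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4,
                 n_layers=2, max_len=512, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)           # opcode ID -> vector
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))      # [CLS]-style summary token
        self.pos = nn.Parameter(torch.zeros(1, max_len + 1, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Extra attention layer used only to score opcode relevance for explanation.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)                # benign vs. malicious

    def forward(self, opcodes):
        # opcodes: (batch, seq_len) integer opcode IDs
        b, t = opcodes.shape
        x = self.embed(opcodes)
        x = torch.cat([self.cls.expand(b, -1, -1), x], dim=1)    # prepend summary token
        x = x + self.pos[:, : t + 1]
        x = self.encoder(x)
        # Attention of the summary token over opcode positions, averaged over
        # heads, acts as a per-instruction relevance score.
        _, weights = self.attn(x[:, :1], x[:, 1:], x[:, 1:],
                               need_weights=True, average_attn_weights=True)
        return self.head(x[:, 0]), weights.squeeze(1)            # logits, relevance

model = OpcodeTransformerClassifier()
sample = torch.randint(0, 256, (1, 128))   # one synthetic opcode sequence
logits, relevance = model(sample)
top5 = relevance[0].topk(5).indices        # positions attended to most strongly

In an analysis setting, the top-scoring positions would be mapped back to the disassembled instructions and surfaced to a human analyst, which is the explainability workflow the abstract claims.
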
Related Papers (8 records)
  • [1] Efficient Vision Transformers with Partial Attention
    Vo, Xuan-Thuy
    Nguyen, Duy-Linh
    Priadana, Adri
    Jo, Kang-Hyun
    COMPUTER VISION - ECCV 2024, PT LXXXIII, 2025, 15141: 298-317
  • [2] AttnZero: Efficient Attention Discovery for Vision Transformers
    Li, Lujun
    Wei, Zimian
    Dong, Peijie
    Luo, Wenhan
    Xue, Wei
    Liu, Qifeng
    Guo, Yike
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063: 20-37
  • [3] Multimodal Vision Transformers with Forced Attention for Behavior Analysis
    Agrawal, Tanay
    Balazia, Michal
    Muller, Philipp
    Bremond, Francois
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 3381-3391
  • [4] Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
    Wei, Cong
    Duke, Brendan
    Jiang, Ruowei
    Aarabi, Parham
    Taylor, Graham W.
    Shkurti, Florian
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 22680-22689
  • [5] Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
    Lee, Sanghyeok
    Choi, Joonmyung
    Kim, Hyunwoo J.
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 15741-15750
  • [6] Efficient data-driven behavior identification based on vision transformers for human activity understanding
    Yang, Jiachen
    Zhang, Zhuo
    Xiao, Shuai
    Ma, Shukun
    Li, Yang
    Lu, Wen
    Gao, Xinbo
    NEUROCOMPUTING, 2023, 530: 104-115
  • [7] RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference
    Chen, Dake
    Zhang, Yuke
    Kundu, Souvik
    Li, Chenghao
    Beerel, Peter A.
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023
  • [8] COVID-Attention: Efficient COVID19 Detection Using Pre-trained Deep Models Based on Vision Transformers and X-ray Images
    Haouli, Imed-Eddine
    Hariri, Walid
    Seridi-Bouchelaghem, Hassina
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2023, 32 (08)