Asteria-Pro: Enhancing Deep Learning-based Binary Code Similarity Detection by Incorporating Domain Knowledge

Cited by: 0
Authors
Yang, Shouguo [1 ,2 ]
Dong, Chaopeng [1 ,2 ]
Xiao, Yang [1 ,2 ]
Cheng, Yiran [1 ,2 ]
Shi, Zhiqiang [1 ,2 ]
Li, Zhi [1 ,2 ]
Sun, Limin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, 19 Shucun Rd, Beijing 100085, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, 1 Yanqihu East Rd, Beijing 101408, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Binary code similarity detection; pre-filtering; re-ranking; abstract syntactic tree; graph neural network; SEARCH;
DOI
10.1145/3604611
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Widespread code reuse allows vulnerabilities to proliferate among a vast variety of firmware. There is an urgent need to detect this vulnerable code effectively and efficiently. By measuring code similarities, AI-based binary code similarity detection is applied to detecting vulnerable code at scale. Existing studies have proposed various function features to capture the commonality for similarity detection. Nevertheless, the significant code syntactic variability induced by the diversity of IoT hardware architectures diminishes the accuracy of binary code similarity detection. In our earlier study and the tool Asteria, we adopted a Tree-LSTM network to summarize function semantics as function commonality, and the evaluation result indicates advanced performance. However, it still has utility concerns due to excessive time costs and inadequate precision while searching for large-scale firmware bugs. To this end, we propose a novel deep learning-enhancement architecture by incorporating domain knowledge-based pre-filtration and re-ranking modules, and we develop a prototype named ASTERIA-PRO based on Asteria. The pre-filtration module eliminates dissimilar functions, thus reducing the subsequent deep learning-model calculations. The re-ranking module boosts the rankings of vulnerable functions among candidates generated by the deep learning model. Our evaluation indicates that the pre-filtration module cuts the calculation time by 96.9%, and the re-ranking module improves MRR and Recall by 23.71% and 36.4%, respectively. By incorporating these modules, ASTERIA-PRO outperforms existing state-of-the-art approaches in the bug search task by a significant margin. Furthermore, our evaluation shows that embedding baseline methods with pre-filtration and re-ranking modules significantly improves their precision. We conduct a large-scale real-world firmware bug search, and ASTERIA-PRO manages to detect 1,482 vulnerable functions with a high precision of 91.65%.
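The abstract reports gains in MRR and Recall without defining them. As a rough illustration only, the sketch below computes both metrics from ranked candidate lists using their common textbook definitions. The function names, the dictionary layout, and the choice of Recall@k averaged per query are assumptions made for this example; they are not taken from the paper, which may define Recall differently.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]]
) -> float:
    """Average over queries of 1/rank of the first relevant candidate.

    A query whose ranking contains no relevant candidate contributes 0.
    """
    total = 0.0
    for query, ranked in rankings.items():
        for position, candidate in enumerate(ranked, start=1):
            if candidate in relevant[query]:
                total += 1.0 / position
                break
    return total / len(rankings)


def recall_at_k(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]], k: int
) -> float:
    """Average over queries of the share of relevant candidates in the top k.

    Assumes every query has at least one relevant candidate.
    """
    total = 0.0
    for query, ranked in rankings.items():
        hits = len(set(ranked[:k]) & relevant[query])
        total += hits / len(relevant[query])
    return total / len(rankings)
```

Under these definitions, a re-ranking step that moves a vulnerable function from rank 2 to rank 1 raises that query's reciprocal rank from 0.5 to 1.0 and can turn a Recall@1 miss into a hit.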
Pages: 40
Related papers
50 in total
  • [41] Deep Learning-based Method for Enhancing the Detection of Arabic Authorship Attribution using Acoustic and Textual-based Features
    Al-Sarem, Mohammed
    Saeed, Faisal
    Qasem, Sultan Noman
    Albarrak, Abdullah M.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (07) : 41 - 51
  • [42] Deep learning-based intrusion detection system for in-vehicle networks with knowledge graph and statistical methods
    Alqahtani, Hamed
    Kumar, Gulshan
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024,
  • [43] Enhancing PIR-Based Multi-Person Localization Through Combining Deep Learning With Domain Knowledge
    Yang, Tianye
    Guo, Peng
    Liu, Wenyu
    Liu, Xuefeng
    Hao, Tianyu
    IEEE SENSORS JOURNAL, 2021, 21 (04) : 4874 - 4886
  • [44] Binary class and multi-class plant disease detection using ensemble deep learning-based approach
    Sunil, C. K.
    Jaidhar, C. D.
    Patil, Nagamma
    INTERNATIONAL JOURNAL OF SUSTAINABLE AGRICULTURAL MANAGEMENT AND INFORMATICS, 2022, 8 (04) : 385 - 407
  • [45] Enhancing a fog-oriented IoT authentication and encryption platform through deep learning-based attack detection
    dos Santos, Fabio Coutinho
    Duarte-Figueiredo, Fatima
    De Grande, Robson E.
    dos Santos, Aldri L.
    INTERNET OF THINGS, 2024, 27
  • [47] Using domain knowledge for robust and generalizable deep learning-based CT-free PET attenuation and scatter correction
    Guo, Rui
    Xue, Song
    Hu, Jiaxi
    Sari, Hasan
    Mingels, Clemens
    Zeimpekis, Konstantinos
    Prenosil, George
    Wang, Yue
    Zhang, Yu
    Viscione, Marco
    Sznitman, Raphael
    Rominger, Axel
    Li, Biao
    Shi, Kuangyu
    NATURE COMMUNICATIONS, 2022, 13 (01)
  • [48] Enhancing Road Safety: Deep Learning-Based Intelligent Driver Drowsiness Detection for Advanced Driver-Assistance Systems
    Yang, Eunmok
    Yi, Okyeon
    ELECTRONICS, 2024, 13 (04)
  • [49] Deep learning-based algorithm for the detection of idiopathic full thickness macular holes in spectral domain optical coherence tomography
    Valentim, Carolina C. S.
    Wu, Anna K.
    Yu, Sophia
    Manivannan, Niranchana
    Zhang, Qinqin
    Cao, Jessica
    Song, Weilin
    Wang, Victoria
    Kang, Hannah
    Kalur, Aneesha
    Iyer, Amogh I.
    Conti, Thais
    Singh, Rishi P.
    Talcott, Katherine E.
    INTERNATIONAL JOURNAL OF RETINA AND VITREOUS, 2024, 10 (01)