CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The keys to a deep learning vulnerability detection framework are pre-processing the source code and learning vulnerability features. Traditional source code representation techniques fully normalize user-defined symbols and thereby ignore the semantic information associated with vulnerabilities. The current mainstream model for vulnerability feature learning is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address both problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered with the unsupervised K-means algorithm, and each symbol is then normalized according to the cluster it falls into. This preserves the primary semantic information in the source code while ensuring the smoothness of the sample data. In the feature extraction stage, we feed the source code, after text representation, into Bidirectional Encoder Representations from Transformers (BERT) for automated feature learning, which enhances semantic information extraction and the capture of long-range information. Experimental results show that, on real-world data we collected ourselves, the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, our method improves the precision of the state-of-the-art method by 4.2%.
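The clustering-based normalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how symbols are embedded or how replacement tokens are named, so the hashed character-bigram features, the `SYM<cluster>_<index>` naming scheme, the small `RESERVED` word list, and the function names `identifier_features`, `kmeans`, and `normalize` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked set of C keywords and library calls that are
# NOT treated as user-defined symbols. A real tool would use a full list.
RESERVED = {"int", "char", "void", "return", "if", "else", "for", "while",
            "sizeof", "strcpy", "memcpy", "malloc", "free"}

IDENT = re.compile(r"[A-Za-z_]\w*")


def identifier_features(name, dim=16):
    """Hashed character-bigram counts, L2-normalized.

    Assumption: a stand-in for whatever symbol embedding the paper uses.
    """
    vec = [0.0] * dim
    padded = f"^{name.lower()}$"
    for a, b in zip(padded, padded[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means, seeded deterministically with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize(code, k=2):
    """Replace each user-defined symbol with a token that encodes its cluster."""
    names = sorted({t for t in IDENT.findall(code) if t not in RESERVED})
    if not names:
        return code, {}
    k = min(k, len(names))
    labels = kmeans([identifier_features(n) for n in names], k)
    per_cluster = Counter()
    mapping = {}
    for name, label in zip(names, labels):
        mapping[name] = f"SYM{label}_{per_cluster[label]}"
        per_cluster[label] += 1
    normalized = IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), code)
    return normalized, mapping


sample = ("int copy_buf(char *dst_buf, char *src_buf, int len) "
          "{ strcpy(dst_buf, src_buf); return len; }")
normalized_sample, symbol_map = normalize(sample, k=2)
print(normalized_sample)
print(symbol_map)
```

In contrast to a complete normalization that maps every symbol to the same kind of placeholder, the replacement token here keeps the cluster identity, which is the property the abstract credits with preserving semantic information. The normalized text would then be tokenized and passed to a BERT encoder, a step this sketch leaves out.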
Pages: 1538 - 1549
Page count: 12