CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Cited by: 0
Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The key to a deep learning vulnerability detection framework is pre-processing source code and learning vulnerability features. Traditional source code representation techniques fully normalize user-defined symbols but ignore the semantic information associated with vulnerabilities. The current mainstream vulnerability feature learning model is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address these problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered using the unsupervised K-means algorithm, and the symbols are then normalized by class according to the clustering results, which preserves the primary semantic information in the source code while keeping the sample data smooth. In the feature extraction stage, we feed the text representation of the source code into Bidirectional Encoder Representations from Transformers (BERT) for automatic feature learning, which enhances semantic information extraction and long-range information acquisition. Experimental results show that, on real-world data we collected ourselves, the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, our method improves on the precision of the state-of-the-art method by 4.2%.
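The clustering-based normalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how user-defined symbols are turned into vectors, so the subtoken-count features, the deterministic farthest-point initialization, and the placeholder scheme `CLUSTER<k>_VAR<n>` are all assumptions made for illustration, as are the function names `subtokens`, `embed`, `kmeans`, and `normalize_source`.

```python
import re
from collections import Counter


def subtokens(name):
    """Split a snake_case or camelCase identifier into lowercase parts."""
    return [p.lower() for p in re.findall(r"[A-Za-z][a-z]*|\d+", name)]


def embed(names):
    """Assumed feature scheme: subtoken counts over a shared vocabulary."""
    vocab = sorted({t for n in names for t in subtokens(n)})
    vectors = []
    for n in names:
        counts = Counter(subtokens(n))
        vectors.append([float(counts[t]) for t in vocab])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from all seeds so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for each point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize_source(code, names, labels):
    """Replace each identifier with a placeholder that keeps its cluster id."""
    per_cluster = Counter()
    mapping = {}
    for name, lab in zip(names, labels):
        mapping[name] = f"CLUSTER{lab}_VAR{per_cluster[lab]}"
        per_cluster[lab] += 1
    alternatives = "|".join(
        re.escape(n) for n in sorted(mapping, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


# Hypothetical user-defined symbols and a hypothetical code line.
names = ["src_buf", "dest_buf", "tmp_buf", "data_len", "name_len", "key_len"]
labels = kmeans(embed(names), k=2)
normalized = normalize_source("memcpy(dest_buf, src_buf, data_len);", names, labels)
print(normalized)
```

Compared with mapping every symbol to a single generic token such as `VAR1`, the cluster id in the placeholder retains a coarse hint of the symbol's role (here, buffer-like versus length-like names), which is the kind of semantic information the abstract says full normalization discards.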
Pages: 1538-1549
Page count: 12
Related papers
50 in total
  • [41] Clustering-based data detection for spectral signature multiplexing in multispectral camera communication
    Moreno, Daniel
    Guerra, Victor
    Rufo, Julio
    Rabadan, Jose
    Perez-Jimenez, Rafael
    OPTICS LETTERS, 2022, 47 (05) : 1053 - 1056
  • [42] A comparative evaluation of clustering-based outlier detection
    Vinces, Braulio V. Sanchez
    Schubert, Erich
    Zimek, Arthur
    Cordeiro, Robson L. F.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 39 (02)
  • [43] Fuzzy Clustering-Based Approach for Outlier Detection
    Al-Zoubi, Moh'd Belal
    Ali, Al-Dahoud
    Yahya, Abdelfatah A.
    RECENT ADVANCES AND APPLICATIONS OF COMPUTER ENGINEERING: PROCEEDINGS OF THE 9TH WSEAS INTERNATIONAL CONFERENCE (ACE 10), 2010, : 192 - +
  • [44] Clustering Microarray Data to Determine Normalization Method
    Vendettuoli, Marie
    Doyle, Erin
    Hofmann, Heike
    SOFTWARE TOOLS AND ALGORITHMS FOR BIOLOGICAL SYSTEMS, 2011, 696 : 145 - 153
  • [45] Spam Detection Using Clustering-Based SVM
    Pandya, Darshit
    PROCEEDINGS OF THE 2019 2ND INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND MACHINE INTELLIGENCE (MLMI 2019), 2019, : 12 - 15
  • [46] Adaptive Clustering-Based Collusion Detection in Crowdsourcing
    Xu, Ruoyu
    Li, Gaoxiang
    Jin, Wei
    Chen, Austin
    Sheng, Victor S.
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 261 - 275
  • [47] Clustering-based Anomaly Detection for Smartphone Applications
    El Attar, Ali
    Khatoun, Rida
    Lemercier, Marc
    2014 IEEE NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (NOMS), 2014,
  • [48] Clustering-Based Network Intrusion Detection System
    Fan, Chun-I
    Lai, Yen-Lin
    Shie, Cheng-Han
    2022 5TH IEEE CONFERENCE ON DEPENDABLE AND SECURE COMPUTING (IEEE DSC 2022), 2022,
  • [49] Clustering-Based Discriminant Analysis for Eye Detection
    Chen, Shuo
    Liu, Chengjun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (04) : 1629 - 1638
  • [50] Vibration signal demodulation and bearing fault detection: A clustering-based segmentation method
    Hou, Shumin
    Liang, Ming
    Zhang, Yi
    Li, Chuan
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART C-JOURNAL OF MECHANICAL ENGINEERING SCIENCE, 2014, 228 (11) : 1888 - 1899