CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Cited by: 0
Authors
Wu, Tongshuai [1 ,2 ]
Chen, Liwei [1 ,2 ]
Du, Gewangzi [1 ,2 ]
Zhu, Chenguang [1 ,2 ]
Cui, Ningning [1 ,2 ]
Shi, Gang [1 ,2 ]
Affiliations
[1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing, People's Republic of China
[2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing, People's Republic of China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The keys to a deep learning vulnerability detection framework are pre-processing source code and learning vulnerability features. Traditional source code representation techniques fully normalize user-defined symbols but discard the semantic information associated with vulnerabilities. The current mainstream model for learning vulnerability features is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address these problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered using the unsupervised clustering algorithm K-means, and each symbol is normalized according to its cluster, which preserves the primary semantic information in the source code and ensures the smoothness of the sample data. In the feature extraction stage, we feed the text representation of the source code into Bidirectional Encoder Representations from Transformers (BERT) for automatic feature learning, which enhances semantic information extraction and long-range information acquisition. Experimental results show that, on real-world data collected by ourselves, the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, our method improves the precision of the state-of-the-art method by 4.2%.
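The clustering-based normalization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how symbols are embedded or how many clusters are used, so the hashed character-bigram embedding (`embed`), the placeholder format `SYM_C<n>`, the keyword and library-call lists, and the default `k` are all assumptions made for illustration. K-means is written out in plain Python (Lloyd's iterations with a deterministic farthest-point initialization) so the example has no dependencies.

```python
import re

# Assumed (incomplete) lists of tokens that are NOT user-defined symbols.
C_KEYWORDS = {"int", "char", "if", "else", "return", "for", "while", "void", "sizeof"}
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "strlen"}


def embed(name, dim=16):
    """Hypothetical embedding: L2-normalized hashed bag of character bigrams."""
    vec = [0.0] * dim
    padded = f"^{name.lower()}$"
    for a, b in zip(padded, padded[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain K-means (Lloyd's algorithm) with farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize(code, k=3):
    """Replace each user-defined symbol with a placeholder for its cluster."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    names = sorted({t for t in tokens
                    if re.match(r"[A-Za-z_]", t)
                    and t not in C_KEYWORDS and t not in LIBRARY_CALLS})
    if not names:
        return tokens, {}
    labels = kmeans([embed(n) for n in names], min(k, len(names)))
    mapping = {n: f"SYM_C{lab}" for n, lab in zip(names, labels)}
    return [mapping.get(t, t) for t in tokens], mapping


sample = "void copy(char *user_buf, int user_len) { char dst_buf[64]; strcpy(dst_buf, user_buf); }"
tokens, mapping = normalize(sample, k=2)
print(mapping)
print(" ".join(tokens))
```

Unlike a complete normalization that maps every identifier to an index such as `VAR1`, `VAR2`, the placeholder here encodes which cluster a symbol fell into, so symbols that the embedding considers similar share a token. The normalized token sequence would then be passed to a feature learner such as BERT, which this sketch does not cover.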
Pages: 1538-1549
Page count: 12
Related papers
50 in total
  • [1] Clustering-Based Outlier Detection Method
    Jiang, Sheng-yi
    An, Qing-bo
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 429 - 433
  • [2] Clustering-Based Subgroup Detection for Automated Fairness Analysis
    Schaefer, Jero
    Wiese, Lena
    NEW TRENDS IN DATABASE AND INFORMATION SYSTEMS, ADBIS 2022, 2022, 1652 : 45 - 55
  • [3] Data clustering-based fault detection in WSNs
    Yang, Yang
    Liu, Qian
    Gao, Zhipeng
    Qiu, Xuesong
    Rui, Lanlan
    2015 SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2015, : 334 - 339
  • [4] Implementation of a Clustering-Based LDDoS Detection Method
    Hussain, Tariq
    Saeed, Muhammad Irfan
    Khan, Irfan Ullah
    Aslam, Nida
    Aljameel, Sumayh S.
    ELECTRONICS, 2022, 11 (18)
  • [5] Clustering-based method for data envelopment analysis
    Najadat, H
    Nygard, K
    Schesvold, D
    MSV '05: Proceedings of the 2005 International Conference on Modeling, Simulation and Visualization Methods, 2005, : 255 - 261
  • [6] Clustering-Based Score Normalization for Speaker Verification
    Gu, Bin
    Guo, Wu
    Liu, Yao
    Sun, Jian
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 553 - 557
  • [7] An automated process for supporting decisions in clustering-based data analysis
    Bernabe-Diaz, Jose Antonio
    Franco, Manuel
    Vivo, Juana-Maria
    Quesada-Martinez, Manuel
    Fernandez-Breis, Jesualdo T.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2022, 219
  • [8] An improved unsupervised clustering-based intrusion detection method
    Hai, YJ
    Wu, Y
    Wang, GY
    Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2005, 2005, 5812 : 52 - 60
  • [9] A Clustering-Based Method for Intrusion Detection in Web Servers
    Pereira, Hermano
    Jamhour, Edgard
    2013 20TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS (ICT), 2013,
  • [10] A Hybrid Unsupervised Clustering-Based Anomaly Detection Method
    Guo Pu
    Lijuan Wang
    Jun Shen
    Fang Dong
    Tsinghua Science and Technology, 2021, 26 (02) : 146 - 153