Vulnerability Detection Methods Based on Natural Language Processing

Authors
Yang Y. [1, 2]
Li Y. [1, 2]
Chen K. [1, 2, 3]
Affiliations
[1] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing
[2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing
[3] Beijing Academy of Artificial Intelligence, Beijing
Funding
National Natural Science Foundation of China
Keywords
Natural language processing; Security; Static detection; Survey; Vulnerability detection;
DOI
10.7544/issn1000-1239.20210627
Abstract
As the number of officially reported vulnerabilities grows exponentially, research on vulnerability detection techniques is increasing. The diversity of vulnerability types, combined with the narrow scope of individual detection methods, limits what vulnerability detection has achieved. Research on vulnerability detection methods falls mainly into two streams: static detection and dynamic detection. Static detection includes document analysis, cross validation, and program analysis. With the rise of natural language processing (NLP) and the rapid growth of available knowledge, researchers have explored using NLP techniques to detect vulnerabilities across multiple data sources. In this paper, the literature is classified into four categories according to the source of information: official documents, code, code comments, and vulnerability-related information. First, we survey the research of the past 10 years on NLP-based vulnerability detection methods, extract the technical details, and classify the results. We then summarize the relative merits of these results by comparing and analyzing work that draws on different data sources. Finally, through cross comparison and in-depth analysis, we identify eight types of limitations of NLP-based vulnerability detection methods, discuss solutions at the levels of data, technique, and effect, and propose future research trends. © 2022, Science Press. All rights reserved.
Pages: 2649-2666
Page count: 17