Vulnerability Detection Methods Based on Natural Language Processing

Authors
Yang Y. [1, 2]
Li Y. [1, 2]
Chen K. [1, 2, 3]
Affiliations
[1] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing
[2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing
[3] Beijing Academy of Artificial Intelligence, Beijing
Funding
National Natural Science Foundation of China
Keywords
Natural language processing; Security; Static detection; Survey; Vulnerability detection
DOI
10.7544/issn1000-1239.20210627
Abstract
As the number of officially reported vulnerabilities grows exponentially, research on vulnerability detection techniques is increasing. The diversity of vulnerability types and the narrow, single-purpose nature of detection methods limit what vulnerability detection has achieved. Research on vulnerability detection methods falls into two main streams: static detection and dynamic detection. Static detection includes document analysis, cross-validation, and program analysis, among others. With the rise of natural language processing and the rapid growth of available knowledge, researchers have explored vulnerability detection over multiple data sources with the help of natural language processing techniques. In this paper, the literature is classified into four categories according to the source of information: official documentation, code, code comments, and vulnerability-related information. First, we survey research from the past 10 years on vulnerability detection methods based on natural language processing, extract the technical details, and classify the research results; we then summarize their relative merits by comparing and analyzing research that draws on different data sources. Finally, through cross-comparison and in-depth exploration, we identify eight types of limitations of vulnerability detection methods based on natural language processing, discuss solutions at the levels of data, technique, and effectiveness, and propose future research trends. © 2022, Science Press. All rights reserved.
Pages: 2649-2666
Page count: 17