A Comparative Study of Deep Learning based Named Entity Recognition Algorithms for Cybersecurity

Cited by: 16
Authors
Dasgupta, Soham [2]
Piplai, Aritran [1]
Kotal, Anantaa [1]
Joshi, Anupam [1]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Comp Sci & Elect Engn, Baltimore, MD 21228 USA
[2] Mallya Aditi Int Sch, Bengaluru, Karnataka, India
Keywords
Named Entity Recognition; Deep Learning; Cybersecurity; Artificial Intelligence
DOI
10.1109/BigData50022.2020.9378482
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Named Entity Recognition (NER) is important in the cybersecurity domain because it helps researchers extract cyber threat information from unstructured text sources. The extracted cyber-entities or key expressions can be used to model a cyber-attack described in an open-source text. Many general-purpose NER algorithms have been published that work well in text analysis, but they do not perform well when applied to the cybersecurity domain. In cybersecurity, the available open-source text varies greatly in the complexity and underlying structure of its sentences, and general-purpose NER algorithms can misrepresent domain-specific words such as "malicious" and "javascript". In this paper, we compare recent deep learning-based NER algorithms on a cybersecurity dataset that we created from various sources, including "Microsoft Security Bulletin" and "Adobe Security Updates". Some of the approaches we compare were proposed in the literature but had not been used for cybersecurity; others are innovations proposed by us. This comparative study helps us identify the NER algorithms that are robust and work well on sentences taken from a large number of cybersecurity sources. We tabulate their performance on the test set and identify the best NER algorithm for a cybersecurity corpus. We also discuss the different embedding strategies that aid NER for the chosen deep learning algorithms.
Pages: 2596 - 2604
Number of pages: 9
Related papers
50 in total
  • [31] Cybersecurity Named Entity Recognition Using Multi-Modal Ensemble Learning
    Yi, Feng
    Jiang, Bo
    Wang, Lu
    Wu, Jianjun
    IEEE ACCESS, 2020, 8 : 63214 - 63224
  • [32] Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition
    Wang, Meiqi
    Vijayaraghavan, Avish
    Beck, Tim
    Posma, Joram M.
    JOURNAL OF PROTEOME RESEARCH, 2024, 23 (06) : 1915 - 1925
  • [33] Deep learning-based methods for natural hazard named entity recognition
    Junlin Sun
    Yanrong Liu
    Jing Cui
    Handong He
    Scientific Reports, 12
  • [34] Language model based on deep learning network for biomedical named entity recognition
    Hou, Guan
    Jian, Yuhao
    Zhao, Qingqing
    Quan, Xiongwen
    Zhang, Han
    METHODS, 2024, 226 : 71 - 77
  • [35] A Method of Network Attack Named Entity Recognition based on Deep Active Learning
    Wang, Li
    Ma, Yunxiao
    Li, Mingyue
    Li, Hua
    Zhang, Peilong
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2024, : 376 - 387
  • [36] Deep learning-based methods for natural hazard named entity recognition
    Sun, Junlin
    Liu, Yanrong
    Cui, Jing
    He, Handong
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [37] Named Entity Recognition for Amharic Using Stack-Based Deep Learning
    Sikdar, Utpal Kumar
    Gambäck, Björn
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 276 - 287
  • [38] A Comparative Study of Named Entity Recognition for Arabic Using Ensemble Learning Approaches
    El bazi, Ismail
    Laachfoubi, Nabil
    2015 IEEE/ACS 12TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2015,
  • [39] Textual adversarial attacks in cybersecurity named entity recognition
    Jiang, Tian
    Liu, Yunqi
    Cui, Xiaohui
    COMPUTERS & SECURITY, 2025, 150
  • [40] A deep learning method for named entity recognition in bidding document
    Ji, Yunfei
    Tong, Chao
    Liang, Jun
    Yang, Xi
    Zhao, Zheng
    Wang, Xu
    2018 INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SCIENCE AND APPLICATION TECHNOLOGY, 2019, 1168