Enhancing Cyber Threat Intelligence with Named Entity Recognition using BERT-CRF

Cited by: 3
Authors
Chen, Sheng-Shan [1]
Hwang, Ren-Hung [2]
Sun, Chin-Yu [1]
Lin, Ying-Dar [3]
Pai, Tun-Wen [1]
Affiliations
[1] Natl Taipei Univ Technol, Dept Comp Sci & Informat Engn, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Coll Artificial Intelligence, Tainan, Taiwan
[3] Natl Yang Ming Chiao Tung Univ, Dept Comp, Hsinchu, Taiwan
Keywords
cyber threat intelligence; deep learning; cyber security;
DOI
10.1109/GLOBECOM54140.2023.10436853
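Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract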
Cyber Threat Intelligence (CTI) helps organizations understand the tactics, techniques, and procedures used by potential cyber criminals so that they can defend against cyber threats. To protect the core systems and services of organizations, security analysts must analyze information about threats and vulnerabilities. However, analyzing large amounts of data requires significant time and effort. To streamline this process, we propose an enhanced architecture, BERT-CRF, obtained by removing the BiLSTM layer from the conventional BERT-BiLSTM-CRF model. This model leverages the strengths of deep learning-based language models to effectively extract critical threat intelligence and novel information from threats. In our BERT-CRF model, the token embeddings generated by BERT are fed directly into the Conditional Random Field (CRF) layer for efficient Named Entity Recognition (NER), eliminating the need for an intermediate BiLSTM layer. We train and evaluate the model on three publicly available threat entity databases. We also collect open-source threat intelligence data from recent years to evaluate the applicability of the constructed model in a real-world environment. Furthermore, we compare our model with the most popular GPT-3.5 and the most downloaded open-source BERT question-and-answer models. In this study, our proposed model demonstrated robust usability and outperformed the other models, indicating its potential for application in CTI. In a real-world scenario, our model achieved an accuracy of 82.64%, while on malware-specific threat intelligence data it achieved an accuracy of 93.95%. The code for this research is publicly available at https://github.com/stwater20/ner_bert_crf_open_version.
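
The role of the CRF layer described in the abstract can be illustrated with a minimal sketch of Viterbi decoding over per-token tag scores. This is not the authors' implementation. The tag set (O, B-MAL, I-MAL), the example tokens, and all emission and transition numbers are hypothetical values chosen for illustration. In a BERT-CRF model the emission scores would come from projecting BERT token embeddings onto the tag set, and the transition scores would be learned during training.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   : score of tag j at token t (stand-in for BERT output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # score[j] = best score of any tag path ending in tag j at the current token
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


# Hypothetical tag set and scores, for illustration only.
TAGS = ["O", "B-MAL", "I-MAL"]
TOKENS = ["The", "Emotet", "trojan", "spread"]
EMISSIONS = [
    [2.0, 0.1, 0.1],  # The
    [0.5, 2.0, 1.0],  # Emotet
    [1.6, 0.2, 1.5],  # trojan (per-token scores slightly prefer O)
    [2.0, 0.1, 0.3],  # spread
]
TRANSITIONS = [
    [0.5, 0.0, -10.0],  # from O: moving straight to I-MAL is heavily penalized
    [0.0, -1.0, 1.0],   # from B-MAL: continuing with I-MAL is rewarded
    [0.0, -1.0, 0.5],   # from I-MAL
]

# Per-token argmax ignores tag-to-tag structure; the CRF decode does not.
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in EMISSIONS]
decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)

for token, g, d in zip(TOKENS, greedy, decoded):
    print(f"{token:8s} greedy={g:6s} crf={d}")
```

With these made-up scores, per-token argmax labels "trojan" as O, while the CRF decode labels it I-MAL because the B-MAL to I-MAL transition outweighs the small emission gap. This is the kind of sequence-level consistency a CRF layer adds on top of token-level scores.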
Pages: 7532 - 7537
Page count: 6