Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF

Cited by: 3
Authors
Zhen, Zhen [1]
Gao, Jian [1, 2]
Affiliations
[1] Peoples Publ Secur Univ China, Sch Informat Network Secur, Beijing 100038, Peoples R China
[2] Minist Publ Secur, Key Lab Safety Precaut & Risk Assessment, Beijing, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 77, No. 1
Keywords
Cybersecurity; cyber threat intelligence; named entity recognition
DOI
10.32604/cmc.2023.042090
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, cyber attacks have intensified, causing great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) facilitates intelligence integration and helps combat cyber attacks. Named Entity Recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and in the limited studies on Chinese text, existing models have performed poorly. To fully exploit Chinese pre-trained language models (PLMs) and address the problem of lengthy, infrequent English words mixed into Chinese CTI, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF), built on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking (RoBERTa-wwm), abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open source dataset and achieve an F1-score of 82.35%, which exceeds the common baseline in this field, bidirectional encoder representation from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52% and exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model. The RoBERTa-wwm-RDCNN-CRF model and the shared pre-processing and augmentation methods can serve subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction, contributing to downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
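The abstract names a residual dilated convolutional encoder but does not give its configuration. As a rough illustration of the general technique only, the following pure-Python sketch applies a stack of 1D dilated convolutions with residual connections to a toy feature sequence and reports the resulting receptive field. The kernel weights, the dilation rates (1, 2, 4), the ReLU activation, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution (cross-correlation) with zero padding."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel size must be odd")
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        total = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # taps are spaced `dilation` positions apart
            if 0 <= j < len(x):            # positions outside the sequence count as zero
                total += w * x[j]
        out.append(total)
    return out


def residual_dilated_block(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Add ReLU(dilated conv(x)) back onto x (the residual connection)."""
    conv = dilated_conv1d(x, kernel, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input positions that can influence one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Hypothetical toy input: one scalar feature per token position.
    features = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    kernel = [1.0, 1.0, 1.0]   # illustrative weights, not learned
    dilations = [1, 2, 4]      # assumed rates; the paper's settings are not stated here
    hidden = features
    for d in dilations:
        hidden = residual_dilated_block(hidden, kernel, d)
    print(hidden)
    print(receptive_field(len(kernel), dilations))
```

With a kernel of size 3, stacking dilations 1, 2, and 4 lets each output position draw on 15 input positions, whereas three undilated layers would cover only 7. That wider context per layer is the usual motivation for dilated convolutions in sequence labelling.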
Pages: 299-323
Number of pages: 25