Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF

Cited by: 3

Authors:

Zhen, Zhen [1]

Gao, Jian [1,2]

Affiliations:

[1] Peoples Publ Secur Univ China, Sch Informat Network Secur, Beijing 100038, Peoples R China

[2] Minist Publ Secur, Key Lab Safety Precaut & Risk Assessment, Beijing, Peoples R China

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 77 / No. 01

Keywords:

Cybersecurity; cyber threat intelligence; named entity recognition

DOI:

10.32604/cmc.2023.042090

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In recent years, cyber attacks have intensified, causing great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) facilitates intelligence integration and helps in combating cyber attacks. Named Entity Recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and in the limited studies conducted on Chinese text, existing models have performed poorly. To fully utilize the power of Chinese pre-trained language models (PLMs) and address the problem of long, infrequent English words mixed into Chinese CTI text, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF) based on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking (RoBERTa-wwm), abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open source dataset and achieve an F1-score of 82.35%, which exceeds the common baseline model in this field, bidirectional encoder representation from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52% and exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder part of the model and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model.
The RoBERTa-wwm-RDCNN-CRF model, together with the shared pre-processing and augmentation methods, can serve subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction, contributing to downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
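
The abstract names a residual dilated convolutional encoder but gives no layer sizes, kernel widths, or dilation rates. As a rough illustration only, and not the authors' implementation, the pure-Python sketch below shows what one residual dilated 1D convolution over a token sequence could look like, and how stacking dilations widens the span of tokens each output position sees. The function names, the kernel weights, the ReLU activation, and the dilation rates (1, 2, 4) are all assumptions made for this example.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution with zero padding (odd kernel size assumed)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` positions apart around position t.
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Hypothetical block: dilated convolution, ReLU, then a skip connection adding the input."""
    conv = dilated_conv1d(x, kernel, dilation)
    activated = [max(0.0, v) for v in conv]
    return [xi + ai for xi, ai in zip(x, activated)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input positions seen by one output after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    features = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy per-token feature (assumed values)
    kernel = [1.0, 1.0, 1.0]               # toy weights (assumed values)
    print(residual_dilated_block(features, kernel, dilation=2))
    print(receptive_field(3, [1, 2, 4]))
```

With a kernel of width 3, three stacked layers with dilations 1, 2, and 4 cover 15 input positions, whereas three undilated layers would cover only 7. In a real model each position would carry a vector of PLM features rather than a single number.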

Pages: 299-323

Page count: 25
