Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF

Cited by: 3
Authors
Zhen, Zhen [1]
Gao, Jian [1,2]
Affiliations
[1] Peoples Publ Secur Univ China, Sch Informat Network Secur, Beijing 100038, Peoples R China
[2] Minist Publ Secur, Key Lab Safety Precaut & Risk Assessment, Beijing, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 77 / No. 01
Keywords
Cybersecurity; cyber threat intelligence; named entity recognition
DOI
10.32604/cmc.2023.042090
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, cyber attacks have intensified, causing great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) facilitates intelligence integration and helps combat cyber attacks. Named entity recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and in the limited studies on Chinese text, existing models have performed poorly. To fully exploit Chinese pre-trained language models (PLMs) and to address the problem of long, infrequent English words mixed into Chinese CTI, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF), built on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking (RoBERTa-wwm), abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open-source dataset and achieve an F1-score of 82.35%, which exceeds the common baseline in this field, bidirectional encoder representation from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52% and exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder part of the model and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model.
The RoBERTa-wwm-RDCNN-CRF model, together with the shared pre-processing and augmentation methods, can serve subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction, contributing to downstream applications such as intrusion detection and advanced persistent threat (APT) attack detection.
Pages: 299 - 323
Page count: 25
Related papers
50 in total
  • [31] Chinese Named Entity Recognition via Joint Identification and Categorization
    Zhou Junsheng
    Qu Weiguang
    Zhang Fen
    CHINESE JOURNAL OF ELECTRONICS, 2013, 22 (02): 225 - 230
  • [32] Threat intelligence named entity recognition techniques based on few-shot learning
    Wang, Haiyan
    Yang, Weimin
    Feng, Wenying
    Zeng, Liyi
    Gu, Zhaoquan
    ARRAY, 2024, 23
  • [33] An Attention-Based BiLSTM-CRF Model for Chinese Clinic Named Entity Recognition
    Wu, Guohua
    Tang, Guangen
    Wang, Zhongru
    Zhang, Zhen
    Wang, Zhen
    IEEE ACCESS, 2019, 7: 113942 - 113949
  • [34] An RG-FLAT-CRF Model for Named Entity Recognition of Chinese Electronic Clinical Records
    Li, Jiakang
    Liu, Ruixia
    Chen, Changfang
    Zhou, Shuwang
    Shang, Xiaoyi
    Wang, Yinglong
    ELECTRONICS, 2022, 11 (08)
  • [35] Chinese clinical named entity recognition via multi-head self-attention based BiLSTM-CRF
    An, Ying
    Xia, Xianyun
    Chen, Xianlai
    Wu, Fang-Xiang
    Wang, Jianxin
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2022, 127
  • [36] Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence
    Liu, Peipei
    Li, Hong
    Wang, Zuoguang
    Liu, Jie
    Ren, Yimo
    Zhu, Hongsong
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 1557 - 1563
  • [37] A Chinese Named Entity Recognition Method Based on ERNIE-BiLSTM-CRF for Food Safety Domain
    Yuan, Taiping
    Qin, Xizhong
    Wei, Chunji
    APPLIED SCIENCES-BASEL, 2023, 13 (05)
  • [38] A Multi-Task BERT-BiLSTM-AM-CRF Strategy for Chinese Named Entity Recognition
    Xiaoyong Tang
    Yong Huang
    Meng Xia
    Chengfeng Long
    Neural Processing Letters, 2023, 55 : 1209 - 1229
  • [39] Product named entity recognition for Chinese query questions based on a skip-chain CRF model
    Hao, Zhifeng
    Wang, Hongfei
    Cai, Ruichu
    Wen, Wen
    NEURAL COMPUTING & APPLICATIONS, 2013, 23 (02): 371 - 379
  • [40] Effects of Hyper-parameters Setting in Bi-LSTM-CRF on Chinese Named Entity Recognition
    Zhang, Taozheng
    Ma, Pingping
    FOURTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING, ICGIP 2022, 2022, 12705