Chinese Cyber Threat Intelligence Named Entity Recognition via RoBERTa-wwm-RDCNN-CRF

Times cited: 3
Authors
Zhen, Zhen [1]
Gao, Jian [1, 2]
Affiliations
[1] Peoples Publ Secur Univ China, Sch Informat Network Secur, Beijing 100038, Peoples R China
[2] Minist Publ Secur, Key Lab Safety Precaut & Risk Assessment, Beijing, Peoples R China
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2023, Vol. 77, No. 01
Keywords
Cybersecurity; cyber threat intelligence; named entity recognition
DOI
10.32604/cmc.2023.042090
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, cyber attacks have intensified, causing great harm to individuals, companies, and countries. Mining cyber threat intelligence (CTI) can facilitate intelligence integration and help combat cyber attacks. Named Entity Recognition (NER), a crucial component of text mining, can structure complex CTI text and help cybersecurity professionals counter threats effectively. However, current CTI NER research has focused mainly on English CTI, and in the limited studies conducted on Chinese text, existing models have performed poorly. To fully utilize the power of Chinese pre-trained language models (PLMs) and address the problem of long, infrequent English words mixed into Chinese CTI text, we propose a residual dilated convolutional neural network (RDCNN) with a conditional random field (CRF) based on a robustly optimized bidirectional encoder representation from transformers pre-training approach with whole word masking (RoBERTa-wwm), abbreviated as RoBERTa-wwm-RDCNN-CRF. We are the first to experiment on the relevant open source dataset and achieve an F1-score of 82.35%, which exceeds the common baseline model in this field, bidirectional encoder representation from transformers (BERT)-bidirectional long short-term memory (BiLSTM)-CRF, by about 19.52% and exceeds the current state-of-the-art model, BERT-RDCNN-CRF, by about 3.53%. In addition, we conducted an ablation study on the encoder part of the model and an in-depth investigation of the PLMs and the encoder to verify the effectiveness of the proposed model.
The RoBERTa-wwm-RDCNN-CRF model, the shared pre-processing, and augmentation methods can serve the subsequent fundamental tasks such as cybersecurity information extraction and knowledge graph construction, contributing to important applications in downstream tasks such as intrusion detection and advanced persistent threat (APT) attack detection.
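The abstract names a residual dilated convolutional encoder but gives no details of it. As a rough illustration of the general technique only, and not the authors' implementation, the sketch below stacks residual dilated 1D convolutions over a sequence of scalar features in plain Python. The function names, the kernel size of 3, the dilation rates (1, 2, 4), and the use of scalar rather than vector features are all assumptions made for this example; the RoBERTa-wwm embeddings and the CRF layer are omitted.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 1D dilated convolution that preserves sequence length."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def residual_dilated_block(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Dilated convolution followed by ReLU, added back onto the input (skip connection)."""
    y = relu(dilated_conv1d(x, kernel, dilation))
    return [a + b for a, b in zip(x, y)]


def rdcnn(x: Sequence[float], kernel: Sequence[float], dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Stack residual dilated blocks; the dilation rates here are hypothetical."""
    h = list(x)
    for d in dilations:
        h = residual_dilated_block(h, kernel, d)
    return h


if __name__ == "__main__":
    # An impulse in the middle of a 17-token sequence shows how far context spreads.
    tokens = [0.0] * 17
    tokens[8] = 1.0
    print(rdcnn(tokens, kernel=[1.0, 1.0, 1.0]))
```

With a kernel of size 3 and dilations 1, 2, and 4, each output position can see 1 + 2 × (1 + 2 + 4) = 15 input positions, which is why dilation is a common way to capture long spans, such as long English tokens inside Chinese text, with few layers.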
Pages: 299-323
Number of pages: 25