A knowledge extraction framework for domain-specific application with simplified pre-trained language model and attention-based feature extractor

Cited: 2
Authors
Zhang, Jian [1 ]
Qin, Bo [1 ]
Zhang, Yufei [1 ]
Zhou, Junhua [2 ]
Wang, Hongwei [1 ]
Affiliations
[1] Zhejiang Univ, ZJU UIUC Inst, Haining 314400, Zhejiang, Peoples R China
[2] Beijing Inst Elect Syst Engn, Beijing Simulat Ctr, Beijing 100000, Peoples R China
Keywords
Knowledge extraction; Named entity recognition; Pre-trained language model; Attention mechanism
DOI
10.1007/s11761-022-00337-5
Chinese Library Classification (CLC)
TP39 [Applications of Computers]
Subject classification codes
081203; 0835
Abstract
With the advancement of industrial informatics, intelligent algorithms are increasingly applied in industrial products and applications. In this paper, we propose a knowledge extraction framework for domain-specific text. The framework extracts entities from text for subsequent tasks such as knowledge graph construction. It contains three modules: a domain-feature pre-trained model, LSTM-based flat named entity recognition, and attention-based nested named entity recognition. The domain-feature pre-trained model effectively learns features of the domain corpus, such as professional terms that do not appear in general-domain corpora. The flat named entity recognition module uses token vectors from the pre-trained model to extract entities from domain-specific text. The nested named entity recognition module, built on the attention mechanism and a weight sliding balance strategy, effectively identifies entity types with high nesting rates. The framework achieves good results on nuclear power plant maintenance reports, and the domain pre-trained model and LSTM-based flat named entity recognition methods have been successfully applied in practical tasks.
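To make the pipeline concrete, below is a minimal sketch of the LSTM-based flat named entity recognition step described in the abstract: contextual token vectors from the domain pre-trained model are fed to a bidirectional LSTM, and each token is classified into a BIO entity tag. PyTorch is assumed; the class name DomainFlatNER, the dimensions, and the tag set are illustrative assumptions, not the authors' implementation, whose details are not given in this record.

import torch
import torch.nn as nn

class DomainFlatNER(nn.Module):
    """Hypothetical flat NER head over domain pre-trained token vectors."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        # Bidirectional LSTM over the pre-trained token embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Token-level classifier over BIO tags (e.g., B-EQUIP, I-EQUIP, O).
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_vectors: torch.Tensor) -> torch.Tensor:
        # token_vectors: (batch, seq_len, embed_dim) produced by the
        # domain pre-trained model; returns per-token tag logits.
        hidden, _ = self.lstm(token_vectors)
        return self.classifier(hidden)

# Usage: a batch of 2 sentences of length 8 with 768-dim token vectors.
model = DomainFlatNER(embed_dim=768, hidden_dim=256, num_tags=5)
logits = model(torch.randn(2, 8, 768))
tags = logits.argmax(dim=-1)   # predicted BIO tag index per token
print(tags.shape)              # torch.Size([2, 8])

The nested NER module would replace the single tag classifier with an attention-based extractor that scores overlapping spans; the weight sliding balance strategy named in the abstract is a training detail of the paper and is not reproduced here.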
Pages: 121-131
Page count: 11