A knowledge extraction framework for domain-specific application with simplified pre-trained language model and attention-based feature extractor

Cited: 2
Authors
Zhang, Jian [1 ]
Qin, Bo [1 ]
Zhang, Yufei [1 ]
Zhou, Junhua [2 ]
Wang, Hongwei [1 ]
Affiliations
[1] Zhejiang Univ, ZJU UIUC Inst, Haining 314400, Zhejiang, Peoples R China
[2] Beijing Inst Elect Syst Engn, Beijing Simulat Ctr, Beijing 100000, Peoples R China
Keywords
Knowledge extraction; Named entity recognition; Pre-trained language model; Attention mechanism
DOI
10.1007/s11761-022-00337-5
Chinese Library Classification (CLC)
TP39 [Applications of Computers]
Subject classification codes
081203; 0835
Abstract
With the advancement of industrial informatics, intelligent algorithms are increasingly applied in industrial products and applications. In this paper, we propose a knowledge extraction framework for domain-specific text. The framework extracts entities from text for subsequent tasks such as knowledge graph construction. It contains three modules: a domain-feature pre-trained model, LSTM-based flat named entity recognition, and attention-based nested named entity recognition. The domain-feature pre-trained model effectively learns features of the domain corpus, such as professional terms that do not appear in general-domain corpora. The flat named entity recognition module uses token vectors from the pre-trained model to extract entities from domain-specific text. The nested named entity recognition module, built on the attention mechanism and a weight sliding balance strategy, effectively identifies entity types with high nesting rates. The framework achieves good results on nuclear power plant maintenance reports, and the domain pre-trained model and LSTM-based flat named entity recognition methods have been successfully applied in practical tasks.
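To make the pipeline concrete, below is a minimal sketch of the LSTM-based flat named entity recognition step described in the abstract: contextual token vectors from the domain pre-trained model are fed to a bidirectional LSTM, and each token is classified into a BIO entity tag. PyTorch is assumed; the class name DomainFlatNER, the dimensions, and the tag set are illustrative assumptions, not the authors' implementation, whose details are not given in this record.

import torch
import torch.nn as nn

class DomainFlatNER(nn.Module):
    """Hypothetical flat NER head over domain pre-trained token vectors."""

    def __init__(self, embed_dim: int, hidden_dim: int, num_tags: int):
        super().__init__()
        # Bidirectional LSTM over the pre-trained token embeddings.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Token-level classifier over BIO tags (e.g., B-EQUIP, I-EQUIP, O).
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_vectors: torch.Tensor) -> torch.Tensor:
        # token_vectors: (batch, seq_len, embed_dim) produced by the
        # domain pre-trained model; returns per-token tag logits.
        hidden, _ = self.lstm(token_vectors)
        return self.classifier(hidden)

# Usage: a batch of 2 sentences of length 8 with 768-dim token vectors.
model = DomainFlatNER(embed_dim=768, hidden_dim=256, num_tags=5)
logits = model(torch.randn(2, 8, 768))
tags = logits.argmax(dim=-1)   # predicted BIO tag index per token
print(tags.shape)              # torch.Size([2, 8])

The nested NER module would replace the single tag classifier with an attention-based extractor that scores overlapping spans; the weight sliding balance strategy named in the abstract is a training detail of the paper and is not reproduced here.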
Pages: 121-131
Page count: 11