Subsequence and distant supervision based active learning for relation extraction of Chinese medical texts

Cited by: 1
Authors
Ye, Qi [1]
Cai, Tingting [1]
Ji, Xiang [1]
Ruan, Tong [1]
Zheng, Hong [1]
Affiliations
[1] East China Univ Sci & Technol, Sch Informat Sci & Technol, Shanghai 200237, Peoples R China
Keywords
Active learning; Sequence tagging; Relation extraction; Distant supervision; Medical texts
DOI
10.1186/s12911-023-02127-1
Chinese Library Classification (CLC) number
R-058
Abstract
In recent years, relation extraction from unstructured texts has become an important task in medical research. However, relation extraction requires a large labeled corpus, and manually annotating sequences is time-consuming and expensive. Efficient and economical annotation methods are therefore needed to ensure the performance of relation extraction. This paper proposes an active learning method based on subsequences and distant supervision. Instead of sampling full sentences, as in traditional active learning, the method selects information-rich subsequences as the sampling unit for annotation. In addition, the method saves the labeled subsequence texts and their corresponding labels in a dictionary that is continuously updated and maintained, and it pre-labels the unlabeled set through text matching, following the idea of distant supervision. Finally, the method is combined with a Chinese-RoBERTa-CRF model for relation extraction in Chinese medical texts. Experimental results on the CMeIE dataset show that the method achieves the best performance compared with existing methods, and the best F1 value obtained across the different sampling strategies is 55.96%.
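The two ideas in the abstract, sampling a subsequence rather than a full sentence and pre-labeling by dictionary text matching, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the fixed-width window scored by mean model confidence, the longest-match-first rule, and the tag names such as `B-DIS` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def most_uncertain_window(confidences: List[float], width: int) -> Tuple[int, int]:
    """Return (start, end) of the fixed-width window with the lowest mean confidence.

    Assumption: per-character model confidences are available, and a low mean
    confidence stands in for "information-rich". The paper's actual sampling
    strategies may differ.
    """
    width = min(width, len(confidences))
    best_start, best_score = 0, float("inf")
    for start in range(len(confidences) - width + 1):
        score = sum(confidences[start:start + width]) / width
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_start + width


def pre_label(text: str, label_dict: Dict[str, List[str]]) -> List[Optional[str]]:
    """Pre-label a character sequence by exact text matching against a dictionary.

    label_dict maps an annotated subsequence to its per-character tags.
    Positions that match no entry stay None (still unlabeled).
    """
    tags: List[Optional[str]] = [None] * len(text)
    # Hypothetical rule: try longer entries first so they take precedence.
    for sub in sorted(label_dict, key=len, reverse=True):
        sub_tags = label_dict[sub]
        start = text.find(sub)
        while start != -1:
            end = start + len(sub)
            # Only fill spans that no earlier (longer) match has claimed.
            if all(tag is None for tag in tags[start:end]):
                tags[start:end] = sub_tags
            start = text.find(sub, start + 1)
    return tags


# The dictionary grows as annotators label newly sampled subsequences.
label_dict: Dict[str, List[str]] = {}
label_dict["高血压"] = ["B-DIS", "I-DIS", "I-DIS"]  # hypothetical tag scheme

window = most_uncertain_window([0.9, 0.8, 0.2, 0.3, 0.9], width=2)
pre_labeled = pre_label("患者有高血压病史", label_dict)
```

In this sketch, only the characters inside `window` would be sent to an annotator, and the resulting text and tags would be added to `label_dict` so that later calls to `pre_label` can reuse them on the remaining unlabeled sentences.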
Pages: 12