BioWordVec, improving biomedical word embeddings with subword information and MeSH

Times cited: 248
Authors
Zhang, Yijia [1, 2]
Chen, Qingyu [1]
Yang, Zhihao [2]
Lin, Hongfei [2]
Lu, Zhiyong [1]
Affiliations
[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116023, Liaoning, Peoples R China
Keywords
DRUG INTERACTION EXTRACTION;
DOI
10.1038/s41597-019-0055-0
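Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];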
Subject classification code
07; 0710; 09;
Abstract
Distributed word representations have become an essential foundation for biomedical natural language processing (BioNLP), text mining and information retrieval. Word embeddings are traditionally computed at the word level from a large corpus of unlabeled text, ignoring the information present in the internal structure of words and any information available in domain-specific structured resources such as ontologies. However, such information has the potential to greatly improve the quality of word representations, as suggested by some recent studies in the general domain. Here we present BioWordVec: an open set of biomedical word vectors/embeddings that combines subword information from unlabeled biomedical text with a widely used biomedical controlled vocabulary called Medical Subject Headings (MeSH). We assess both the validity and utility of our generated word embeddings over multiple NLP tasks in the biomedical domain. Our benchmarking results demonstrate that our word embeddings can result in significantly improved performance over the previous state of the art in those challenging tasks.
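The abstract does not spell out how subword information and MeSH are combined, so the following is only a minimal sketch under stated assumptions. It assumes a fastText-style model (character n-grams of length 3 to 6 with `<` and `>` boundary markers, and a word vector formed as the average of its n-gram vectors) plus DeepWalk-style random walks over a MeSH term graph to turn the vocabulary into sequences that could be mixed into the training text. The n-gram vectors are random stand-ins rather than trained embeddings, and `char_ngrams`, `word_vector`, `random_walk` and `TOY_MESH` are hypothetical names; this is not the authors' released code.

```python
import math
import random
from typing import Dict, List

DIM = 100  # embedding size chosen for illustration only (assumption)


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """fastText-style character n-grams, with '<' and '>' marking word boundaries."""
    token = f"<{word}>"
    grams = [token[i:i + n]
             for n in range(min_n, max_n + 1)
             for i in range(len(token) - n + 1)]
    return grams or [token]  # very short tokens fall back to the whole token


def ngram_vector(gram: str) -> List[float]:
    """Stand-in for a trained n-gram embedding: a random vector seeded by the n-gram."""
    rng = random.Random(gram)  # string seeds are deterministic across runs
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> List[float]:
    """Average the n-gram vectors, so even unseen words get a representation."""
    grams = char_ngrams(word)
    total = [0.0] * DIM
    for gram in grams:
        for k, value in enumerate(ngram_vector(gram)):
            total[k] += value
    return [v / len(grams) for v in total]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def random_walk(graph: Dict[str, List[str]], start: str, length: int,
                seed: int = 0) -> List[str]:
    """Sample a term sequence by repeatedly moving to a uniformly random neighbour."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length and graph.get(walk[-1]):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


# Hypothetical, hand-written fragment of a MeSH-like hierarchy (not real MeSH data).
TOY_MESH: Dict[str, List[str]] = {
    "Cardiovascular_Diseases": ["Heart_Diseases"],
    "Heart_Diseases": ["Cardiovascular_Diseases", "Cardiomyopathies", "Arrhythmias_Cardiac"],
    "Cardiomyopathies": ["Heart_Diseases"],
    "Arrhythmias_Cardiac": ["Heart_Diseases"],
}

if __name__ == "__main__":
    base = word_vector("cardiomyopathy")
    print("related:  ", round(cosine(base, word_vector("cardiomyopathies")), 3))
    print("unrelated:", round(cosine(base, word_vector("hypertension")), 3))
    print(" ".join(random_walk(TOY_MESH, "Cardiomyopathies", length=6)))
```

Because "cardiomyopathies" shares most of its character n-grams with "cardiomyopathy", its composed vector lands close to it even without ever being seen as a whole word, while "hypertension" shares none. In a full pipeline, sequences sampled from the MeSH graph would presumably be added to the unlabeled text before training, so that related headings co-occur in the same contexts.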
Pages: 9
Related papers
50 in total
  • [41] Analyzing word embeddings and improving POS tagger of Tigrinya
    Tedla, Yemane
    Yamamoto, Kazuhide
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 115 - 118
  • [42] Improving interpretability of word embeddings by generating definition and usage
    Zhang, Haitong
    Du, Yongping
    Sun, Jiaxin
    Li, Qingxiao
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 160 (160)
  • [43] Improving Word Embeddings via Combining with Complementary Languages
    Li, Changliang
    Xu, Bo
    Wu, Gaowei
    Zhuang, Tao
    Wang, Xiuying
    Ge, Wendong
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2014, 2014, 8436 : 313 - 318
  • [44] Improving word embeddings projection for Turkish hypernym extraction
    Yildirim, Savas
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (06) : 4418 - 4428
  • [45] Improving accuracy of an existing semantic word labelling tool using word embeddings
    Sanjurjo-Gonzalez, Hugo
    PROCEEDINGS OF 2021 16TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI'2021), 2021
  • [46] Quantifying 60 Years of Gender Bias in Biomedical Research with Word Embeddings
    Rios, Anthony
    Joshi, Reenam
    Shin, Hejin
    19TH SIGBIOMED WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2020), 2020, : 1 - 13
  • [47] Deep learning with word embeddings improves biomedical named entity recognition
    Habibi, Maryam
    Weber, Leon
    Neves, Mariana
    Wiegandt, David Luis
    Leser, Ulf
    BIOINFORMATICS, 2017, 33 (14) : I37 - I48
  • [48] Thesaurus-based word embeddings for automated biomedical literature classification
    Koutsomitropoulos, Dimitrios A.
    Andriopoulos, Andreas D.
    NEURAL COMPUTING AND APPLICATIONS, 2022, 34 : 937 - 950
  • [49] Training Word Embeddings for Deep Learning in Biomedical Text Mining Tasks
    Jiang, Zhenchao
    Li, Lishuang
    Huang, Degen
    Jin, Liuke
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015, : 625 - 628
  • [50] Combining word embeddings to extract chemical and drug entities in biomedical literature
    López-Úbeda, Pilar
    Díaz-Galiano, Manuel Carlos
    Ureña-López, L. Alfonso
    Martín-Valdivia, M. Teresa
    BMC BIOINFORMATICS, 22