BioWordVec, improving biomedical word embeddings with subword information and MeSH

Cited by: 248
Authors
Zhang, Yijia [1 ,2 ]
Chen, Qingyu [1 ]
Yang, Zhihao [2 ]
Lin, Hongfei [2 ]
Lu, Zhiyong [1 ]
Affiliations
[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116023, Liaoning, Peoples R China
Keywords
DRUG INTERACTION EXTRACTION;
DOI
10.1038/s41597-019-0055-0
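Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract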
Distributed word representations have become an essential foundation for biomedical natural language processing (BioNLP), text mining and information retrieval. Word embeddings are traditionally computed at the word level from a large corpus of unlabeled text, ignoring both the information present in the internal structure of words and the information available in domain-specific structured resources such as ontologies. However, such information holds potential for greatly improving the quality of the word representation, as suggested in some recent studies in the general domain. Here we present BioWordVec: an open set of biomedical word vectors/embeddings that combines subword information from unlabeled biomedical text with a widely used biomedical controlled vocabulary called Medical Subject Headings (MeSH). We assess both the validity and utility of our generated word embeddings over multiple NLP tasks in the biomedical domain. Our benchmarking results demonstrate that our word embeddings can result in significantly improved performance over the previous state of the art in those challenging tasks.
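The abstract does not spell out how subword information enters a word vector. The sketch below illustrates one common approach, assuming fastText-style character n-grams hashed into a fixed number of buckets. The names `char_ngrams`, `subword_vector`, `NUM_BUCKETS` and `DIM`, as well as the random toy table, are hypothetical and are not taken from the BioWordVec release.

```python
import random

NUM_BUCKETS = 1000  # assumed toy value; real models use far more buckets
DIM = 8             # assumed toy embedding size

def char_ngrams(word, n_min=3, n_max=6):
    """Return character n-grams of `word`, using < and > as boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

def fnv1a(text):
    """32-bit FNV-1a hash, used here only to map n-grams to bucket ids."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h

# Toy embedding table filled with random values (a trained model would learn these).
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]

def subword_vector(word):
    """Average the bucket vectors of the word's n-grams plus the marked whole word."""
    units = char_ngrams(word) + [f"<{word}>"]
    rows = [TABLE[fnv1a(u) % NUM_BUCKETS] for u in units]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Related terms share n-grams, so their vectors draw on the same table rows.
shared = set(char_ngrams("nephritis")) & set(char_ngrams("nephropathy"))
print(sorted(shared))
print(subword_vector("nephritis"))
```

Because a vector is assembled from n-gram rows, a word that never appeared in the training text can still be given a representation, which is useful for rare biomedical terms.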
Pages: 9
Related papers
50 in total
  • [21] Quality of word and concept embeddings in targetted biomedical domains
    Giancani, Salvatore
    Albertoni, Riccardo
    Catalano, Chiara Eva
    HELIYON, 2023, 9 (06)
  • [22] Biomedical entities recognition in Spanish combining word embeddings
    Lopez-Ubeda, Pilar
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2022, (68): : 149 - 152
  • [23] Word embeddings for biomedical natural language processing: A survey
    Chiu, Billy
    Baker, Simon
    LANGUAGE AND LINGUISTICS COMPASS, 2020, 14 (12):
  • [24] Enhancing biomedical word embeddings by retrofitting to verb clusters
    Chiu, Billy
    Baker, Simon
    Palmer, Martha
    Korhonen, Anna
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 125 - 134
  • [25] A comparison of word embeddings for the biomedical natural language processing
    Wang, Yanshan
    Liu, Sijia
    Afzal, Naveed
    Rastegar-Mojarad, Majid
    Wang, Liwei
    Shen, Feichen
    Kingsbury, Paul
    Liu, Hongfang
    JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 87 : 12 - 20
  • [26] Improving Word Embeddings Using Kernel PCA
    Gupta, Vishwani
    Giesselbach, Sven
    Rueping, Stefan
    Bauckhage, Christian
    4TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2019), 2019, : 200 - 208
  • [27] Improving biterm topic model with word embeddings
    Jiajia Huang
    Min Peng
    Pengwei Li
    Zhiwei Hu
    Chao Xu
    World Wide Web, 2020, 23 : 3099 - 3124
  • [28] Improving biterm topic model with word embeddings
    Huang, Jiajia
    Peng, Min
    Li, Pengwei
    Hu, Zhiwei
    Xu, Chao
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2020, 23 (06): : 3099 - 3124
  • [29] Improving semantic similarity retrieval with word embeddings
    Yan, Fengqi
    Fan, Qiaoqing
    Lu, Mingming
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [30] Word Embeddings for improving REST services discoverability
    Lizarralde, Ignacio
    Rodriguez, Juan Manuel
    Mateos, Cristian
    Zunino, Alejandro
    2017 XLIII LATIN AMERICAN COMPUTER CONFERENCE (CLEI), 2017,