BioWordVec, improving biomedical word embeddings with subword information and MeSH

Citations: 248
Authors
Zhang, Yijia [1, 2]
Chen, Qingyu [1]
Yang, Zhihao [2]
Lin, Hongfei [2]
Lu, Zhiyong [1]
Affiliations
[1] NIH, NCBI, NLM, Bethesda, MD 20894 USA
[2] Dalian Univ Technol, Sch Comp Sci & Technol, Dalian 116023, Liaoning, Peoples R China
Keywords
DRUG INTERACTION EXTRACTION;
DOI
10.1038/s41597-019-0055-0
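Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract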
Distributed word representations have become an essential foundation for biomedical natural language processing (BioNLP), text mining and information retrieval. Word embeddings are traditionally computed at the word level from a large corpus of unlabeled text, ignoring the information present in the internal structure of words and any information available in domain-specific structured resources such as ontologies. However, such information holds potential for greatly improving the quality of word representations, as suggested by some recent studies in the general domain. Here we present BioWordVec: an open set of biomedical word vectors/embeddings that combines subword information from unlabeled biomedical text with a widely used biomedical controlled vocabulary called Medical Subject Headings (MeSH). We assess both the validity and utility of our generated word embeddings over multiple NLP tasks in the biomedical domain. Our benchmarking results demonstrate that our word embeddings can yield significantly improved performance over the previous state of the art in those challenging tasks.
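The abstract does not spell out how subword information is extracted. As a rough illustration only, and not the authors' implementation, the sketch below shows the character n-gram decomposition commonly used by subword embedding models: a word is wrapped in boundary markers and split into overlapping n-grams, so that rare biomedical terms can share pieces with related words. The function name `char_ngrams` and the default lengths 3 to 6 are assumptions made for this example.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of `word`, shortest first.

    Hypothetical helper for illustration: the word is wrapped in
    '<' and '>' boundary markers, then every substring of length
    n_min..n_max is collected in left-to-right order.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


if __name__ == "__main__":
    # Trigrams only, to keep the output short.
    print(char_ngrams("gene", n_min=3, n_max=3))
```

In a subword model, a word's vector is typically built from the vectors of these pieces, which is why an unseen term can still receive a representation from its known n-grams.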
Pages: 9