DeepMeSH: deep semantic representation for improving large-scale MeSH indexing

Cited by: 84
Authors
Peng, Shengwen [1 ,2 ]
You, Ronghui [1 ,2 ]
Wang, Hongning [3 ]
Zhai, Chengxiang [4 ]
Mamitsuka, Hiroshi [5 ,6 ]
Zhu, Shanfeng [1 ,2 ,7 ]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22904 USA
[4] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[5] Kyoto Univ, Inst Chem Res, Bioinformat Ctr, Uji 6110011, Japan
[6] Aalto Univ, Dept Comp Sci, Espoo, Finland
[7] Fudan Univ, Ctr Computat Syst Biol, Shanghai 200433, Peoples R China
Funding
US National Science Foundation; National Natural Science Foundation of China; US National Institutes of Health
Keywords
LIBRARY;
DOI
10.1093/bioinformatics/btw294
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well.

Methods: We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation.

Results: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations.
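
The abstract describes D2V-TFIDF only as a concatenation of a sparse and a dense representation. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the dense part is a document embedding supplied by some external model (the `dense` values below are hypothetical stand-ins, not real doc2vec output), that the sparse part is a simple smoothed TF-IDF vector, and that each part is L2-normalized before concatenation. The function names `tfidf_vectors`, `l2_normalize` and `d2v_tfidf` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed weighting: idf = log(N / df) + 1, one common variant.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def d2v_tfidf(dense_vec, tfidf_vec):
    """Concatenate a dense embedding with a sparse TF-IDF vector.

    Normalizing each part first is an assumption made here so that
    neither part dominates by scale alone.
    """
    return l2_normalize(dense_vec) + l2_normalize(tfidf_vec)


# Two toy "citations", already tokenized.
docs = [
    ["mesh", "indexing", "citation"],
    ["deep", "semantic", "representation", "citation"],
]
# Hypothetical dense embeddings, standing in for the output of a
# document-embedding model such as doc2vec.
dense = [[0.1, -0.3, 0.5], [0.4, 0.2, -0.1]]

vocab, tfidf = tfidf_vectors(docs)
features = [d2v_tfidf(d, t) for d, t in zip(dense, tfidf)]
```

Each resulting feature vector has one block of dense dimensions followed by one dimension per vocabulary term, so a downstream classifier sees both kinds of evidence side by side.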
Pages: 70-79
Page count: 10
Related papers
50 in total
  • [1] FullMeSH: improving large-scale MeSH indexing with full text
    Dai, Suyang
    You, Ronghui
    Lu, Zhiyong
    Huang, Xiaodi
    Mamitsuka, Hiroshi
    Zhu, Shanfeng
    BIOINFORMATICS, 2020, 36 (05) : 1533 - 1541
  • [2] MeSHProbeNet-P: Improving Large-scale MeSH Indexing with Personalizable MeSH Probes
    Xun, Guangxu
    Jha, Kishlay
    Zhang, Aidong
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2021, 15 (01)
  • [3] BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text
    You, Ronghui
    Liu, Yuxuan
    Mamitsuka, Hiroshi
    Zhu, Shanfeng
    BIOINFORMATICS, 2021, 37 (05) : 684 - 692
  • [4] MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence
    Liu, Ke
    Peng, Shengwen
    Wu, Junqiu
    Zhai, Chengxiang
    Mamitsuka, Hiroshi
    Zhu, Shanfeng
    BIOINFORMATICS, 2015, 31 (12) : 339 - 347
  • [5] Large-scale information retrieval with latent semantic indexing
    Letsche, TA
    Berry, MW
    INFORMATION SCIENCES, 1997, 100 (1-4) : 105 - 137
  • [6] DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation
    You, Ronghui
    Huang, Xiaodi
    Zhu, Shanfeng
    METHODS, 2018, 145 : 82 - 90
  • [7] DeepText2Go: Improving Large-scale Protein Function Prediction with Deep Semantic Text Representation
    You, Ronghui
    Zhu, Shanfeng
    2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2017, : 42 - 49
  • [8] Semantic Representation For Navigation In Large-Scale Environments
    Drouilly, Romain
    Rives, Patrick
    Morisset, Benoit
    2015 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2015, : 1106 - 1111
  • [9] High Throughput Indexing for Large-scale Semantic Web Data
    Cheng, Long
    Kotoulas, Spyros
    Ward, Tomas E.
    Theodoropoulos, Georgios
    30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 416 - 422
  • [10] A Fast Approximate Algorithm for Large-Scale Latent Semantic Indexing
    Zhang, Dell
    Zhu, Zheng
    2008 THIRD INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, VOLS 1 AND 2, 2008, : 639 - 644