Towards a unified search: Improving PubMed retrieval with full text

Cited by: 3
Authors
Kim W. [1]
Yeganova L. [1]
Comeau D.C. [1]
Wilbur W.J. [1]
Lu Z. [1]
Affiliation
[1] National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894
Funding
National Institutes of Health
Keywords
Combining abstract with full text; Full text search; Information retrieval; PubMed search engine; Search relevance gold standard
DOI
10.1016/j.jbi.2022.104211
Abstract

Objective: A significant number of recent PubMed articles have full text available in PubMed Central®, and this availability has been growing steadily. However, users currently cannot query the contents of both databases simultaneously and receive a single integrated search result. In this study, we investigate how to score full-text articles given a multi-token query, and how to combine those full-text scores with scores derived from abstracts to improve overall retrieval performance.

Materials and methods: To score full-text articles, we propose a method that combines information from different sections by converting the traditionally used BM25 scores into log odds ratio scores, which can be treated uniformly. We further propose a method that combines scores from two heterogeneous retrieval sources, full-text articles and abstract-only articles, by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data, consisting of queries sampled from PubMed user logs together with a subset of retrieved and clicked documents, to train the probabilistic functions and to evaluate retrieval effectiveness.

Results and conclusions: Random ranking achieves a mean average precision (MAP) of 0.579 on our PubMed click data. BM25 ranking on PubMed abstracts improves the MAP by 10.6%. For full-text documents, experiments confirm that BM25 section scores differ in value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with the abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of query-token occurrences in different sections. By including full text where available, we gain another 0.67%, or a 7% relative improvement over abstract alone. We find an advantage in estimating more accurately the value of BM25 scores depending on the section from which they were produced. Taking the sum of the top three section scores performs best. © 2022
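A minimal sketch of the section-scoring step, written in Python for illustration only. It is not the authors' implementation: it assumes each section type has its own linear map from raw BM25 to a log odds ratio score, and the section names and coefficients in the hypothetical `SECTION_CALIBRATION` table are placeholders, not values trained on click data. The article score is the sum of the top three calibrated section scores, the aggregation reported above to perform best; treating section evidence as additive is a simplifying assumption of this sketch.

```python
# Hypothetical per-section calibration: each section type gets its own linear
# map from raw BM25 to a log odds ratio score. The (slope, intercept) values
# are invented placeholders; the study trains its mappings on PubMed click data.
SECTION_CALIBRATION = {
    "title": (0.5, -0.5),
    "abstract": (0.4, -0.4),
    "methods": (0.1, -0.2),
    "results": (0.2, -0.3),
}


def section_log_odds(section: str, bm25: float) -> float:
    """Convert one section's raw BM25 score into a calibrated log odds ratio score."""
    slope, intercept = SECTION_CALIBRATION[section]
    return slope * bm25 + intercept


def full_text_score(section_bm25: dict[str, float], k: int = 3) -> float:
    """Score a full-text article as the sum of its k highest calibrated section scores."""
    calibrated = sorted(
        (section_log_odds(sec, score) for sec, score in section_bm25.items()),
        reverse=True,
    )
    # Articles with fewer than k scored sections simply sum what they have.
    return sum(calibrated[:k])
```

With `{"title": 4.0, "abstract": 6.0, "methods": 10.0, "results": 5.0}`, the methods section has the largest raw BM25 score yet contributes only 0.8 after calibration, and `full_text_score` returns about 4.3 (abstract 2.0 + title 1.5 + methods 0.8). The numbers are purely illustrative, but they show why raw BM25 section scores cannot be compared directly.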
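Merging the two retrieval sources and evaluating the merged ranking can be sketched in the same hedged spirit. This assumes each source (full text or abstract only) has its own logistic function mapping a raw score to an estimated probability of relevance; `SOURCE_CALIBRATION`, its values, and all function names are hypothetical. Average precision treats the clicked documents for a query as the relevant set, in line with the click-based evaluation described above, and MAP is its mean over queries.

```python
import math

# Hypothetical per-source calibration: (slope, intercept) of a logistic function
# mapping a raw retrieval score to an estimated probability of relevance.
# These numbers are invented placeholders, not parameters from the study.
SOURCE_CALIBRATION = {
    "abstract_only": (0.3, -2.0),
    "full_text": (0.6, -1.5),
}


def relevance_probability(source: str, score: float) -> float:
    """Put a raw score from either source on a shared probability scale."""
    slope, intercept = SOURCE_CALIBRATION[source]
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


def merge_ranking(docs: list[tuple[str, str, float]]) -> list[str]:
    """Rank (doc_id, source, raw_score) triples by descending probability of relevance."""
    ranked = sorted(docs, key=lambda d: relevance_probability(d[1], d[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Average precision of one ranked list, treating clicked documents as relevant."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: mean of per-query average precision over (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    docs = [
        ("A", "abstract_only", 8.0),
        ("B", "full_text", 4.0),
        ("C", "abstract_only", 5.0),
        ("D", "full_text", 1.0),
    ]
    ranking = merge_ranking(docs)
    print(ranking)
    print(average_precision(ranking, {"B", "C"}))
```

Because the logistic function is monotonic, calibration never reorders documents from the same source; it only decides how full-text and abstract-only documents interleave. Ranked by raw score, the example documents would come out A, C, B, D; on the shared probability scale they come out B, A, C, D.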
Related papers (50 in total; 10 listed)
  • [1] Improving Full Text Search with Text Mining Tools
    Piao, Scott
    Rea, Brian
    McNaught, John
    Ananiadou, Sophia
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2010, 5723 : 301 - 302
  • [2] A method for improving full text search using signature files
    Yamakawa, Y
    Fuketa, M
    Morita, K
    Aoe, J
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2001, 77 (01) : 73 - 88
  • [3] MIIS-BASED FULL TEXT SEARCH RETRIEVAL-SYSTEM
    VASTA, BM
    CASEY, SM
    JOHNSON, NS
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1985, 189 (APR-) : 31 - CINF
  • [4] An exploratory analysis of PubMed's free full-text limit on citation retrieval for clinical questions
    Krieger, Mary M.
    Richter, Randy R.
    Austin, Tricia M.
    JOURNAL OF THE MEDICAL LIBRARY ASSOCIATION, 2008, 96 (04) : 351 - 355
  • [5] Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark
    Yang, Shuyu
    Zhou, Yinan
    Zheng, Zhedong
    Wang, Yaxiong
    Zhu, Li
    Wu, Yujiao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 4492 - 4501
  • [6] Impact of PubMed search filters on the retrieval of evidence by physicians
    Shariff, Salimah Z.
    Sontrop, Jessica M.
    Haynes, R. Brian
    Iansavichus, Arthur V.
    McKibbon, K. Ann
    Wilczynski, Nancy L.
    Weir, Matthew A.
    Speechley, Mark R.
    Thind, Amardeep
    Garg, Amit X.
    CANADIAN MEDICAL ASSOCIATION JOURNAL, 2012, 184 (03) : E184 - E190
  • [7] Using advanced search tools on PubMed for citation retrieval
    Sood, A
    Erwin, PJ
    Ebbert, JO
    MAYO CLINIC PROCEEDINGS, 2004, 79 (10) : 1295 - 1299
  • [8] IMPROVING FULL-TEXT SEARCH PERFORMANCE THROUGH TEXTUAL ANALYSIS
    MOLTO, M
    INFORMATION PROCESSING & MANAGEMENT, 1993, 29 (05) : 615 - 632
  • [9] FULL TEXT RETRIEVAL FROM STRUCTURED TEXT
    GOLDSTEIN, CM
    BULLETIN OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1989, 15 (06) : 11 - 11
  • [10] A Comparative Study of Text Classification Approaches for Personalized Retrieval in PubMed
    Pitigala, Sachintha
    Li, Cen
    Seo, Suk
    2011 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS, 2011 : 919 - 921