Towards a unified search: Improving PubMed retrieval with full text

Cited by: 3
Authors
Kim W. [1]
Yeganova L. [1]
Comeau D.C. [1]
Wilbur W.J. [1]
Lu Z. [1]
Affiliations
[1] National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894
Funding
National Institutes of Health (NIH)
Keywords
Combining abstract with full text; Full text search; Information retrieval; PubMed search engine; Search relevance gold standard
DOI
10.1016/j.jbi.2022.104211
Abstract
Objective: A significant number of recent articles in PubMed have full text available in PubMed Central®, and the availability of full texts has been growing consistently. However, a user cannot currently query the contents of both databases simultaneously and receive a single integrated search result. In this study, we investigate how to score full text articles given a multitoken query, and how to combine those full text article scores with scores originating from abstracts to achieve an overall improvement in retrieval performance.

Materials and methods: For scoring full text articles, we propose a method to combine information coming from different sections by converting the traditionally used BM25 scores into log odds ratio scores, which can be treated uniformly. We further propose a method that successfully combines scores from two heterogeneous retrieval sources (full text articles and abstract-only articles) by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data, which consists of queries sampled from PubMed user logs along with a subset of retrieved and clicked documents, to train the probabilistic functions and to evaluate retrieval effectiveness.

Results and conclusions: Random ranking achieves a mean average precision (MAP) score of 0.579 on our PubMed click data. BM25 ranking on PubMed abstracts improves the MAP by 10.6%. For full text documents, experiments confirm that BM25 section scores are of different value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of occurrences of query tokens in different sections. By including full text where available, we gain another 0.67%, or a 7% relative improvement over abstracts alone. We find an advantage in the more accurate estimate of the value of BM25 scores depending on the section from which they were produced. Taking the sum of the top three section scores performs best. © 2022
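
The section-scoring idea in the abstract can be illustrated with a small Python sketch. The abstract does not say how the log odds ratio scores are estimated, so everything specific here is an assumption made for illustration: BM25 scores are placed into fixed bins per section, the log odds of a click are estimated per bin from labeled click data with additive smoothing, and a document score is the sum of its three largest section log odds. The names `fit_log_odds`, `document_score`, `bin_edges`, and `smoothing` are hypothetical and do not come from the paper.

```python
import math
from bisect import bisect_right
from collections import defaultdict


def fit_log_odds(samples, bin_edges, smoothing=1.0):
    """Estimate per-section, per-bin log odds of a click.

    samples: iterable of (section, bm25_score, clicked) tuples.
    bin_edges: sorted score thresholds; they define len(bin_edges) + 1 bins.
    Binning and additive smoothing are assumptions of this sketch.
    """
    n_bins = len(bin_edges) + 1
    # counts[section][bin] = [clicked_count, not_clicked_count]
    counts = defaultdict(lambda: [[0, 0] for _ in range(n_bins)])
    for section, score, clicked in samples:
        bin_index = bisect_right(bin_edges, score)
        counts[section][bin_index][0 if clicked else 1] += 1
    return {
        section: [math.log((pos + smoothing) / (neg + smoothing))
                  for pos, neg in bins]
        for section, bins in counts.items()
    }


def document_score(section_scores, table, bin_edges, top_k=3):
    """Sum the top_k largest log odds over a document's sections.

    section_scores: {section: bm25_score} for one document and one query.
    Sections missing from the fitted table are ignored.
    """
    converted = [
        table[section][bisect_right(bin_edges, score)]
        for section, score in section_scores.items()
        if section in table
    ]
    return sum(sorted(converted, reverse=True)[:top_k])
```

Because every section's BM25 score is mapped onto the same log odds scale before summation, a high score in a less informative section can contribute less than a moderate score in a more informative one, which is the kind of normalization the abstract describes.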