Towards a unified search: Improving PubMed retrieval with full text

Cited by: 3

Authors:

Kim W. [1]

Yeganova L. [1]

Comeau D.C. [1]

Wilbur W.J. [1]

Lu Z. [1]

Affiliation:

[1] National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894

Source:

Journal of Biomedical Informatics | 2022 / Vol. 134

Funding:

National Institutes of Health (NIH)

Keywords:

Combining abstract with full text; Full text search; Information retrieval; PubMed search engine; Search relevance gold standard

DOI:

10.1016/j.jbi.2022.104211

Abstract:

Objective: A significant number of recent articles in PubMed have full text available in PubMed Central®, and the availability of full texts has been consistently growing. However, it is not currently possible for a user to simultaneously query the contents of both databases and receive a single integrated search result. In this study, we investigate how to score full text articles given a multi-token query and how to combine those full text article scores with scores originating from abstracts to achieve better overall retrieval performance. Materials and methods: For scoring full text articles, we propose a method to combine information coming from different sections by converting the traditionally used BM25 scores into log odds ratio scores, which can be treated uniformly. We further propose a method that successfully combines scores from two heterogeneous retrieval sources, full text articles and abstract-only articles, by balancing the contributions of their respective scores through a probabilistic transformation. We use PubMed click data, consisting of queries sampled from PubMed user logs along with a subset of retrieved and clicked documents, to train the probabilistic functions and to evaluate retrieval effectiveness. Results and conclusions: Random ranking achieves a mean average precision (MAP) of 0.579 on our PubMed click data. BM25 ranking on PubMed abstracts improves the MAP by 10.6%. For full text documents, experiments confirm that BM25 section scores differ in value depending on the section type and are not directly comparable. Naïvely using the body text of articles along with abstract text degrades the overall quality of the search. The proposed log odds ratio scores normalize and combine the contributions of occurrences of query tokens in different sections. By including full text where available, we gain another 0.67%, a 7% relative improvement over abstract alone.
We find an advantage in estimating the value of BM25 scores more accurately according to the section from which they were produced. Taking the sum of the top three section scores performs best. © 2022
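The abstract describes the scoring pipeline only at a high level, so the following is a minimal sketch of how its pieces could fit together, not the authors' implementation. It assumes that each section's BM25 score is mapped to log odds by a linear logistic model (log odds = a * s + b) with per-section parameters, that a full-text article's score is the sum of its top three section log odds, and that full-text and abstract-only scores are put on a common probability scale by a per-source logistic function. The names `SECTION_PARAMS`, `SOURCE_PARAMS`, `section_log_odds`, `full_text_score`, `to_probability`, and `rank_mixed`, and all parameter values, are hypothetical; in practice the parameters would be fitted on click data. The MAP helpers use the standard definition of average precision, treating clicked documents as relevant.

```python
import math

# HYPOTHETICAL (slope, intercept) per section type. We ASSUME the BM25-to-log-odds
# conversion is a logistic model that is linear in the BM25 score: a * s + b.
# Real values would be fitted on click data; these numbers are made up.
SECTION_PARAMS = {
    "title": (0.30, -4.0),
    "abstract": (0.20, -4.5),
    "methods": (0.05, -5.5),
    "results": (0.08, -5.0),
}

# HYPOTHETICAL (slope, intercept) per retrieval source, used to place full-text
# and abstract-only scores on one probability scale. Also made up.
SOURCE_PARAMS = {
    "full_text": (0.5, 1.0),
    "abstract_only": (1.0, 0.5),
}


def section_log_odds(section, bm25_score):
    """Convert one section's BM25 score to a log odds score: a * s + b."""
    a, b = SECTION_PARAMS[section]
    return a * bm25_score + b


def full_text_score(section_scores, top_k=3):
    """Sum of the top_k section log odds scores (top 3 per the abstract)."""
    converted = sorted(
        (section_log_odds(sec, s) for sec, s in section_scores), reverse=True
    )
    return sum(converted[:top_k])


def to_probability(source, score):
    """Map a source-specific score to a probability with a logistic function."""
    a, b = SOURCE_PARAMS[source]
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def rank_mixed(docs):
    """Rank (doc_id, source, score) triples from both sources by probability."""
    ranked = sorted(docs, key=lambda d: to_probability(d[1], d[2]), reverse=True)
    return [doc_id for doc_id, _, _ in ranked]


def average_precision(ranked_ids, relevant):
    """Average precision of one ranking against a set of relevant (clicked) ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP over (ranked_ids, relevant_set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with these made-up parameters, a full-text article A with BM25 section scores title 10, abstract 12, methods 20, and results 15 gets a summed top-three log odds of about -6.9, while an abstract-only article B with an abstract BM25 score of 14 gets -1.7; after the per-source logistic mapping, `rank_mixed` places B ahead of A. How the per-source parameters are fitted is what determines whether the two sources end up fairly balanced.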
