Using cited references to improve the retrieval of related biomedical documents

Times cited: 14

Authors:

Ortuno, Francisco M. [1]

Rojas, Ignacio [1]

Andrade-Navarro, Miguel A. [2]

Fontaine, Jean-Fred [2]

Affiliations:

[1] Univ Granada, Comp Architecture & Comp Technol Dept, E-18071 Granada, Spain

[2] Max Delbruck Ctr Mol Med, D-13125 Berlin, Germany

Source:

BMC BIOINFORMATICS | 2013 / Vol. 14

Keywords:

Information retrieval; Text categorization; Citations; Full-text documents; Biomedical literature; Query expansion; Document classification; INFORMATION-RETRIEVAL; PROBABILISTIC MODEL; FULL-TEXT; ARTICLES; RANKING; CITATIONS; DATABASE; SEARCH; TERMS;

DOI:

10.1186/1471-2105-14-113

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Subject classification codes:

071010; 081704

Abstract:

Background: A common query from scientists reading a biomedical abstract is a search of bibliographic databases for topic-related documents. Such a query is challenging because a single abstract carries little information, whereas classification-based retrieval algorithms perform best when trained on large sets of relevant documents. As a solution to this problem, we propose a query expansion method that extends the information related to a manuscript using its cited references.

Results: Data on cited references and text sections in 249,108 full-text biomedical articles were extracted from the Open Access subset of the PubMed Central® database (PMC-OA). Of the five standard sections of a scientific article, the Introduction and Discussion sections contained most of the citations (mean = 10.2 and 9.9 citations, respectively). A large proportion of the articles (98.4%) and of their cited references (79.5%) were indexed in the PubMed® database. Using the MedlineRanker abstract classification tool, cited references allowed accurate retrieval of the citing document from a test set of 10,000 documents, and of documents related to six biomedical topics, defined by particular MeSH® terms, from the entire PMC-OA (p-value < 0.01). Classification performance was sensitive both to the topic and to the text sections from which the references were selected. Classifiers trained on the baseline (i.e., only text from the query document and not from its references) were outperformed in almost all cases. The best performance was often obtained using all cited references, although using only the references from the Introduction and Discussion sections led to similarly good results. This query expansion method performed significantly better than pseudo relevance feedback in 4 of the 6 topics.

Conclusions: The retrieval of documents related to a single document can be significantly improved by using the references cited by that document (p-value < 0.01). Using references from the Introduction and Discussion performs almost as well as using all references, which might be useful for methods that require reduced datasets due to computational limitations. Cited references from particular sections might not be appropriate for all topics. Our method could be a better alternative to pseudo relevance feedback, though it is limited by full-text availability.
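The query expansion idea summarized in the abstract can be illustrated with a small sketch: the single query abstract is turned into a larger positive training set by adding the abstracts of its cited references (optionally only those cited in chosen sections), and candidate documents are then ranked by a classifier trained against a background set. This is a minimal sketch under stated assumptions, not the authors' pipeline. MedlineRanker's actual classifier is not reproduced; a simple multinomial naive Bayes log-odds scorer with Laplace smoothing stands in for it. The record layout (`section` and `abstract` keys), the helper names (`build_training_set`, `train_scorer`, `score`) and the toy texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_training_set(query_abstract, cited_refs, sections=None):
    """Expand a single query abstract with the abstracts of its cited references.

    cited_refs: list of dicts with hypothetical keys "section" (section of the
    citing article where the reference is cited) and "abstract".
    sections: optional set of section names to keep; None keeps every reference.
    """
    docs = [query_abstract]
    for ref in cited_refs:
        if sections is None or ref["section"] in sections:
            docs.append(ref["abstract"])
    return docs


def train_scorer(positive_docs, background_docs, alpha=1.0):
    """Per-word log-odds weights of a multinomial naive Bayes classifier.

    Assumption: a stand-in for MedlineRanker's classifier. It compares
    Laplace-smoothed word probabilities in the positive set against a
    background set drawn from the same vocabulary.
    """
    pos = Counter(w for d in positive_docs for w in tokenize(d))
    bg = Counter(w for d in background_docs for w in tokenize(d))
    vocab = set(pos) | set(bg)
    pos_total = sum(pos.values()) + alpha * len(vocab)
    bg_total = sum(bg.values()) + alpha * len(vocab)
    return {
        w: math.log((pos[w] + alpha) / pos_total) - math.log((bg[w] + alpha) / bg_total)
        for w in vocab
    }


def score(doc, weights):
    """Sum the log-odds of the document's words; unseen words contribute 0."""
    return sum(weights.get(w, 0.0) for w in tokenize(doc))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    query = "p53 mutation drives tumor growth"
    refs = [
        {"section": "Introduction", "abstract": "tumor suppressor p53 regulates apoptosis"},
        {"section": "Methods", "abstract": "western blot protocol for protein detection"},
        {"section": "Discussion", "abstract": "p53 loss promotes cancer cell proliferation"},
    ]
    background = [
        "plant root growth under drought",
        "bacterial biofilm formation in water pipes",
        "protein folding kinetics in yeast",
    ]
    candidates = {
        "A": "p53 apoptosis in tumor cells",
        "B": "drought tolerance of plant roots",
    }

    # Keep only references cited in the Introduction and Discussion sections.
    positives = build_training_set(query, refs, sections={"Introduction", "Discussion"})
    weights = train_scorer(positives, background)
    ranked = sorted(candidates, key=lambda k: score(candidates[k], weights), reverse=True)
    print(ranked)
```

Passing `sections=None` uses all cited references, while `sections={"Introduction", "Discussion"}` mimics the reduced variant described in the abstract. Pseudo relevance feedback, the comparison method named in the abstract, would instead add the top-ranked documents from an initial query-only ranking as extra positives, which avoids the need for full text.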


Pages: 12
