Identifying well-formed biomedical phrases in MEDLINE® text

Cited by: 3
Authors
Kim, Won [1]
Yeganova, Lana [1]
Comeau, Donald C. [1]
Wilbur, W. John [1]
Affiliation
[1] Natl Lib Med, CBB, NCBI, NIH, Bethesda, MD 20894 USA
Keywords
Machine learning; Imbalanced data; Biomedical phrases; Statistical phrase identification; Unified Medical Language System; Abbreviation full forms
DOI
10.1016/j.jbi.2012.05.005
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In the modern world people frequently interact with retrieval systems to satisfy their information needs. Humanly understandable, well-formed phrases represent a crucial interface between humans and the web, and the ability to index and search with such phrases is beneficial for human-web interactions. In this paper we consider the problem of identifying humanly understandable, well-formed, high-quality biomedical phrases in MEDLINE documents. The main approaches used previously for detecting such phrases are syntactic, statistical, and hybrid approaches combining the two. We propose a supervised learning approach for identifying high-quality phrases. First we obtain a set of known well-formed, useful phrases from an existing source and label these phrases as positive. We then extract from MEDLINE a large set of multiword strings that do not contain stop words or punctuation. We believe this unlabeled set contains many well-formed phrases, and our goal is to identify these additional high-quality phrases. We examine various feature combinations and several machine learning strategies designed to solve this problem. With a proper choice of machine learning methods and features, we can identify strings in the large collection that are likely to be high-quality phrases. We evaluate our approach by making human judgments on multiword strings extracted from MEDLINE using our methods. We find that over 85% of such extracted phrase candidates are judged by humans to be of high quality. Published by Elsevier Inc.
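The candidate-extraction step described in the abstract (multiword strings that contain no stop words or punctuation, with known phrases labeled positive and the rest left unlabeled) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the tokenization rule, the length limits `min_len` and `max_len`, and the function names `candidate_strings` and `split_positive_unlabeled` are all assumptions made for illustration, and the features and classifiers examined in the paper are not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption; the
# list used in the paper is not given in this record).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "by", "for", "in", "is", "of", "on",
    "that", "the", "to", "was", "we", "were", "with",
})

# A word is a run of ASCII letters/digits, optionally joined by internal
# hyphens. Any other non-space character counts as punctuation.
WORD = r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*"
TOKEN_RE = re.compile(rf"(?P<word>{WORD})|(?P<punct>[^\sA-Za-z0-9])")


def candidate_strings(text, min_len=2, max_len=4):
    """Count word n-grams that contain no stop word and no punctuation."""
    # Split the text into runs of consecutive content words; a stop word
    # or punctuation mark ends the current run.
    runs, current = [], []
    for match in TOKEN_RE.finditer(text):
        word = match.group("word")
        if word is None or word.lower() in STOP_WORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(word.lower())
    if current:
        runs.append(current)

    # Every n-gram inside a run is a candidate multiword string.
    counts = Counter()
    for run in runs:
        for n in range(min_len, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[" ".join(run[i:i + n])] += 1
    return counts


def split_positive_unlabeled(candidates, known_phrases):
    """Label candidates found in a known-phrase set as positive."""
    positive = {c for c in candidates if c in known_phrases}
    unlabeled = set(candidates) - positive
    return positive, unlabeled


if __name__ == "__main__":
    sentence = ("Mutations in the tumor suppressor gene are associated "
                "with breast cancer.")
    counts = candidate_strings(sentence)
    pos, unl = split_positive_unlabeled(counts, {"breast cancer"})
    print(sorted(pos), sorted(unl))
```

In this sketch a classifier would then be trained to separate the positive set from the unlabeled set, which is where the imbalanced-data issue named in the keywords arises, since known phrases are far fewer than extracted strings.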
Pages: 1035-1041
Page count: 7