Identifying well-formed biomedical phrases in MEDLINE® text

Cited by: 3
Authors
Kim, Won [1]
Yeganova, Lana [1]
Comeau, Donald C. [1]
Wilbur, W. John [1]
Affiliations
[1] Natl Lib Med, CBB, NCBI, NIH, Bethesda, MD 20894 USA
Keywords
Machine learning; Imbalanced data; Biomedical phrases; Statistical phrase identification; Unified Medical Language System; Abbreviation full forms
DOI
10.1016/j.jbi.2012.05.005
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In the modern world, people frequently interact with retrieval systems to satisfy their information needs. Humanly understandable, well-formed phrases represent a crucial interface between humans and the web, and the ability to index and search with such phrases is beneficial for human-web interactions. In this paper we consider the problem of identifying humanly understandable, well-formed, high-quality biomedical phrases in MEDLINE documents. The main approaches used previously for detecting such phrases are syntactic, statistical, and hybrid, the last combining the first two. We propose a supervised learning approach for identifying high-quality phrases. First, we obtain a set of known well-formed, useful phrases from an existing source and label these phrases as positive. We then extract from MEDLINE a large set of multiword strings that do not contain stop words or punctuation. We believe this unlabeled set contains many well-formed phrases, and our goal is to identify these additional high-quality phrases. We examine various feature combinations and several machine learning strategies designed to solve this problem. With a proper choice of machine learning methods and features, we identify strings in the large collection that are likely to be high-quality phrases. We evaluate our approach by making human judgments on multiword strings extracted from MEDLINE using our methods. We find that over 85% of the extracted phrase candidates are judged by humans to be of high quality. Published by Elsevier Inc.
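The candidate-extraction step mentioned in the abstract (multiword strings that contain no stop words or punctuation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop word list, the tokenization rule, the `max_len` cutoff, and the function name `candidate_strings` are all assumptions made for illustration only.

```python
import re

# Tiny illustrative stop word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is",
              "of", "on", "that", "the", "to", "we", "with"}

def candidate_strings(text, max_len=3):
    """Return multiword strings (2..max_len tokens) free of stop words and punctuation."""
    # Tokens are either words (internal hyphens allowed) or single punctuation marks.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*|[^\sa-z0-9]", text.lower())
    candidates = []
    run = []  # current run of consecutive non-stop-word tokens
    for tok in tokens + ["."]:  # the trailing sentinel flushes the final run
        is_word = re.match(r"[a-z0-9]", tok) is not None
        if is_word and tok not in STOP_WORDS:
            run.append(tok)
            continue
        # A stop word or punctuation mark ends the run; emit every n-gram inside it.
        for n in range(2, max_len + 1):
            for i in range(len(run) - n + 1):
                candidates.append(" ".join(run[i:i + n]))
        run = []
    return candidates

if __name__ == "__main__":
    sample = "We identify high quality biomedical phrases in MEDLINE documents."
    for phrase in candidate_strings(sample):
        print(phrase)
```

Because every stop word or punctuation mark closes the current run, no emitted string can span one. In the approach the abstract describes, strings like these would form the unlabeled pool that a classifier trained on the known positive phrases then ranks.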
Pages: 1035-1041
Page count: 7