Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Cited by: 34
Authors
Luo, Zhihui [1]
Yetisgen-Yildiz, Meliha [2]
Weng, Chunhua [1]
Affiliations
[1] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
[2] Univ Washington, Seattle, WA 98195 USA
Keywords
Clinical research eligibility criteria; Classification; Hierarchical clustering; Knowledge representation; Unified Medical Language System (UMLS); Machine learning; Feature representation
DOI
10.1016/j.jbi.2011.06.001
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objective: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.
Design: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
Measurements: We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1,578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naive Bayesian, Nearest Neighbor, and instance-based learning classifier.
Results: The UMLS semantic feature representation outperforms the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency.
Conclusion: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text. © 2011 Elsevier Inc. All rights reserved.
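As an illustration of the category-induction idea in the abstract, the following is a minimal sketch of agglomerative hierarchical clustering over semantic-type features. It is not the authors' implementation: the toy criteria, their mapping to UMLS semantic types, the Jaccard distance, the average-linkage rule, and the 0.6 stopping threshold are all assumptions chosen for illustration, and the abstract does not state which distance or linkage the paper used.

```python
from itertools import combinations

# Hypothetical toy data: each criterion is represented by a set of
# UMLS semantic types. The mapping is illustrative only.
criteria = {
    "Age 18 years or older": {"Age Group"},
    "Age under 65": {"Age Group"},
    "History of diabetes mellitus": {"Disease or Syndrome"},
    "Diagnosis of hypertension": {"Disease or Syndrome", "Finding"},
    "Currently taking insulin": {"Pharmacologic Substance"},
    "Prior treatment with metformin": {
        "Pharmacologic Substance",
        "Therapeutic or Preventive Procedure",
    },
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of semantic types."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(cluster_a, cluster_b, features):
    """Mean pairwise distance between the members of two clusters."""
    dists = [
        jaccard_distance(features[x], features[y])
        for x in cluster_a
        for y in cluster_b
    ]
    return sum(dists) / len(dists)


def agglomerative_clusters(features, threshold):
    """Merge the closest pair of clusters until the closest pair
    is farther apart than `threshold` (an assumed stopping rule)."""
    clusters = [[name] for name in features]
    while len(clusters) > 1:
        (i, j), best = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j], features))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


clusters = agglomerative_clusters(criteria, threshold=0.6)
for members in clusters:
    print(sorted(members))
```

With this toy input the criteria group by shared semantic types (age, disease, medication). In a setting like the one described above, each induced cluster would then be reviewed and labeled by a person, which is what makes the induction semi-automatic.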
Pages: 927-935
Page count: 9
Related papers
50 in total
  • [31] Research on Text Hierarchical Topic Identification Algorithm Based on the Dynamic Diverse Thresholds Clustering
    Xu Yong-Dong
    Quan Guang-Ri
    Xu Zhi-Ming
    Wang Ya-Dong
    2009 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2009, : 206 - 210
  • [32] Eligibility criteria in clinical trials in breast cancer: a cohort study
    Szlezinger, Katarzyna
    Pogoda, Katarzyna
    Jagiello-Gruszfeld, Agnieszka
    Klosowska, Danuta
    Gorski, Andrzej
    Borysowski, Jan
    BMC MEDICINE, 2023, 21 (01)
  • [33] A Semantic Framework for Intelligent Matchmaking for Clinical Trial Eligibility Criteria
    Lee, Yugyung
    Krishnamoorthy, Saranya
    Dinakarpandian, Deendayal
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2013, 4 (04)
  • [34] Integrated Framework for Speech Categorization based on Clustering in Dynamic Environment
    Jayabharathy, J.
    KavithaKumar, R.
    Kanimozhi, J.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATICS AND ANALYTICS (ICIA' 16), 2016,
  • [36] Eligibility criteria in clinical trials in breast cancer: a cohort study
    Katarzyna Szlezinger
    Katarzyna Pogoda
    Agnieszka Jagiełło-Gruszfeld
    Danuta Kłosowska
    Andrzej Górski
    Jan Borysowski
    BMC Medicine, 21
  • [37] Potential Role of Clinical Trial Eligibility Criteria in Electronic Phenotyping
    Chen, Zhehuan
    Liu, Hao
    Butler, Alex
    Ostropolets, Anna
    Weng, Chunhua
    PUBLIC HEALTH AND INFORMATICS, PROCEEDINGS OF MIE 2021, 2021, 281 : 148 - 152
  • [38] The impact of eligibility criteria on enrollment in ICU sepsis clinical trials
    D Foster
    M Steinberg
    D Cook
    J Granton
    J Marshall
    Critical Care, 3 (Suppl 1):
  • [39] Chia, a large annotated corpus of clinical trial eligibility criteria
    Fabrício Kury
    Alex Butler
    Chi Yuan
    Li-heng Fu
    Yingcheng Sun
    Hao Liu
    Ida Sim
    Simona Carini
    Chunhua Weng
    Scientific Data, 7
  • [40] Data mining for text categorization with semi-supervised agglomerative hierarchical clustering
    Skarmeta, AG
    Bensaid, A
    Tazi, N
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2000, 15 (07) : 633 - 646