Metadata Discovery of Heterogeneous Biomedical Datasets Using Token-Based Features

Times cited: 0
Authors
Wen, Jingran [1]
Gouripeddi, Ramkiran [1, 2]
Facelli, Julio C. [1, 2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84108 USA
[2] Univ Utah, Ctr Clin & Translat Sci, Salt Lake City, UT 84108 USA
Funding
National Institutes of Health (NIH)
Keywords
Metadata discovery; Text characterization; Data harmonization
DOI
10.1007/978-981-10-6451-7_8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Metadata discovery is the process of recognizing the semantics and descriptors of data elements and datasets. This study uses a machine-learning approach to classify biomedical dataset characteristics for metadata discovery. Four common types of data sources were included in this study: genetic variants, protein structures, scientific publications, and a general English corpus. Decision tree classification models were built using token-based features derived from these data files. These models identify the four data sources with average F1 scores ranging from 0.935 to 1.000. This study demonstrates that different types of biomedical data have different distributions of token-based document structural features and that these structural features can be leveraged for metadata discovery.
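The abstract does not list the specific token-based features or the decision tree settings used in the study. The following is only a minimal sketch, under stated assumptions, of the general idea: turn each data file into a small vector of token-based structural features, then fit a decision tree that maps those vectors to a source type. The five features in the hypothetical `token_features` helper, the toy snippets in `SAMPLES`, and the from-scratch Gini-based CART tree (`build_tree`, `predict`) are illustrative choices written with the Python standard library, not the authors' pipeline.

```python
import re
from collections import Counter


def token_features(text):
    """Hypothetical token-based structural features (not the study's exact set)."""
    tokens = text.split()
    n = len(tokens) or 1  # avoid division by zero on empty input
    lines = [ln for ln in text.splitlines() if ln.strip()] or [""]
    return [
        sum(t.isdigit() for t in tokens) / n,  # share of purely numeric tokens
        sum(t.isalpha() for t in tokens) / n,  # share of purely alphabetic tokens
        sum(bool(re.fullmatch(r"\W+", t)) for t in tokens) / n,  # punctuation-only tokens
        sum(len(t) for t in tokens) / n,  # mean token length
        len(tokens) / len(lines),  # mean tokens per non-empty line
    ]


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) with the lowest impurity, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [lab for row, lab in zip(X, y) if row[j] <= thr]
            right = [lab for row, lab in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


def build_tree(X, y, depth=0, max_depth=4):
    """Grow a CART-style tree; leaves are labels, inner nodes are (j, thr, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth == max_depth:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature vectors identical, so no split is possible
        return majority
    _, j, thr = split
    go_left = [row[j] <= thr for row in X]
    left_X = [row for row, g in zip(X, go_left) if g]
    left_y = [lab for lab, g in zip(y, go_left) if g]
    right_X = [row for row, g in zip(X, go_left) if not g]
    right_y = [lab for lab, g in zip(y, go_left) if not g]
    return (j, thr,
            build_tree(left_X, left_y, depth + 1, max_depth),
            build_tree(right_X, right_y, depth + 1, max_depth))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree


# Hand-written toy snippets (hypothetical, not the study's data), one per source type.
SAMPLES = {
    "genetic_variant": "chr1\t12345\trs123\tA\tG\t50\tPASS\nchr2\t67890\trs456\tC\tT\t99\tPASS",
    "protein_structure": "ATOM 1 N MET A 1 11.104 6.134 -6.504 1.00 0.00 N",
    "scientific_publication": "BRCA1 mutations (p < 0.05) were observed in 12 of 40 patients [3].",
    "general_english": "The quick brown fox jumps over the lazy dog.",
}

if __name__ == "__main__":
    labels = list(SAMPLES)
    X = [token_features(SAMPLES[k]) for k in labels]
    tree = build_tree(X, labels)
    for k in labels:
        print(k, "->", predict(tree, token_features(SAMPLES[k])))
```

Gini impurity with midpoint thresholds is the usual CART splitting rule. With one toy snippet per class the tree simply memorizes its training data, so this illustrates the mechanics only; it says nothing about the F1 scores reported in the study, which would require real files and held-out evaluation.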
Pages: 60-67
Number of pages: 8
Related papers
50 in total
  • [1] Multiple token-based neighbor discovery for directional sensor networks
    Nagaraju, Shamanth
    Gudino, Lucy J.
    Sood, Nipun
    Chandran, Jasmine G.
    Sreejith, V
    ETRI JOURNAL, 2020, 42 (03) : 351 - 365
  • [2] Efficient GPU Utilization in Heterogeneous Big Data Cluster Using Token-Based Scheduler
    Abdelhafez, Hazem A.
    Rehan, Mohamed M.
    Fahmy, Hossam A. H.
    2017 IEEE 30TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2017,
  • [3] Improving Quality of Code Review Datasets - Token-Based Feature Extraction Method
    Staron, Miroslaw
    Meding, Wilhelm
    Soder, Ola
    Ochodek, Miroslaw
    SOFTWARE QUALITY: FUTURE PERSPECTIVES ON SOFTWARE ENGINEERING QUALITY, SWQD 2021, 2021, 404 : 81 - 93
  • [4] TILAK: A token-based prevention approach for topology discovery threats in SDN
    Nehra, Ajay
    Tripathi, Meenakshi
    Gaur, Manoj Singh
    Battula, Ramesh Babu
    Lal, Chhagan
    INTERNATIONAL JOURNAL OF COMMUNICATION SYSTEMS, 2019, 32 (17)
  • [5] Token-based deep reinforcement learning for Heterogeneous VRP with Service Time Constraints
    Wang, Yujun
    Hong, Xiaopeng
    Wang, Yabin
    Zhao, Junzhou
    Sun, Guanghui
    Qin, Baoxing
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [6] Token-based atomic broadcast using unreliable failure detectors
    Ekwall, R
    Schiper, A
    Urbán, P
    23RD IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS, 2004, : 52 - 65
  • [7] A token-based atomic broadcast protocol using deterministic merge
    Li, L.
    Wang, H. M.
    Liu, H.
    Shi, D. X.
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 1138 - 1143
  • [8] An Empirical Study on Fault Prediction using Token-Based Approach
    Kaur, Ishleen
    Bajpai, Neha
    INTERNATIONAL CONFERENCE ON ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY & COMPUTING, 2016, 2016,
  • [9] Token-Based Authentication Using JSON Web Token on SIKASIR RESTful Web Service
    Haekal, Muhamad
    Eliyani
    2016 INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTING (ICIC), 2016, : 175 - 179
  • [10] A Study on Home Network User Authentication Using Token-Based OTPn
    Park, Jung-Oh
    Jun, Moon-Seog
    Kim, Sang-Geun
    SECURITY-ENRICHED URBAN COMPUTING AND SMART GRID, 2010, 78 : 59 - +