Metadata Discovery of Heterogeneous Biomedical Datasets Using Token-Based Features

Authors
Wen, Jingran [1]
Gouripeddi, Ramkiran [1, 2]
Facelli, Julio C. [1, 2]
Affiliations
[1] Univ Utah, Dept Biomed Informat, Salt Lake City, UT 84108 USA
[2] Univ Utah, Ctr Clin & Translat Sci, Salt Lake City, UT 84108 USA
Funding
National Institutes of Health (USA)
Keywords
Metadata discovery; Text characterization; Data harmonization
DOI
10.1007/978-981-10-6451-7_8
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Metadata discovery is the process of recognizing the semantics and descriptors of data elements and datasets. This study uses a machine-learning approach to classify the characteristics of biomedical datasets for metadata discovery. Four common types of data sources were included: genetic variants, protein structures, scientific publications, and a general English corpus. Decision tree classification models were built using token-based features derived from these data files. The models identified the four data sources with average F1 scores ranging from 0.935 to 1.000. The study demonstrates that biomedical data of different types have different distributions of token-based document structural features, and that these structural features can be leveraged for metadata discovery.
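To make "token-based features" concrete, the following is a minimal sketch of how such features might be computed from a data file. The abstract does not list the features actually used, so the token classes below (`numeric`, `alpha`, `upper`) and the `token_features` helper are hypothetical illustrations, not the authors' feature set.

```python
import re
from collections import Counter

# Hypothetical token classes (an assumption for illustration only);
# the abstract does not specify the study's actual features.
TOKEN_CLASSES = {
    "numeric": re.compile(r"^[+-]?\d+(\.\d+)?$"),
    "alpha": re.compile(r"^[A-Za-z]+$"),
    "upper": re.compile(r"^[A-Z]+$"),
}


def token_features(text):
    """Return the fraction of whitespace-delimited tokens in each class,
    plus the mean token length."""
    tokens = text.split()
    total = len(tokens)
    counts = Counter()
    for tok in tokens:
        for name, pattern in TOKEN_CLASSES.items():
            if pattern.match(tok):
                counts[name] += 1
    features = {
        name: (counts[name] / total if total else 0.0)
        for name in TOKEN_CLASSES
    }
    features["mean_token_length"] = (
        sum(len(tok) for tok in tokens) / total if total else 0.0
    )
    return features


# Illustrative snippets, not drawn from the study's data.
variant_like = "chr1 12345 . A G 50 PASS"
prose_like = "Metadata discovery recognizes the semantics of data elements"

print(token_features(variant_like))
print(token_features(prose_like))
```

A variant-style record yields a noticeably higher share of numeric and upper-case tokens than running prose. Feature vectors of this kind could then be passed to a decision tree learner (for example, scikit-learn's `DecisionTreeClassifier`) to separate the data sources.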
Pages: 60-67
Page count: 8