Predicting structured metadata from unstructured metadata

Cited by: 5
Authors
Posch, Lisa [1, 2]
Panahiazar, Maryam [3]
Dumontier, Michel [3]
Gevaert, Olivier [3]
Affiliations
[1] GESIS Leibniz Institute for the Social Sciences, Cologne, Germany
[2] University of Koblenz-Landau, Institute for Web Science and Technologies, Koblenz, Germany
[3] Stanford University, Department of Medicine, Stanford Center for Biomedical Informatics Research, Stanford, CA 94305, USA
DOI
10.1093/database/baw080
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Enormous amounts of biomedical data have been, and are still being, produced by investigators all over the world. However, one crucial limiting factor in data reuse is the accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework, applied to the Gene Expression Omnibus (GEO) microarray database, that predicts structured metadata terms from unstructured metadata in order to improve the quality and quantity of metadata. Our framework consists of classifiers trained on term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled with a Latent Dirichlet Allocation (LDA) model, which reduces the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms can be predicted most accurately with the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall, this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction.
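The abstract compares three predictors of a structured metadata field: classifiers on TF-IDF features, classifiers on LDA topic proportions, and a majority-vote baseline. The Python sketch below illustrates that comparison on a handful of records. It rests on assumptions that do not come from the paper: the records and their species labels are invented stand-ins for GEO sample descriptions and one structured field, a simple nearest-centroid classifier is swapped in for whatever classifiers the authors trained, and LDA is fitted with a small collapsed Gibbs sampler over the training and test texts together (it never sees the labels), which is a simplification.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy records: (free-text sample description, value of one structured field).
TRAIN = [
    ("human liver biopsy total rna", "Homo sapiens"),
    ("human breast tumor cell line", "Homo sapiens"),
    ("human peripheral blood monocytes", "Homo sapiens"),
    ("mouse liver after high fat diet", "Mus musculus"),
    ("mouse embryonic stem cell knockout", "Mus musculus"),
    ("yeast culture under heat shock", "Saccharomyces cerevisiae"),
]
TEST = [
    ("liver biopsy from human donor", "Homo sapiens"),
    ("knockout mouse embryonic fibroblasts", "Mus musculus"),
    ("yeast heat shock time course", "Saccharomyces cerevisiae"),
]

def tokenize(text):
    # Lowercase whitespace split; a real pipeline might also drop stop words.
    return text.lower().split()

def fit_idf(docs):
    # idf(t) = log(N / df(t)); a term present in every document gets weight 0.
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(len(docs) / c) for t, c in df.items()}

def tfidf(doc, idf):
    # Relative term frequency times idf; terms not seen in training are ignored.
    counts = Counter(t for t in doc if t in idf)
    total = sum(counts.values()) or 1
    return {t: c / total * idf[t] for t, c in counts.items()}

def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {key: v / norm for key, v in vec.items()} if norm else dict(vec)

def fit_centroids(vectors, labels):
    # Nearest-centroid classifier: per label, sum the unit vectors and re-normalise.
    sums = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for key, v in normalize(vec).items():
            sums[label][key] += v
    return {label: normalize(s) for label, s in sums.items()}

def predict(vec, centroids):
    # The label whose centroid has the highest cosine similarity to vec wins.
    q = normalize(vec)
    return max(centroids,
               key=lambda lab: sum(v * centroids[lab].get(key, 0.0) for key, v in q.items()))

def lda_theta(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    # Collapsed Gibbs sampling for LDA; returns per-document topic proportions (theta).
    rng = random.Random(seed)
    n_vocab = len({w for doc in docs for w in doc})
    ndk = [[0] * k for _ in docs]               # document-topic counts
    nkw = [defaultdict(int) for _ in range(k)]  # topic-word counts
    nk = [0] * k                                # tokens assigned to each topic
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    def bump(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            bump(d, w, t, 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                bump(d, w, z[d][i], -1)  # take the token out of the counts
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + n_vocab * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                bump(d, w, z[d][i], 1)
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]

train_docs = [tokenize(text) for text, _ in TRAIN]
test_docs = [tokenize(text) for text, _ in TEST]
y_train = [label for _, label in TRAIN]
y_test = [label for _, label in TEST]
# Majority-vote baseline: always predict the most frequent training value.
majority_pred = [Counter(y_train).most_common(1)[0][0]] * len(test_docs)
# TF-IDF features: idf is fitted on the training descriptions only.
idf = fit_idf(train_docs)
tfidf_model = fit_centroids([tfidf(d, idf) for d in train_docs], y_train)
tfidf_pred = [predict(tfidf(d, idf), tfidf_model) for d in test_docs]
# LDA features: k topic proportions per record; the unsupervised LDA step sees all texts.
theta = [dict(enumerate(row)) for row in lda_theta(train_docs + test_docs, k=3)]
lda_model = fit_centroids(theta[:len(train_docs)], y_train)
lda_pred = [predict(vec, lda_model) for vec in theta[len(train_docs):]]

for name, pred in [("majority", majority_pred), ("tf-idf", tfidf_pred), ("lda", lda_pred)]:
    acc = sum(p == g for p, g in zip(pred, y_test)) / len(y_test)
    print(f"{name:8s} accuracy = {acc:.2f}")
```

On these toy records the TF-IDF model labels all three test descriptions correctly, while the majority baseline gets one of three right; the LDA figure depends on the sampler's random state and says little at this scale. Because feature extraction and classification are separate, trying a different classifier would only mean replacing `fit_centroids` and `predict`.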
Pages: 9