Predicting structured metadata from unstructured metadata

Cited by: 5
Authors
Posch, Lisa [1, 2]
Panahiazar, Maryam [3]
Dumontier, Michel [3]
Gevaert, Olivier [3]
Affiliations
[1] GESIS Leibniz Inst Social Sci, Cologne, Germany
[2] Univ Koblenz Landau, Inst Web Sci & Technol, Koblenz, Germany
[3] Stanford Univ, Dept Med, Stanford Ctr Biomed Informat Res, Stanford, CA 94305 USA
DOI
10.1093/database/baw080
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is an accurate, structured and complete description of the data, that is, data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata in order to improve the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of two approaches: classifiers trained on term frequency-inverse document frequency (TF-IDF) features, and classifiers trained on topics modeled with Latent Dirichlet Allocation (LDA), which reduces the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are predicted most accurately by the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall, this is a promising approach for metadata prediction that is likely to be applicable to other datasets, and it has implications for researchers interested in biomedical metadata curation and metadata prediction.
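The TF-IDF side of the comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the classifier, so a nearest-centroid rule is assumed here, the LDA variant is not shown, and the sample descriptions, the "organism" labels and all function names are invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each structured label (hypothetical classifier)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        for t, v in tfidf(doc, idf).items():
            sums[label][t] += v
    return {lab: {t: v / counts[lab] for t, v in vec.items()} for lab, vec in sums.items()}


def predict(doc, idf, centroids):
    """Return the label whose centroid has the largest dot product with the document."""
    vec = tfidf(doc, idf)
    return max(
        sorted(centroids),
        key=lambda lab: sum(v * centroids[lab].get(t, 0.0) for t, v in vec.items()),
    )


def majority_label(labels):
    """Majority vote baseline: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


# Toy unstructured descriptions and a structured "organism" field (invented data).
train_docs = [
    "total rna extracted from human breast tumor tissue",
    "human peripheral blood samples from patients",
    "human lung biopsy from adult patients",
    "mouse liver tissue from wild type strain",
    "mouse embryonic fibroblasts knockout strain",
]
train_labels = ["Homo sapiens", "Homo sapiens", "Homo sapiens", "Mus musculus", "Mus musculus"]
test_docs = ["rna from human tumor biopsy", "liver of knockout mouse strain"]
test_labels = ["Homo sapiens", "Mus musculus"]

idf = fit_idf(train_docs)
centroids = train_centroids(train_docs, train_labels, idf)
baseline = majority_label(train_labels)

predictions = [predict(doc, idf, centroids) for doc in test_docs]
model_acc = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
baseline_acc = sum(baseline == y for y in test_labels) / len(test_labels)
print("TF-IDF nearest centroid accuracy:", model_acc)
print("Majority vote baseline accuracy:", baseline_acc)
```

Comparing the two printed accuracies mirrors the kind of evaluation the abstract reports, where each approach is measured against the majority vote baseline. An LDA variant would replace the TF-IDF vectors with per-document topic proportions before classification.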
Pages: 9