Predicting structured metadata from unstructured metadata

Cited by: 5
Authors
Posch, Lisa [1, 2]
Panahiazar, Maryam [3]
Dumontier, Michel [3]
Gevaert, Olivier [3]
Affiliations
[1] GESIS Leibniz Inst Social Sci, Cologne, Germany
[2] Univ Koblenz Landau, Inst Web Sci & Technol, Koblenz, Germany
[3] Stanford Univ, Dept Med, Stanford Ctr Biomed Informat Res, Stanford, CA 94305 USA
DOI
10.1093/database/baw080
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is an accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata in order to improve the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of two approaches: classifiers trained using term frequency-inverse document frequency (TF-IDF) features, and classifiers based on topics modeled using Latent Dirichlet Allocation (LDA) to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are predicted most accurately by the TF-IDF approach, followed by LDA, with both outperforming the majority vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall, this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction.
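To make the comparison in the abstract concrete, the following is a minimal sketch of predicting one structured field from free-text descriptions with TF-IDF features and scoring it against a majority vote baseline. It is not the authors' implementation: the sample texts, the "organism" labels, and the nearest-centroid classifier are all assumptions chosen for illustration, since the abstract does not name a specific classifier. The LDA variant is not shown here.

```python
import math
from collections import Counter, defaultdict

# Made-up sample descriptions paired with a hypothetical structured field
# ("organism"). These are illustrative only and are not taken from GEO.
TRAIN = [
    ("total rna from mouse liver tissue", "Mus musculus"),
    ("mouse embryonic fibroblasts treated with drug", "Mus musculus"),
    ("mouse brain cortex sample", "Mus musculus"),
    ("human breast tumor biopsy", "Homo sapiens"),
    ("human blood sample from patient", "Homo sapiens"),
]
TEST = [
    ("liver tissue from mouse", "Mus musculus"),
    ("tumor biopsy from human patient", "Homo sapiens"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real text needs more care)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token in the training docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; tokens unseen in training are dropped."""
    tf = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def fit_centroids(samples, idf):
    """Mean TF-IDF vector per label (nearest-centroid classifier, an assumption)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for text, label in samples:
        counts[label] += 1
        for tok, v in tfidf(text, idf).items():
            sums[label][tok] += v
    return {
        label: {tok: v / counts[label] for tok, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, idf, centroids):
    """Return the label whose centroid has the highest cosine similarity."""
    vec = tfidf(text, idf)

    def cosine(label):
        centroid = centroids[label]
        dot = sum(v * centroid.get(tok, 0.0) for tok, v in vec.items())
        norm = math.sqrt(sum(c * c for c in centroid.values())) or 1.0
        return dot / norm  # vec is already unit length

    return max(sorted(centroids), key=cosine)


def accuracy(predictions, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(pred == label for pred, (_, label) in zip(predictions, samples))
    return hits / len(samples)


idf = fit_idf([text for text, _ in TRAIN])
centroids = fit_centroids(TRAIN, idf)

# TF-IDF classifier predictions on the held-out descriptions.
tfidf_predictions = [predict(text, idf, centroids) for text, _ in TEST]
tfidf_accuracy = accuracy(tfidf_predictions, TEST)

# Majority vote baseline: always predict the most frequent training label.
majority_label = Counter(label for _, label in TRAIN).most_common(1)[0][0]
baseline_accuracy = accuracy([majority_label] * len(TEST), TEST)

print("TF-IDF accuracy:  ", tfidf_accuracy)
print("Baseline accuracy:", baseline_accuracy)
```

On real metadata, the same evaluation would be repeated for each structured element, which is where the abstract's observation about elements with few possible values becomes relevant.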
Pages: 9
Related papers (first 10 of 50 shown)
  • [1] Predicting structured metadata from unstructured metadata (May, baw080, 2016)
    Posch, Lisa
    Panahiazar, Maryam
    Dumontier, Michel
    Gevaert, Olivier
    DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2016,
  • [2] MultiBeeBrowse - Accessible browsing on unstructured metadata
    Kruk, Sebastian Ryszard
    Gzella, Adam
    Czaja, Filip
    Bultrowicz, Wladyslaw
    Kruk, Ewelina
    ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2007: COOPIS, DOA, ODBASE, GADA, AND IS, PT 1, PROCEEDINGS, 2007, 4803 : 1063 - 1080
  • [3] To metadata not to metadata
    Reamy, T
    ECONTENT, 2004, 27 (10) : 34 - 38
  • [4] Predicting the Popularity of Online News from Content Metadata
    Uddin, Md. Taufeeq
    Patwary, Muhammed Jamshed Alam
    Ahsan, Tanveer
    Alam, Mohammed Shamsul
    2016 INTERNATIONAL CONFERENCE ON INNOVATIONS IN SCIENCE, ENGINEERING AND TECHNOLOGY (ICISET 2016), 2016,
  • [5] Predicting poverty and wealth from mobile phone metadata
    Blumenstock, Joshua
    Cadamuro, Gabriel
    On, Robert
    SCIENCE, 2015, 350 (6264) : 1073 - 1076
  • [6] DBpedia Commons: Structured Multimedia Metadata from the Wikimedia Commons
    Vaidya, Gaurav
    Kontokostas, Dimitris
    Knuth, Magnus
    Lehmann, Jens
    Hellmann, Sebastian
    SEMANTIC WEB - ISWC 2015, PT II, 2015, 9367 : 281 - 289
  • [7] CERMINE: automatic extraction of structured metadata from scientific literature
    Tkaczyk, Dominika
    Szostek, Pawel
    Fedoryszak, Mateusz
    Dendek, Piotr Jan
    Bolikowski, Lukasz
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2015, 18 (04) : 317 - 335
  • [8] CERMINE: automatic extraction of structured metadata from scientific literature
    Dominika Tkaczyk
    Paweł Szostek
    Mateusz Fedoryszak
    Piotr Jan Dendek
    Łukasz Bolikowski
    International Journal on Document Analysis and Recognition (IJDAR), 2015, 18 : 317 - 335
  • [9] Designing a metadata model for unstructured document management in organizations
    Paganelli, F
    Khaled, OA
    Pettenati, MC
    Pirri, F
    Giuli, D
    Innovations Through Information Technology, Vols 1 and 2, 2004, : 575 - 579
  • [10] Scrabble: Converting Unstructured Metadata into Brick for Many Buildings
    Koh, Jason
    Sengupta, Dhiman
    McAuley, Julian
    Gupta, Rajesh
    Balaji, Bharathan
    Agarwal, Yuvraj
    BUILDSYS'17: PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON SYSTEMS FOR ENERGY-EFFICIENT BUILT ENVIRONMENTS, 2017,