Computational metadata generation methods for biological specimen image collections

Cited by: 2
Authors
Karnani, Kevin [1]
Pepper, Joel [1]
Bakis, Yasin [2]
Wang, Xiaojun [2]
Bart, Henry, Jr. [2]
Breen, David E. [1]
Greenberg, Jane [3]
Affiliations
[1] Drexel Univ, Comp Sci Dept, Philadelphia, PA 19104 USA
[2] Tulane Univ, Biodivers Res Inst, New Orleans, LA 70118 USA
[3] Drexel Univ, Informat Sci Dept, Philadelphia, PA 19104 USA
Funding
U.S. National Science Foundation
Keywords
Bioinformatics; Metadata; Image analysis; Applied machine learning; Contrast enhancement; CLASSIFICATION; COLOR;
DOI
10.1007/s00799-022-00342-1
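Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification codes
1205; 120501
Abstract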
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) collection (7244 specimen images) that uses computational approaches to analyze image quality, and then automatically generates 22 metadata properties representing the image quality and morphological features of the specimens. In the research reported here, we demonstrate the extension of our initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). Further, we enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these new methods reduced our overall error rate from 4.6% to 1.1%. These enhancements also allowed us to compute an additional set of 17 image-based metadata properties. The new metadata properties provide supplemental features and information that may also be used to analyze and classify the fish specimens. Examples of these new features include convex area, eccentricity, perimeter, and skew. The newly refined process further outperforms humans in terms of time and labor cost, as well as accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
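To make the kind of image-based properties named in the abstract more concrete, here is a minimal sketch of how area, perimeter, and eccentricity might be derived from a binary specimen mask. It is not the authors' pipeline, which this record does not describe. The function name `shape_properties`, the nested-list mask format, the edge-counting perimeter, and the moment-based eccentricity are all assumptions chosen for illustration.

```python
import math


def shape_properties(mask):
    """Hypothetical helper: area, perimeter, and eccentricity of a binary mask.

    `mask` is a list of rows; truthy values mark foreground (specimen) pixels.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask has no foreground pixels")

    foreground = set(pixels)
    # Perimeter (assumed definition): number of pixel edges that border
    # background or fall on the image boundary.
    perimeter = sum(
        (r + dr, c + dc) not in foreground
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
    )

    # Second central moments of the pixel coordinates.
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    s_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    s_cc = sum((c - mean_c) ** 2 for _, c in pixels) / area
    s_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area

    # Eigenvalues of the 2x2 covariance matrix [[s_rr, s_rc], [s_rc, s_cc]].
    half_trace = (s_rr + s_cc) / 2
    root = math.sqrt(((s_rr - s_cc) / 2) ** 2 + s_rc ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root

    # Eccentricity of the ellipse with the same second moments:
    # 0 for a circle-like shape, approaching 1 for an elongated one.
    if lam_max > 0:
        ratio = min(1.0, max(0.0, lam_min / lam_max))
        eccentricity = math.sqrt(1 - ratio)
    else:
        eccentricity = 0.0

    return {"area": area, "perimeter": perimeter, "eccentricity": eccentricity}


if __name__ == "__main__":
    # A 2x6 rectangle of foreground pixels padded with background.
    demo = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    print(shape_properties(demo))
```

In practice, an image library's region-property routines would likely replace this hand-rolled version. The sketch only shows that each property is a deterministic function of the segmented mask, which is what makes automated metadata generation feasible.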
Pages: 157-174
Page count: 18