Computational metadata generation methods for biological specimen image collections

Cited by: 2
Authors
Karnani, Kevin [1]
Pepper, Joel [1]
Bakis, Yasin [2]
Wang, Xiaojun [2]
Bart, Henry, Jr. [2]
Breen, David E. [1]
Greenberg, Jane [3]
Affiliations
[1] Drexel University, Computer Science Department, Philadelphia, PA 19104, USA
[2] Tulane University, Biodiversity Research Institute, New Orleans, LA 70118, USA
[3] Drexel University, Information Science Department, Philadelphia, PA 19104, USA
Funding
National Science Foundation (USA)
Keywords
Bioinformatics; Metadata; Image analysis; Applied machine learning; Contrast enhancement; Classification; Color
DOI
10.1007/s00799-022-00342-1
Chinese Library Classification numbers
G25 [Library science, library services]; G35 [Information science, information work]
Discipline classification codes
1205; 120501
Abstract
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens available online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research on the Illinois Natural History Survey (INHS) collection (7244 specimen images), which used computational approaches to analyze image quality and then automatically generated 22 metadata properties representing the image quality and morphological features of the specimens. Here, we extend that initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4155 specimen images). We also enhance our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these methods reduced our overall error rate from 4.6% to 1.1%. The enhancements also allowed us to compute an additional 17 image-based metadata properties, such as convex area, eccentricity, perimeter, and skew, which provide supplemental features for analyzing and classifying the fish specimens. The refined process also outperforms humans in time, labor cost, and accuracy, providing a novel solution for leveraging digitized specimens with ML. This research demonstrates that computational methods can enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide by generating accurate and valuable metadata for those repositories.
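The abstract names several of the new shape properties (convex area, eccentricity, perimeter) without defining them. The sketch below computes common textbook versions of three of them from a binary specimen mask in pure Python. It is an illustration under stated assumptions, not the authors' pipeline: the helper name `shape_metrics`, the 4-connected edge-count perimeter, and measuring convex area as the polygon area of the hull around pixel corners are our own choices. Eccentricity uses the ellipse with the same second central moments, the definition documented for scikit-image's `regionprops`. Skew is omitted because the abstract does not say how it is measured.

```python
import math


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def _polygon_area(poly):
    """Shoelace formula for a simple polygon given as ordered vertices."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def shape_metrics(mask):
    """Hypothetical helper: shape properties of the truthy (foreground) pixels."""
    rows, cols = len(mask), len(mask[0])
    coords = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    n = len(coords)

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    # Perimeter (assumption): pixel edges shared with background or the image border.
    perimeter = sum(1 for r, c in coords
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if not is_fg(r + dr, c + dc))

    # Convex area (assumption): polygon area of the hull of every pixel's 4 corners.
    corners = [(r + dr, c + dc) for r, c in coords for dr in (0, 1) for dc in (0, 1)]
    convex_area = _polygon_area(_convex_hull(corners))

    # Eccentricity of the ellipse with the same second central moments.
    mr = sum(r for r, _ in coords) / n
    mc = sum(c for _, c in coords) / n
    mu_rr = sum((r - mr) ** 2 for r, _ in coords) / n
    mu_cc = sum((c - mc) ** 2 for _, c in coords) / n
    mu_rc = sum((r - mr) * (c - mc) for r, c in coords) / n
    half_trace = (mu_rr + mu_cc) / 2
    spread = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    l1, l2 = half_trace + spread, half_trace - spread  # covariance eigenvalues, l1 >= l2
    eccentricity = math.sqrt(max(0.0, 1 - l2 / l1)) if l1 > 0 else 0.0

    return {"area": n, "perimeter": perimeter,
            "convex_area": convex_area, "eccentricity": eccentricity}
```

For the L-shaped mask `[[1, 0], [1, 1]]`, this returns an area of 3, a perimeter of 8, and a convex area of 3.5, so convex area exceeds area for concave shapes. Values will generally differ from scikit-image's `regionprops`, which counts the pixels of the convex-hull image and estimates perimeter along border-pixel centers.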
Pages: 157-174
Page count: 18
Related papers (50 in total)
  • [31] Computational methods for nucleosynthesis and nuclear energy generation
    Hix, WR
    Thielemann, FK
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 1999, 109 (1-2) : 321 - 351
  • [32] Combining Patient Metadata Extraction and Automatic Image Parsing for the Generation of an Anatomic Atlas
    Moeller, Manuel
    Ernst, Patrick
    Sintek, Michael
    Seifert, Sascha
    Grimnes, Gunnar
    Cavallaro, Alexander
    Dengel, Andreas
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT I, 2010, 6276 : 290 - +
  • [33] Image Blending Methods for Defective PCB Image Generation
    Chiang, Ting-Hui
    Chang, Chun-Hao
    Chen, Li-Hsin
    Lin, Chun-Ju
    Luo, An-Chun
    Deng, Yu-Shan
    Chang, Po-Han
    Dai, Ming-Ji
    Tseng, Yu-Chee
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 261 - 262
  • [34] Are equally spaced specimen collections necessary to assess biological variation?: Evidence from renal transplant recipients
    Biosca, C
    Ricós, C
    Jiménez, CV
    Lauzurica, R
    Galimany, R
    CLINICA CHIMICA ACTA, 2000, 301 (1-2) : 79 - 85
  • [35] NEW METHODS FOR IMAGE GENERATION AND COMPRESSION
    CULIK, K
    DUBE, S
    LECTURE NOTES IN COMPUTER SCIENCE, 1991, 555 : 69 - 90
  • [36] An Overview of Image Caption Generation Methods
    Wang, Haoran
    Zhang, Yue
    Yu, Xiaosheng
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [37] The study of dimensionality reduction methods in the task of browsing of digital image collections
    S.P. Korolyov Samara State Aerospace University, Image Processing Systems Institute, RAS, Russia
    Comput. Opt., 2008, 3 (296-301):
  • [38] Computational methods for identifying the critical nodes in biological networks
    Liu, Xiangrong
    Hong, Zengyan
    Liu, Juan
    Lin, Yuan
    Rodriguez-Paton, Alfonso
    Zou, Quan
    Zeng, Xiangxiang
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (02) : 486 - 497
  • [39] Development and Application of Computational Methods in Biological and Medical Data
    Ding, Hui
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2020, 23 (06) : 525 - 526
  • [40] Computational Methods for Identification and Modelling of Complex Biological Systems
    Villaverde, Alejandro F.
    Cosentino, Carlo
    Gabor, Attila
    Szederkenyi, Gabor
    COMPLEXITY, 2019, 2019