Computational metadata generation methods for biological specimen image collections

Times cited: 2
Authors
Karnani, Kevin [1]
Pepper, Joel [1]
Bakis, Yasin [2]
Wang, Xiaojun [2]
Bart, Henry, Jr. [2]
Breen, David E. [1]
Greenberg, Jane [3]
Affiliations
[1] Drexel Univ, Comp Sci Dept, Philadelphia, PA 19104 USA
[2] Tulane Univ, Biodivers Res Inst, New Orleans, LA 70118 USA
[3] Drexel Univ, Informat Sci Dept, Philadelphia, PA 19104 USA
Funding
U.S. National Science Foundation
Keywords
Bioinformatics; Metadata; Image analysis; Applied machine learning; Contrast enhancement; Classification; Color
DOI
10.1007/s00799-022-00342-1
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens available online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research on the Illinois Natural History Survey (INHS) collection (7,244 specimen images), which used computational approaches to analyze image quality and then automatically generated 22 metadata properties representing the image quality and morphological features of the specimens. Here, we extend that initial work to the University of Wisconsin Zoological Museum (UWZM) collection (4,155 specimen images). We also improve our computational methods in four ways: (1) augmenting the training set, (2) applying contrast enhancement, (3) upscaling small objects, and (4) refining our processing logic. Together, these changes reduced our overall error rate from 4.6% to 1.1%. They also allowed us to compute an additional 17 image-based metadata properties, which provide supplemental features that may be used to analyze and classify the fish specimens; examples include convex area, eccentricity, perimeter, and skew. The refined process also outperforms humans in time, labor cost, and accuracy, offering a novel way to leverage digitized specimens with ML. This research demonstrates that computational methods can enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide, by generating accurate and valuable metadata for those repositories.
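The abstract lists convex area, eccentricity, and perimeter among the new image-based metadata properties without defining them. The following is a minimal pure-Python sketch of how such shape descriptors can be computed, assuming a binary segmentation mask of one specimen (nested lists of 0/1) is already available. The function names `shape_features` and `convex_hull` are hypothetical, and the definitions used here (edge-count perimeter, convex hull of pixel corners, eccentricity of the ellipse with the same second central moments) are common textbook choices, not necessarily the ones the authors implemented.

```python
import math


def cross(o, a, b):
    """2-D cross product of vectors OA and OB (positive for a CCW turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; expects sorted points, returns hull vertices."""
    if len(points) <= 2:
        return list(points)
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_features(mask):
    """Hypothetical helper: simple shape descriptors for a binary mask."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no foreground pixels")
    area = len(pixels)
    fg = set(pixels)

    # Perimeter: number of pixel edges shared with background (4-neighbourhood).
    perimeter = sum(
        1
        for r, c in pixels
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if (r + dr, c + dc) not in fg
    )

    # Convex area: shoelace area of the convex hull of all pixel corners.
    corners = {(r + dr, c + dc) for r, c in pixels for dr in (0, 1) for dc in (0, 1)}
    hull = convex_hull(sorted(corners))
    convex_area = 0.5 * abs(sum(
        x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(hull, hull[1:] + hull[:1])
    ))

    # Eccentricity from the eigenvalues of the second central moment matrix.
    mr = sum(r for r, _ in pixels) / area
    mc = sum(c for _, c in pixels) / area
    mu_rr = sum((r - mr) ** 2 for r, _ in pixels) / area
    mu_cc = sum((c - mc) ** 2 for _, c in pixels) / area
    mu_rc = sum((r - mr) * (c - mc) for r, c in pixels) / area
    half_trace = (mu_rr + mu_cc) / 2
    root = math.sqrt(((mu_rr - mu_cc) / 2) ** 2 + mu_rc ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    eccentricity = math.sqrt(1 - lam2 / lam1) if lam1 > 0 else 0.0

    return {"area": area, "perimeter": perimeter,
            "convex_area": convex_area, "eccentricity": eccentricity}
```

For example, a filled 3 × 5 rectangle gives area 15, perimeter 16, convex area 15.0, and eccentricity √(2/3) ≈ 0.816, while an L-shaped mask of three pixels has a convex area (3.5) larger than its pixel area (3). Libraries such as scikit-image's `regionprops` compute comparable properties, but their perimeter estimator differs from the simple edge count used here, so values will not match exactly.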
Pages: 157–174
Page count: 18
Related papers (50 in total)
  • [21] Computational Methods and Resources in Biological and Medical Data
    Lin, Hao
    CURRENT MEDICINAL CHEMISTRY, 2022, 29(05): 786–788
  • [22] Applying computational chemistry methods to biological systems
    Zhong, Shijun
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 248
  • [23] Computational linguistics for metadata building (CLiMB): using text mining for the automatic identification, categorization, and disambiguation of subject terms for image metadata
    Klavans, Judith L.
    Sheffield, Carolyn
    Abels, Eileen
    Lin, Jimmy
    Passonneau, Rebecca
    Sidhu, Tandeep
    Soergel, Dagobert
    MULTIMEDIA TOOLS AND APPLICATIONS, 2009, 42(01): 115–138
  • [25] Computational methods for image restoration, image segmentation, and texture modeling
    Chung, Ginmo
    Le, Triet M.
    Lieu, Linh H.
    Tanushev, Nicolay M.
    Vese, Luminita A.
    COMPUTATIONAL IMAGING IV, 2006, 6065
  • [26] ShapePheno: unsupervised extraction of shape phenotypes from biological image collections
    Karaletsos, Theofanis
    Stegle, Oliver
    Dreyer, Christine
    Winn, John
    Borgwardt, Karsten M.
    BIOINFORMATICS, 2012, 28(07): 1001–1008
  • [27] Advanced Computational Methods for Oncological Image Analysis
    Rundo, Leonardo
    Militello, Carmelo
    Conti, Vincenzo
    Zaccagna, Fulvio
    Han, Changhee
    JOURNAL OF IMAGING, 2021, 7 (11)
  • [28] Hybrid Computational Methods for Hyperspectral Image Analysis
    Veganzones, Miguel A.
    Grana, Manuel
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, PT II, 2012, 7209: 424–435
  • [29] A mechanism to automate the generation of digital library image metadata and to provide content search
    Langiano, BDC
    Sunye, MS
    PROCEEDINGS OF THE FIFTH IASTED INTERNATIONAL CONFERENCE ON VISUALIZATION, IMAGING, AND IMAGE PROCESSING, 2005: 67–70
  • [30] A survey of computational methods for iconic image analysis
    van Noord, Nanne
    DIGITAL SCHOLARSHIP IN THE HUMANITIES, 2022, 37(04): 1316–1338