Semi-supervised machine learning for automated species identification by collagen peptide mass fingerprinting

Cited by: 13
Authors
Gu, Muxin [1]
Buckley, Michael [2]
Affiliations
[1] Univ Manchester, Fac Biol Med & Hlth, Michael Smith Bldg, Manchester M13 9PT, Lancs, England
[2] Univ Manchester, Sch Earth & Environm Sci, Manchester Inst Biotechnol, 131 Princess St, Manchester M1 7DN, Lancs, England
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Collagen fingerprinting; Ancient bone identification; High-throughput species identification; Species biomarker identification; PCA; Hierarchical clustering; CLINICAL MICROBIOLOGY; BONE; PALAEOBIODIVERSITY; DIVERSITY; ENSEMBLES; BACTERIA;
DOI
10.1186/s12859-018-2221-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Biomolecular methods for species identification are increasingly used to study changing environments at both the microscopic and macroscopic levels. High-throughput peptide mass fingerprinting has largely been applied to bacterial identification, but it is increasingly used to identify archaeological and palaeontological skeletal material, yielding information on past environments and human-animal interaction. However, as applications move away from predominantly domesticates and the more abundant wild fauna towards a much wider range of less common taxa that do not yet have genetically derived sequence information, robust methods of species identification and biomarker selection need to be determined.
Results: Here we developed a semi-supervised machine learning algorithm for classifying the species of ancient remains based on collagen fingerprinting. The aim was to minimise the prior knowledge of known species required while yielding satisfactory sensitivity and specificity. The algorithm uses iterations of a modified random forest classifier with a similarity scoring system to expand its set of identified samples. We tested it on a set of 6805 spectra and found that a high level of accuracy can be achieved with a training set of five identified specimens per taxon.
Conclusions: This method consistently achieves higher accuracy than two-dimensional principal component analysis and accuracy similar to that of hierarchical clustering with optimised parameters, while greatly reducing the requirement for human input. Within the Vertebrata, we demonstrate that the method achieves taxonomic resolution at the family or sub-family level, whereas genus- or species-level identification may require manual interpretation or further experiments. It also identifies species biomarkers in addition to those previously published.
Pages: 9
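
Illustrative sketch (not the authors' implementation): the abstract describes a classifier that is applied iteratively, with a similarity score deciding which unidentified spectra join the identified set. The loop below shows that general self-training idea only. It swaps in a nearest-centroid rule with cosine similarity as a stand-in for the paper's modified random forest, and the function names, the 0.95 threshold, the taxon labels, and the four-bin toy "spectra" are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def self_train(labelled, unlabelled, threshold=0.95, max_rounds=10):
    """Grow the labelled set by repeatedly accepting confident assignments.

    labelled:   dict mapping taxon name -> list of spectra (vectors)
    unlabelled: list of spectra with unknown taxon
    Returns (expanded labelled dict, spectra still unassigned).
    """
    labelled = {taxon: list(specs) for taxon, specs in labelled.items()}
    remaining = list(unlabelled)
    for _ in range(max_rounds):
        # Stand-in classifier: one centroid per taxon (hypothetical choice).
        centroids = {t: centroid(s) for t, s in labelled.items()}
        accepted, still_unknown = [], []
        for spectrum in remaining:
            taxon, score = max(
                ((t, cosine(spectrum, c)) for t, c in centroids.items()),
                key=lambda pair: pair[1],
            )
            if score >= threshold:
                accepted.append((taxon, spectrum))
            else:
                still_unknown.append(spectrum)
        if not accepted:
            break  # nothing new was identified, so further rounds change nothing
        for taxon, spectrum in accepted:
            labelled[taxon].append(spectrum)
        remaining = still_unknown
    return labelled, remaining


# Toy data: two made-up taxa, each "spectrum" is four binned peak intensities.
seed = {
    "taxon_A": [[1.0, 0.0, 0.9, 0.0], [0.9, 0.1, 1.0, 0.0]],
    "taxon_B": [[0.0, 1.0, 0.0, 0.8], [0.1, 0.9, 0.0, 1.0]],
}
unknown = [
    [1.0, 0.05, 0.95, 0.0],  # resembles taxon_A
    [0.0, 0.95, 0.05, 0.9],  # resembles taxon_B
    [0.5, 0.5, 0.5, 0.5],    # ambiguous, should stay unassigned
]
expanded, leftover = self_train(seed, unknown)
print({taxon: len(specs) for taxon, specs in expanded.items()}, len(leftover))
```

Leaving low-scoring spectra unassigned rather than forcing a label mirrors the abstract's remark that some identifications may still need manual interpretation or further experiments.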