A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model

Cited by: 52
Authors
Judson, Richard [1]
Elloumi, Fathi [1]
Setzer, R. Woodrow [1]
Li, Zhen [2]
Shah, Imran [1]
Affiliations
[1] US EPA, Natl Ctr Computat Toxicol, Off Res & Dev, Res Triangle Pk, NC 27711 USA
[2] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
DOI
10.1186/1471-2105-9-241
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.
Results: The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naive Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.
Conclusion: We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data are used to predict in vivo chemical toxicology. From our analysis, we can recommend several ML methods, most notably SVM and ANN, as good candidates for use in real-world applications in this area.
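The evaluation protocol described in the abstract (simulated assay data with causal and irrelevant features, measurement noise, filter-based feature selection, and K-way cross-validation) can be illustrated with a minimal sketch. This is not the authors' multi-scale simulation model or their code. The data generator, the t-like filter score, and all function names and parameter values below are assumptions chosen for illustration, and only KNN (k = 5) is implemented.

```python
import random
import statistics

def simulate(n_samples, n_causal, n_irrelevant, noise_sd, rng):
    """Toy data (assumed, not the paper's model): label is 1 when the causal sum is positive."""
    X, y = [], []
    for _ in range(n_samples):
        causal = [rng.gauss(0, 1) for _ in range(n_causal)]
        label = 1 if sum(causal) > 0 else 0
        measured = [v + rng.gauss(0, noise_sd) for v in causal]  # measurement noise
        irrelevant = [rng.gauss(0, 1) for _ in range(n_irrelevant)]  # non-causal features
        X.append(measured + irrelevant)
        y.append(label)
    return X, y

def filter_select(X, y, n_keep):
    """Filter-based selection: keep the n_keep features with the largest t-like score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        if not pos or not neg:
            scores.append((0.0, j))  # cannot score a feature without both classes
            continue
        spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-9
        diff = abs(statistics.fmean(pos) - statistics.fmean(neg))
        scores.append((diff / spread, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:n_keep])

def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if 2 * votes > k else 0

def cross_validate(X, y, n_folds, n_keep=None, k=5):
    """K-way cross-validation; features are selected inside each fold to avoid leakage."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        test_idx = [i for i in range(n) if i % n_folds == fold]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        cols = filter_select(tr_X, tr_y, n_keep) if n_keep else list(range(len(X[0])))
        tr_sel = [[row[j] for j in cols] for row in tr_X]
        for i in test_idx:
            x_sel = [X[i][j] for j in cols]
            correct += knn_predict(tr_sel, tr_y, x_sel, k) == y[i]
    return correct / n

rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = simulate(n_samples=200, n_causal=3, n_irrelevant=30, noise_sd=0.5, rng=rng)
acc_all = cross_validate(X, y, n_folds=5)
acc_filtered = cross_validate(X, y, n_folds=5, n_keep=3)
print(f"KNN, all features:      {acc_all:.3f}")
print(f"KNN, filtered features: {acc_filtered:.3f}")
```

Raising `n_irrelevant` or `noise_sd` and comparing the two accuracies gives a rough feel for the kind of comparison the abstract reports, though results from this toy generator should not be read as reproducing the paper's findings.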
Pages: 16