Gene expression data classification using topology and machine learning models

Cited by: 2
Authors
Dey, Tamal K. [1 ]
Mandal, Sayan [2 ]
Mukherjee, Soham [1 ]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA);
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network;
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors, which are then used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results: The topology-relevant curated data that we obtain provides an improvement in shallow-learning as well as deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.
Conclusions: In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.
Pages: 21
Related papers
50 items in total
  • [21] Active learning with support vector machine applied to gene expression data for cancer classification
    Liu, Y
    JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2004, 44 (06): : 1936 - 1941
  • [22] A support vector machine ensemble for cancer classification using gene expression data
    Liao, Chen
    Li, Shutao
    BIOINFORMATICS RESEARCH AND APPLICATIONS, PROCEEDINGS, 2007, 4463 : 488 - +
  • [23] A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data
    Mazlan, Aina Umairah
    Sahabudin, Noor Azida
    Remli, Muhammad Akmal
    Ismail, Nor Syahidatul Nadiah
    Mohamad, Mohd Saberi
    Nies, Hui Wen
    Abd Warif, Nor Bakiah
    PROCESSES, 2021, 9 (08)
  • [24] Seismic Data Classification using Machine Learning
    Li, Wenrui
    Nakshatra
    Narvekar, Nishita
    Raut, Nitisha
    Sirkeci, Birsen
    Gao, Jerry
    2018 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2018), 2018, : 56 - 63
  • [25] Advanced Machine Learning Models for Large Scale Gene Expression Analysis in Cancer Classification: Deep Learning Versus Classical Models
    Zenbout, Imene
    Meshoul, Souham
    BIG DATA, CLOUD AND APPLICATIONS, BDCA 2018, 2018, 872 : 210 - 221
  • [26] Optimizing the classification of biological tissues using machine learning models based on polarized data
    Rodriguez, Carla
    Estevez, Irene
    Gonzalez-Arnay, Emilio
    Campos, Juan
    Lizana, Angel
    JOURNAL OF BIOPHOTONICS, 2023, 16 (04)
  • [27] Classification of Gene Expression Profile Using Combinatory Method of Evolutionary Computation and Machine Learning
    Shin Ando
    Hitoshi Iba
    Genetic Programming and Evolvable Machines, 2004, 5 (2) : 145 - 156
  • [28] Multicategory classification using an Extreme Learning Machine for Microarray gene expression cancer diagnosis
    Zhang, Runxuan
    Huang, Guang-Bin
    Sundararajan, Narasimhan
    Saratchandran, P.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (03) : 485 - 495
  • [29] Regularised extreme learning machine with misclassification cost and rejection cost for gene expression data classification
    Lu, Huijuan
    Wei, Shasha
    Zhou, Zili
    Miao, Yanzi
    Lu, Yi
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (03) : 294 - 312
  • [30] Prediction of tumor purity from gene expression data using machine learning
    Koo, Bonil
    Rhee, Je-Keun
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)