Gene expression data classification using topology and machine learning models

Cited by: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (NSF)
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background
Interpreting high-throughput gene expression data continues to require mathematical tools that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we use some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two respects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance the feature vectors used for classification. In contrast, this work curates relevant features with the help of TDA to obtain somewhat better representatives of the data. These representatives of the entire data set facilitate better comprehension of the phenotype labels. (2) Most earlier works employ barcodes obtained from topological summaries as fingerprints for the data. Although barcodes are stable signatures, there is no direct mapping between the data and the barcodes.
Results
The topology-relevant curated data that we obtain improve both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination toward the phenotype labels. This work thus shows that topological signatures are able to capture gene expression levels and classify cohorts accordingly.
Conclusions
In this work, we generate representative persistent cycles to discern structure in gene expression data. These cycles allow us to directly identify genes involved in similar processes.
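The abstract contrasts this work with "traditional" pipelines that turn barcodes into extra features for a classifier. As a rough illustration of that baseline only (not of the authors' representative-cycle method), the sketch below computes the 0-dimensional persistence barcode of a Vietoris-Rips filtration with union-find and turns it into a fixed-length vector. The toy point cloud, the function names `h0_barcode` and `barcode_features`, and the choice to keep the k longest finite bars are hypothetical assumptions made for this example.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence barcode of a Vietoris-Rips filtration.

    Every point is born at scale 0. When an edge merges two connected
    components, one component dies at that edge length. The last surviving
    component never dies and is reported with death = math.inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges of the complete graph in filtration order (by length).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # merge: one component dies here
            bars.append((0.0, length))
    if n:
        bars.append((0.0, math.inf))     # the component that never dies
    return bars


def barcode_features(bars, k=3):
    """Hypothetical vectorization: the k longest finite bar lengths, zero-padded."""
    lengths = sorted((d - b for b, d in bars if d != math.inf), reverse=True)
    return (lengths + [0.0] * k)[:k]


# Toy example (hypothetical data): two tight clusters five units apart.
cloud = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
bars = h0_barcode(cloud)
print(bars)                    # [(0.0, 1.0), (0.0, 1.0), (0.0, 5.0), (0.0, inf)]
print(barcode_features(bars))  # [5.0, 1.0, 1.0]
```

Keeping only the longest bars is one simple vectorization; persistence images and persistence landscapes are common alternatives. Higher-dimensional homology, such as the 1-dimensional cycles this paper works with, requires a full boundary-matrix reduction, which TDA libraries such as GUDHI provide.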
Pages: 21