Gene expression data classification using topology and machine learning models

Cited by: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors, which are then used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results: The topology-relevant curated data that we obtain provide an improvement in shallow-learning as well as deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.
Conclusions: In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.
Pages: 21