STR-based feature extraction and selection for genetic feature discovery in neurological disease genes

Times cited: 1
Authors
Dhaliwal, Jasbir [1]
Wagner, John [2]
Affiliations
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[2] PsychoGenics Inc, Paramus, NJ 07652 USA
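Keywords
REPEAT; MECHANISMS
DOI
10.1038/s41598-023-29376-4
Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract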
Gene expression, which is often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides the means by which an organism produces the gene products it needs to live. Variation in expression levels, sometimes referred to as enrichment patterns, has been associated with disease progression. Consequently, STR enrichment patterns have recently gained interest as potential genetic markers of disease progression. However, to the best of our knowledge, no study has evaluated STRs, particularly trinucleotide sequences, as machine learning features for classifying neurological disease genes in order to discover genetic features. In this paper, we therefore propose a new metric and a novel feature extraction and selection algorithm that builds a statistically significant feature set from STR-based features and their respective enrichment patterns. The proposed metric shows that neurological disease family genes have a non-random AA, AT, TA, TG, and TT enrichment pattern. This is an important result because it supports prior research establishing that certain trinucleotides, such as AAT, ATA, ATT, TAT, and TTA, are favored during protein misfolding. In contrast, trinucleotides such as TAA, TAG, and TGA are favored during premature termination codon mutations because they are stop codons. This suggests that the metric can potentially identify patterns that may be genetic features in a sample of neurological genes. Moreover, the strong practical and predictive performance of the statistically significant STR-based feature set indicates that variations in STR enrichment patterns can distinguish neurological disease genes. In conclusion, the proposed approach may be able to discover differential genetic features for other diseases.
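The abstract does not define the proposed metric or algorithm, so the following is only a minimal, hypothetical sketch of the kind of STR-based features it refers to, not the authors' method. It assumes three illustrative features: overlapping trinucleotide counts per sequence, the longest tandem run of a given trinucleotide motif, and a Pearson chi-square statistic against a uniform expectation as one crude way to flag a non-random enrichment pattern. The function names, the toy sequence, and the uniform null are all assumptions.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
# All 64 possible trinucleotides in a fixed order, so feature vectors from
# different genes line up column by column.
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]


def trinucleotide_counts(seq: str) -> dict[str, int]:
    """Count overlapping trinucleotides in a DNA sequence (hypothetical helper).

    Windows containing anything other than A, C, G, or T (e.g. N) are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    counts = Counter(w for w in windows if set(w) <= set(BASES))
    return {tri: counts.get(tri, 0) for tri in TRINUCLEOTIDES}


def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        run = 0
        # Extend the run one motif-length step at a time.
        while seq.startswith(motif, start + run * k):
            run += 1
        best = max(best, run)
    return best


def chi_square_vs_uniform(counts: dict[str, int]) -> float:
    """Pearson chi-square statistic of the counts against a uniform expectation.

    Assumption: every trinucleotide is equally likely under the null. A
    composition-aware null would be more realistic for real genes.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


if __name__ == "__main__":
    # Toy sequence containing a (CAG)4 tract; not data from the paper.
    seq = "ATGCAGCAGCAGCAGTAA"
    counts = trinucleotide_counts(seq)
    print(counts["CAG"])                   # 4
    print(longest_tandem_run(seq, "CAG"))  # 4
    print(chi_square_vs_uniform(counts))   # 168.0
```

With 64 categories, the statistic would be compared against a chi-square distribution with 63 degrees of freedom (for example via `scipy.stats.chisquare`, whose default expectation is uniform). A sequence as short as the toy example is far too small for that approximation to be meaningful.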
Pages: 19