STR-based feature extraction and selection for genetic feature discovery in neurological disease genes

Cited by: 1
Authors
Dhaliwal, Jasbir [1]
Wagner, John [2]
Affiliations
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[2] PsychoGenics Inc, Paramus, NJ 07652 USA
Keywords
REPEAT; MECHANISMS;
DOI
10.1038/s41598-023-29376-4
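Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]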
Discipline classification codes
07; 0710; 09
Abstract
Gene expression, often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides the means for an organism to produce the gene products it needs to live. Variation in expression levels, sometimes known as enrichment patterns, has been associated with disease progression, and STR enrichment patterns have therefore recently gained interest as potential genetic markers for disease progression. However, to the best of our knowledge, no study has evaluated and explored STRs, particularly trinucleotide sequences, as machine learning features for classifying neurological disease genes for the purpose of discovering genetic features. In this paper, we propose a new metric and a novel feature extraction and selection algorithm based on statistically significant STR-based features and their respective enrichment patterns to create a statistically significant feature set. The proposed metric shows that the neurological disease family genes have a non-random AA, AT, TA, TG, and TT enrichment pattern. This is an important result, as it supports prior research establishing that certain trinucleotides, such as AAT, ATA, ATT, TAT, and TTA, are favored during protein misfolding. In contrast, trinucleotides such as TAA, TAG, and TGA are favored during premature termination codon mutations, as they are stop codons. This suggests that the metric has the potential to identify patterns that may be genetic features in a sample of neurological genes. Moreover, the practical performance and high prediction results of the statistically significant STR-based feature set indicate that variations in STR enrichment patterns can distinguish neurological disease genes. In conclusion, the proposed approach may have the potential to discover differential genetic features for other diseases.
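The abstract does not define the proposed metric or the extraction algorithm, so the following is only a minimal sketch of what STR-style features could look like, not the authors' method. It assumes a simple observed-over-expected ratio for overlapping k-mers (with an independent-base null model) and a regex-based scan for tandem repeats; the function names `kmer_enrichment` and `str_runs` and their parameters are hypothetical.

```python
import re
from collections import Counter


def kmer_enrichment(seq: str, k: int = 2) -> dict[str, float]:
    """Observed/expected ratio for each overlapping k-mer in seq.

    The expected frequency assumes independent bases. This is an
    illustrative null model, NOT the metric proposed in the paper.
    """
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}
    kmers = Counter(seq[i:i + k] for i in range(n))
    ratios = {}
    for kmer, count in kmers.items():
        expected = 1.0
        for base in kmer:
            expected *= base_freq[base]
        ratios[kmer] = (count / n) / expected
    return ratios


def str_runs(seq: str, unit_len: int = 3, min_repeats: int = 3) -> list[tuple[str, int, int]]:
    """Find tandem repeats as (unit, start_index, repeat_count).

    A unit of unit_len bases must occur at least min_repeats times in a
    row. Homopolymer units such as AAA are matched as well.
    """
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    runs = []
    for m in pattern.finditer(seq.upper()):
        runs.append((m.group(1), m.start(), len(m.group(0)) // unit_len))
    return runs


if __name__ == "__main__":
    toy = "GGCAGCAGCAGCAGTT"  # toy sequence, not real gene data
    print(str_runs(toy))
    print(kmer_enrichment("ATATATAT", k=2))
```

Ratios above 1 mark k-mers that occur more often than base composition alone would predict. Whether such a ratio is statistically significant, which is the property the abstract emphasizes, would need a separate test that this sketch does not attempt.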
Pages: 19