STR-based feature extraction and selection for genetic feature discovery in neurological disease genes

Cited by: 1
Authors
Dhaliwal, Jasbir [1]
Wagner, John [2]
Affiliations
[1] Monash Univ, Fac Informat Technol, Clayton, Vic 3800, Australia
[2] PsychoGenics Inc, Paramus, NJ 07652 USA
Keywords
REPEAT; MECHANISMS
DOI
10.1038/s41598-023-29376-4
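Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09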
Abstract
Gene expression, often determined by single nucleotide polymorphisms, short repeated sequences known as short tandem repeats (STRs), structural variants, and environmental factors, provides the means for an organism to produce the gene products it needs to live. Variation in expression levels, sometimes known as enrichment patterns, has been associated with disease progression. STR enrichment patterns have therefore recently gained interest as potential genetic markers for disease progression. However, to the best of our knowledge, no study has evaluated and explored STRs, particularly trinucleotide sequences, as machine learning features for classifying neurological disease genes for the purpose of discovering genetic features. In this paper, we propose a new metric and a novel feature extraction and selection algorithm based on statistically significant STR-based features and their respective enrichment patterns to create a statistically significant feature set. The proposed metric shows that the neurological disease family genes have a non-random AA, AT, TA, TG, and TT enrichment pattern. This is an important result, as it supports prior research that has established that certain trinucleotides, such as AAT, ATA, ATT, TAT, and TTA, are favored during protein misfolding. In contrast, trinucleotides such as TAA, TAG, and TGA are favored during premature termination codon mutations, as they are stop codons. This suggests that the metric has the potential to identify patterns that may be genetic features in a sample of neurological genes. Moreover, the practical performance and high prediction results of the statistically significant STR-based feature set indicate that variations in STR enrichment patterns can distinguish neurological disease genes. In conclusion, the proposed approach may have the potential to discover differential genetic features for other diseases.
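To make the idea of an enrichment pattern concrete, here is a minimal Python sketch that counts overlapping dinucleotides or trinucleotides in a DNA string and compares each count with what independent base frequencies would predict. This is an illustration only: the abstract does not define the proposed metric, so the observed/expected ratio and the names `kmer_counts`, `enrichment`, and `STOP_CODONS` are assumptions made here and are not the authors' method.

```python
from collections import Counter

# The three stop codons named in the abstract.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )


def enrichment(seq, k):
    """Observed/expected ratio per k-mer (hypothetical illustrative metric).

    The expected count assumes each base occurs independently with its
    overall frequency in the sequence. A ratio above 1 means the k-mer
    appears more often than that simple null model predicts.
    """
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    bases = Counter(b for b in seq if b in "ACGT")
    n = sum(bases.values())
    ratios = {}
    for kmer, observed in counts.items():
        expected = total
        for base in kmer:
            expected *= bases[base] / n
        ratios[kmer] = observed / expected
    return ratios


if __name__ == "__main__":
    toy = "ATATATTTAATGA"  # toy sequence, not real gene data
    print(kmer_counts(toy, 2).most_common(3))
    print({k: round(v, 2) for k, v in enrichment(toy, 3).items()})
    print(sorted(STOP_CODONS & set(kmer_counts(toy, 3))))
```

A real pipeline would apply a significance test against a background set of genes before treating any ratio as a feature; the sketch stops at the raw ratio.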
Pages: 19
Source
Scientific Reports, 13