An effective deep learning-based approach for splice site identification in gene expression

Cited by: 1
Authors
Ali, Mohsin [1]
Shah, Dilawar [1]
Qazi, Shahid [1]
Khan, Izaz Ahmad [1]
Abrar, Mohammad [2]
Zahir, Sana [3]
Affiliations
[1] Bacha Khan Univ, Dept Comp Sci, Charsadda, KP, Pakistan
[2] Arab Open Univ, Fac Comp Sci, Muscat, Oman
[3] Univ Agr Peshawar, Inst Comp Sci & Informat Technol, Peshawar, KP, Pakistan
Keywords
Artificial intelligence; deep learning; biomedical data; RNA analysis; splicing sites; genomics; computational method; sequence; trinucleotide; DNA
DOI
10.1177/00368504241266588
Chinese Library Classification number
G40 [Education]
Discipline classification code
040101; 120403
Abstract
A crucial stage in eukaryotic gene expression is mRNA splicing, carried out by a protein assembly known as the spliceosome. This step contributes significantly to producing a correctly functioning final gene product. Because eukaryotic genes are interrupted by non-coding introns, splicing removes the introns and joins the exons to create a functional mRNA molecule. However, accurately locating splice sites with molecular biology techniques and other biological approaches is complex and time-consuming. This paper presents a precise and reliable computer-aided diagnosis (CAD) technique for the rapid and correct identification of splice site sequences. The proposed deep learning-based framework uses long short-term memory (LSTM) to extract distinct patterns from RNA sequences, enabling rapid and accurate point mutation sequence mapping. The proposed network operates on one-hot encoded sequences to find sequential patterns that effectively identify splicing sites. A thorough ablation study covering traditional machine learning, one-dimensional convolutional neural network (1D-CNN), and recurrent neural network (RNN) models was conducted. The proposed LSTM network outperformed existing state-of-the-art approaches, improving accuracy by 3% and 2% on the acceptor and donor site datasets, respectively.
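The one-hot encoding step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `one_hot_encode`, the alphabet order `ACGU`, the conversion of `T` to `U`, and the all-zero row for unknown symbols are assumptions made here for illustration only.

```python
# Illustrative sketch only: the alphabet order, T-to-U conversion, and the
# handling of unknown bases are assumptions, not details from the paper.
ALPHABET = "ACGU"  # assumed RNA alphabet order


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide.

    Symbols outside ALPHABET (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    # Upper-case the input and accept DNA-style T by mapping it to U.
    for base in sequence.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


# A short hypothetical window around a candidate donor site.
encoded = one_hot_encode("GUAAGN")
for base, row in zip("GUAAGN", encoded):
    print(base, row)
```

Each sequence window becomes a matrix with one row per position and four columns, which is the kind of input a recurrent layer such as an LSTM would typically read one position at a time.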
Pages: 16