An intelligent computational model for prediction of promoters and their strength via natural language processing

Cited by: 15
Authors
Tahir, Muhammad [1, 2]
Hayat, Maqsood [1]
Gul, Sarah [4]
Chong, Kil To [2, 3]
Affiliations
[1] Abdul Wali Khan Univ, Dept Comp Sci, Mardan 23200, KP, Pakistan
[2] Chonbuk Natl Univ, Dept Elect & Informat Engn, Jeonju 54896, South Korea
[3] Chonbuk Natl Univ, Adv Elect & Informat Res Ctr, Jeonju 54896, South Korea
[4] Int Islamic Univ, Dept Biol Sci, FBAS, Islamabad, Pakistan
Funding
National Research Foundation, Singapore
Keywords
Promoters; Convolutional neural network (CNN); Natural language processing; DNA; word2vec; SEQUENCE-BASED PREDICTOR; RECOMBINATION SPOTS; ENSEMBLE CLASSIFIER; PROTEIN TYPES; IDENTIFICATION; SITES; FEATURES; SPACE; DISCRIMINATION; TRINUCLEOTIDE
DOI
10.1016/j.chemolab.2020.104034
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In DNA, a promoter is a regulatory region of a gene that controls the transcription of specific genes in particular tissues or cells. RNA polymerase combines with proteins known as sigma factors to form the RNA polymerase holoenzyme, which can define the transcription start site (TSS). Promoters are further classified as strong or weak according to their strength. Owing to the exponential growth of DNA/RNA and protein sequence data in the post-genomic era, developing a simple and efficient sequence-based computational model to discriminate promoters is a challenging task. To address this, an intelligent computational model, 2L-iPSW(word2vec), was introduced to identify promoters and their strength. It combines machine learning and deep learning algorithms with the natural language processing method word2vec. The proposed model achieved an accuracy of 91.42% in the first layer, which distinguishes promoters from non-promoters, 8.29% higher than the existing model, and an accuracy of 82.42% in the second layer, which distinguishes strong from weak promoters, 11.22% higher than the existing model. 2L-iPSW(word2vec) also outperformed existing models on all evaluation metrics. These results suggest that 2L-iPSW(word2vec) can serve as a useful tool for academic research on promoter identification.
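The abstract does not state how DNA sequences are presented to word2vec. A common approach, assumed here and not confirmed by this record, treats overlapping k-mers as "words" and each sequence as a "sentence"; the token lists can then be passed to a word2vec implementation such as gensim's `Word2Vec`. The minimal sketch below shows only that tokenization step and the two-layer cascade the abstract describes (layer 1: promoter vs non-promoter; layer 2: strong vs weak). The k-mer size and the classifier callables `is_promoter` and `is_strong` are hypothetical placeholders, not the authors' trained models.

```python
from typing import Callable, List


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a DNA sequence into overlapping k-mers ("words").

    k=3 is an illustrative assumption, not a value taken from the paper.
    Sequences shorter than k yield an empty list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def two_layer_predict(
    seq: str,
    is_promoter: Callable[[List[str]], bool],  # hypothetical layer-1 model
    is_strong: Callable[[List[str]], bool],    # hypothetical layer-2 model
    k: int = 3,
) -> str:
    """Two-layer cascade: only sequences that layer 1 predicts as
    promoters are passed to layer 2 for the strong/weak decision."""
    tokens = kmer_tokens(seq, k)
    if not is_promoter(tokens):
        return "non-promoter"
    return "strong promoter" if is_strong(tokens) else "weak promoter"


if __name__ == "__main__":
    # Toy stand-in rules (hypothetical, for illustration only).
    layer1 = lambda toks: "TAT" in toks
    layer2 = lambda toks: toks.count("TAT") >= 2

    print(kmer_tokens("acgtac"))                          # ['ACG', 'CGT', 'GTA', 'TAC']
    print(two_layer_predict("GGGCCC", layer1, layer2))    # non-promoter
    print(two_layer_predict("GGTATAAT", layer1, layer2))  # weak promoter
    print(two_layer_predict("TATATAT", layer1, layer2))   # strong promoter
```

Keeping the classifiers as plain callables separates the cascade logic from whatever model fills each layer, so a word2vec-plus-CNN classifier or any other trained model could be swapped in without changing `two_layer_predict`.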
Pages: 7