DNA promoter task-oriented dictionary mining and prediction model based on natural language technology

Cited by: 0
Authors
Zeng, Ruolei [1]
Li, Zihan [2]
Li, Jialu [2]
Zhang, Qingchuan [2]
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Beijing Technol & Business Univ, Natl Engn Res Ctr Agriprod Qual Traceabil, 11 Fucheng Rd, Beijing 100048, Peoples R China
Source
SCIENTIFIC REPORTS | 2025 / Volume 15 / Issue 01
Funding
Ministry of Science and Technology of China, 11th Five-Year Plan Science and Technology Program;
Keywords
NEURAL-NETWORK;
DOI
10.1038/s41598-024-84105-9
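Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];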
Subject classification codes
07; 0710; 09;
Abstract
Promoters are essential DNA sequences that initiate transcription and regulate gene expression. Precisely identifying promoter sites is crucial for deciphering gene expression patterns and the roles of gene regulatory networks. Recent advancements in bioinformatics have leveraged deep learning and natural language processing (NLP) to enhance promoter prediction accuracy. Techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and BERT models have been particularly impactful. However, current approaches often rely on arbitrary DNA sequence segmentation during BERT pre-training, which may not yield optimal results. To overcome this limitation, this article introduces a novel DNA sequence segmentation method. This approach develops a more refined dictionary for DNA sequences, utilizes it for BERT pre-training, and employs an Inception neural network as the foundational model. This BERT-Inception architecture captures information across multiple granularities. Experimental results show that the model improves the performance of several downstream tasks and introduces deep learning interpretability, providing new perspectives for interpreting and understanding DNA sequence information. The detailed source code is available at https://github.com/katouMegumiH/Promoter_BERT.
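The contrast the abstract draws between arbitrary segmentation and dictionary-driven segmentation can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not describe in detail: the overlapping k-mer split, the greedy longest-match rule, the function names, and the toy vocabulary are all assumptions chosen only to show how a dictionary changes the tokens a BERT-style model would see.

```python
def kmer_tokens(seq, k=3):
    """Fixed-length overlapping k-mers: a segmentation that ignores sequence content."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def dictionary_tokens(seq, vocab, max_len=6):
    """Greedy longest-match segmentation against a dictionary (hypothetical rule).

    At each position the longest dictionary entry of at most max_len bases is
    taken; if none matches, a single base is emitted as a fallback token.
    """
    tokens = []
    i = 0
    while i < len(seq):
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


# Toy dictionary for illustration only; a real one would be mined from data.
toy_vocab = {"TATAAA", "GC", "CAAT"}
sequence = "GCTATAAAGC"

print(kmer_tokens(sequence))
print(dictionary_tokens(sequence, toy_vocab))
```

With the toy dictionary, the TATA-box-like motif stays intact as one token instead of being spread over several overlapping k-mers, which is the kind of effect a task-oriented dictionary is meant to produce.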
Pages: 11