DNA promoter task-oriented dictionary mining and prediction model based on natural language technology

Cited by: 0
Authors
Zeng, Ruolei [1]
Li, Zihan [2]
Li, Jialu [2]
Zhang, Qingchuan [2]
Affiliations
[1] Univ Minnesota, Dept Comp Sci & Engn, Minneapolis, MN 55455 USA
[2] Beijing Technol & Business Univ, Natl Engn Res Ctr Agriprod Qual Traceabil, 11 Fucheng Rd, Beijing 100048, Peoples R China
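Source
SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01
Funding
Ministry of Science and Technology of China, "Eleventh Five-Year Plan" Science and Technology Program Project;
Keywords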
NEURAL-NETWORK;
DOI
10.1038/s41598-024-84105-9
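Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes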
07 ; 0710 ; 09 ;
Abstract
Promoters are essential DNA sequences that initiate transcription and regulate gene expression. Precisely identifying promoter sites is crucial for deciphering gene expression patterns and the roles of gene regulatory networks. Recent advancements in bioinformatics have leveraged deep learning and natural language processing (NLP) to enhance promoter prediction accuracy. Techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and BERT models have been particularly impactful. However, current approaches often rely on arbitrary DNA sequence segmentation during BERT pre-training, which may not yield optimal results. To overcome this limitation, this article introduces a novel DNA sequence segmentation method. This approach develops a more refined dictionary for DNA sequences, utilizes it for BERT pre-training, and employs an Inception neural network as the foundational model. This BERT-Inception architecture captures information across multiple granularities. Experimental results show that the model improves the performance of several downstream tasks and introduces deep learning interpretability, providing new perspectives for interpreting and understanding DNA sequence information. The detailed source code is available at https://github.com/katouMegumiH/Promoter_BERT.
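To make the contrast in the abstract concrete, the sketch below compares fixed-length k-mer segmentation, a common way of splitting DNA before BERT pre-training, with segmentation driven by a dictionary of variable-length entries. It is only an illustration: the abstract does not describe how the dictionary is mined or applied, so the greedy longest-match rule, the function names `kmer_tokens` and `dictionary_tokens`, and the entries in `VOCAB` are all assumptions, not the authors' method.

```python
from typing import List, Set


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping fixed-length k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def dictionary_tokens(seq: str, vocab: Set[str], max_len: int = 6) -> List[str]:
    """Greedy longest-match segmentation against a dictionary.

    Assumption: longest match wins, and a single nucleotide is emitted
    whenever no dictionary entry matches at the current position.
    """
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first, shrinking down to one base.
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


# Hypothetical dictionary entries, chosen only for illustration.
VOCAB = {"TTGACA", "TATAAT", "GC"}

if __name__ == "__main__":
    seq = "TTGACAGCTATAAT"
    print(kmer_tokens(seq, k=3))
    print(dictionary_tokens(seq, VOCAB))
```

With a fixed k, every token has the same length regardless of content, whereas the dictionary-based split can keep a longer recurring pattern intact as a single token, which is the kind of multi-granularity input the abstract alludes to.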
Pages: 11