Benchmarking DNA large language models on quadruplexes

Cited: 0
Authors
Cherednichenko, Oleksandr [1 ]
Herbert, Alan [1 ,2 ]
Poptsova, Maria [1 ]
Affiliations
[1] HSE Univ, Int Lab Bioinformat, Moscow, Russia
[2] InsideOutBio, Charlestown, MA USA
Keywords
Foundation model; Large language model; DNABERT; HyenaDNA; MAMBA-DNA; Caduceus; Flipons; Non-B DNA; G-quadruplexes
DOI
10.1016/j.csbj.2025.03.007
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated on genomic benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly for generating whole-genome annotations. Current LLMs in genomics fall into three main categories: transformer-based models, long convolution-based models, and state-space models (SSMs). In this study, we benchmarked these three types of LLM architectures on generating whole-genome maps of G-quadruplexes (GQs), a type of flipon, or non-B DNA structure, characterized by distinctive sequence patterns and functional roles in diverse regulatory contexts. Although GQs form by folding guanine residues into tetrads, the computational task is challenging because the bases involved may lie on different strands, be separated by a large number of nucleotides, or be made of RNA rather than DNA. All LLMs performed comparably well, with DNABERT-2 and HyenaDNA achieving superior results based on F1 and MCC. Analysis of whole-genome annotations revealed that HyenaDNA recovered more quadruplexes in distal enhancers and intronic regions. The models were better suited to detecting large GQ arrays that likely contribute to the nuclear condensates involved in gene transcription and to chromosomal scaffolds. HyenaDNA and Caduceus formed a separate grouping among the de novo generated quadruplexes, while transformer-based models clustered together. Overall, our findings suggest that different types of LLMs complement each other. Genomic architectures with varying context lengths can detect distinct functional regulatory elements, underscoring the importance of selecting the appropriate model for the specific genomic task. The code and data underlying this article are available at https://github.com/powidla/G4s-FMs
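The abstract reports model rankings based on F1 and the Matthews correlation coefficient (MCC). As an illustrative sketch only (the toy labels below are invented, not the paper's benchmark data), these two metrics can be computed from the binary confusion counts of a G-quadruplex classifier as follows:

```python
import math

def confusion(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = G-quadruplex)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1(tp, fp, tn, fn):
    # Harmonic mean of precision and recall, written in count form.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient; robust on imbalanced classes,
    # which matters because GQ-forming windows are a minority of the genome.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy per-window labels over a hypothetical genomic tiling.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
tp, fp, tn, fn = confusion(y_true, y_pred)
print(round(f1(tp, fp, tn, fn), 3))   # 0.75
print(round(mcc(tp, fp, tn, fn), 3))  # 0.5
```

MCC is often preferred over accuracy for this task precisely because of the class imbalance noted in the comment above.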
Pages: 992-1000
Page count: 9