ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight

Cited by: 1
Authors
Zhang, Hao [1]
Huang, Zhiyi [1]
Chen, Yawen [1]
Liang, Jianguo [2]
Gao, Xiran [3,4]
Affiliations
[1] Univ Otago, Dept Comp Sci, Dunedin 9054, New Zealand
[2] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao 266590, Peoples R China
[3] Chinese Acad Sci, ICT, State Key Lab Proc, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
Hybrid sequence alignment; Biological database search; Sunway TaihuLight; SW26010; Heterogeneous architecture; Smith-Waterman; Performance; Processor
DOI
10.1016/j.parco.2023.103043
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In computational biology, biological database search plays a very important role. Since the COVID-19 outbreak, it has helped significantly in identifying common characteristics of viruses and in developing vaccines and drugs. Sequence alignment, a method for finding similarity, homology, and other relationships between gene or protein sequences, is the standard tool for database search. With the explosive growth of biological databases, the search process has become extremely time-consuming. However, existing parallel sequence alignment algorithms cannot deliver efficient database search. They make poor use of resources such as cache memory and suffer from performance issues such as load imbalance and high communication overhead. In this paper, we propose ESA, an efficient sequence alignment algorithm for biological database search on Sunway TaihuLight. ESA adopts a novel hybrid alignment algorithm that combines local and global alignment and achieves higher accuracy than other sequence alignment algorithms. ESA also includes several optimizations: cache-aware sequence alignment, capacity-aware load balancing, and bandwidth-aware data transfer. These optimizations are implemented on SW26010, the heterogeneous processor used in Sunway TaihuLight, the world's sixth-fastest supercomputer. ESA is evaluated with the Swiss-Prot database on Sunway TaihuLight and other platforms. Our experimental results show that ESA achieves a speedup of 34.5 on a single core group (65 cores) of Sunway TaihuLight. The strong and weak scalability of ESA are tested with 1 to 1024 core groups of Sunway TaihuLight. The results show that ESA has linear weak scalability. For strong scalability, ESA achieves a speedup of 338.04 on 1024 core groups relative to a single core group. We also show that the proposed optimizations are applicable to GPUs, Intel multicore processors, and heterogeneous computing platforms.
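The abstract does not describe how ESA combines local and global alignment. As general background only, the sketch below computes the two classical scores: Smith-Waterman (local) and Needleman-Wunsch (global). It then ranks a toy database by local score, which is the basic shape of an alignment-based database search. This is not ESA's algorithm. The scoring constants MATCH, MISMATCH, and GAP, the linear gap model, and the helper names are hypothetical. Protein search tools typically use a substitution matrix such as BLOSUM62 with affine gap penalties instead.

```python
"""Background sketch only (not ESA): classical local and global alignment
scores used to rank database sequences against a query.

Assumptions: linear gap penalty and simple match/mismatch scores instead of
a substitution matrix such as BLOSUM62. All names here are hypothetical.
"""
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scoring parameters


def _sub(a: str, b: str) -> int:
    """Substitution score for one pair of residues."""
    return MATCH if a == b else MISMATCH


def smith_waterman(q: str, d: str) -> int:
    """Best local alignment score: cells are clamped at 0, best cell wins."""
    prev = [0] * (len(d) + 1)
    best = 0
    for i in range(1, len(q) + 1):
        cur = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            cur[j] = max(0,
                         prev[j - 1] + _sub(q[i - 1], d[j - 1]),  # (mis)match
                         prev[j] + GAP,                           # gap in d
                         cur[j - 1] + GAP)                        # gap in q
            best = max(best, cur[j])
        prev = cur  # keep only one previous row: O(len(d)) memory
    return best


def needleman_wunsch(q: str, d: str) -> int:
    """Global alignment score: end-to-end, no clamping, bottom-right cell."""
    prev = [j * GAP for j in range(len(d) + 1)]
    for i in range(1, len(q) + 1):
        cur = [i * GAP] + [0] * len(d)
        for j in range(1, len(d) + 1):
            cur[j] = max(prev[j - 1] + _sub(q[i - 1], d[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def search(query: str, database: List[str],
           top_k: int = 3) -> List[Tuple[int, int, str]]:
    """Score each database sequence independently; rank by local score."""
    hits = [(smith_waterman(query, s), needleman_wunsch(query, s), s)
            for s in database]
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits[:top_k]


if __name__ == "__main__":
    db = ["HEAGAWGHEE", "PAWHEAE", "GGGGGG"]
    for sw, nw, seq in search("HEAGAWGHEE", db):
        print(f"{seq:12s} local={sw:3d} global={nw:4d}")
```

The two recurrences differ only in the clamp at 0 and in where the score is read. Because of that clamp, the local score can ignore poorly matching flanks, whereas the global score penalizes the whole length of both sequences. Each database sequence is scored independently of the others. That independence is what makes database search naturally parallel across cores, and the load balancing described above has to distribute that work evenly.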
Pages: 11