ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight

Cited by: 1
Authors
Zhang, Hao [1]
Huang, Zhiyi [1]
Chen, Yawen [1]
Liang, Jianguo [2]
Gao, Xiran [3, 4]
Affiliations
[1] Univ Otago, Dept Comp Sci, Dunedin 9054, New Zealand
[2] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao 266590, Peoples R China
[3] Chinese Acad Sci, ICT, State Key Lab Proc, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
Hybrid sequence alignment; Biological database search; Sunway TaihuLight; SW26010; Heterogeneous architecture; SMITH-WATERMAN; PERFORMANCE; PROCESSOR
DOI
10.1016/j.parco.2023.103043
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In computational biology, biological database search plays a very important role. Since the COVID-19 outbreak, it has provided significant help in identifying common characteristics of viruses and in developing vaccines and drugs. Sequence alignment, a method for finding similarity, homology and other information between gene/protein sequences, is the usual tool in database search. With the explosive growth of biological databases, the search process has become extremely time-consuming. However, existing parallel sequence alignment algorithms cannot deliver efficient database search, due to low utilization of resources such as cache memory and to performance issues such as load imbalance and high communication overhead. In this paper, we propose an efficient sequence alignment algorithm on Sunway TaihuLight, called ESA, for biological database search. ESA adopts a novel hybrid alignment algorithm combining local and global alignments, which has higher accuracy than other sequence alignment algorithms. Further, ESA has several optimizations, including cache-aware sequence alignment, capacity-aware load balancing and bandwidth-aware data transfer. They are implemented on the heterogeneous processor SW26010, which is used in the world's 6th fastest supercomputer, Sunway TaihuLight. The implementation of ESA is evaluated with the Swiss-Prot database on Sunway TaihuLight and other platforms. Our experimental results show that ESA has a speedup of 34.5 on a single core group (with 65 cores) of Sunway TaihuLight. The strong and weak scalabilities of ESA are tested with 1 to 1024 core groups of Sunway TaihuLight. The results show that ESA has linear weak scalability and very impressive strong scalability. For strong scalability, ESA achieves a speedup of 338.04 with 1024 core groups compared with a single core group. We also show that our proposed optimizations are applicable to GPU, Intel multicore processors, and heterogeneous computing platforms.
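As a rough illustration of the two alignment styles the abstract mentions, the sketch below scores a pair of sequences with the textbook Smith-Waterman (local) and Needleman-Wunsch (global) recurrences. It is not ESA's hybrid algorithm, which the abstract does not specify; the function names and the scoring parameters (`match`, `mismatch`, and a linear `gap` penalty) are assumptions chosen for the example.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """End-to-end alignment score (Needleman-Wunsch, linear gap penalty)."""
    # Row 0: aligning an empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    query, subject = "TTACGTT", "GGACGGG"
    print("local :", local_score(query, subject))
    print("global:", global_score(query, subject))
```

Both functions keep only two rows of the dynamic-programming matrix, so memory grows with the length of one sequence rather than with the product of both lengths. The local score rewards the best-matching region only, while the global score also pays for mismatches and gaps outside it.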
Pages: 11
Related papers
50 in total
  • [31] AlineaGA-a genetic algorithm with local search optimization for multiple sequence alignment
    Mateus da Silva, Fernando Jose
    Sanchez Perez, Juan Manuel
    Gomez Pulido, Juan Antonio
    Vega Rodriguez, Miguel A.
    APPLIED INTELLIGENCE, 2010, 32 (02) : 164 - 172
  • [32] Multiple Guide Trees in a Tabu Search Algorithm for the Multiple Sequence Alignment Problem
    Mehenni, Tahar
    COMPUTER SCIENCE AND ITS APPLICATIONS, CIIA 2015, 2015, 456 : 141 - 152
  • [33] Efficient algorithm for sequence similarity search based on reference indexing
    Dai D.-B.
    Xiong Y.
    Zhu Y.-Y.
    Ruan Jian Xue Bao/Journal of Software, 2010, 21 (04) : 718 - 731
  • [34] Automated protein sequence database classification. I. Integration of compositional similarity search, local similarity search, and multiple sequence alignment
    Gracy, J
    Argos, P
    BIOINFORMATICS, 1998, 14 (02) : 164 - 173
  • [35] An Energy-Efficient Pipelined-Multiprocessor Architecture for Biological Sequence Alignment
    Sarkar, Ardhendu
    Banerjee, Som
    Ghosh, Surajeet
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2020, 28 (12) : 2598 - 2611
  • [36] Epigenetically Inspired Modification of Genetic Algorithm and His Efficiency on Biological Sequence Alignment
    Chrominski, Kornel
    Boryczka, Mariusz
    INTELLIGENT DECISION TECHNOLOGIES 2016, PT II, 2016, 57 : 95 - 105
  • [37] Efficient GPU-Based Algorithm for Aligning Huge Sequence Database
    Lin, Chun-Yuan
    Hung, Che-Lun
    Huang, Jen-Cheng
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013 : 1758 - 1762
  • [38] An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence
    Ma, Christopher
    Wong, Thomas K. F.
    Lam, T. W.
    Hon, W. K.
    Sadakane, K.
    Yiu, S. M.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (06) : 1629 - 1638
  • [39] Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm
    Gomez, Antonio
    Cedano, Juan
    Espadaler, Jordi
    Hermoso, Antonio
    Pinol, Jaume
    Querol, Enrique
    PROTEIN JOURNAL, 2008, 27 (02) : 130 - 139
  • [40] EGSA: a new enhanced gravitational search algorithm to resolve multiple sequence alignment problem
    Zemali, Elamine
    Boukra, Abdelmadjid
    INTERNATIONAL JOURNAL OF INTELLIGENT ENGINEERING INFORMATICS, 2018, 6 (1-2) : 204 - 217