Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications

Cited by: 14
Authors
D'Angelo, Gianni [1, 2]
Rampone, Salvatore [1, 2]
Affiliations
[1] Univ Sannio, Dept Sci & Technol, Benevento, Italy
[2] Futuridea Innovaz Utile & Sostenibile, Benevento, Italy
Source
BMC Bioinformatics | 2014 / Vol. 15
Keywords
INCOMPLETE DATA; GENE;
DOI
10.1186/1471-2105-15-S5-S2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The huge quantity of data produced in biomedical research requires sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard-to-solve parallelization and load-balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general-purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty in data (U-BRAIN). U-BRAIN is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) that may have missing bits. The conjunctive terms of the formula are computed iteratively by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; these conditions allow the computation of a set of coefficients (relevances), one for each attribute (literal), that form a probability distribution and guide the selection of the term literals. Its great versatility makes U-BRAIN applicable in many fields in which there are data to be analyzed. However, the memory and the execution time it requires are of order O(n^3) and O(n^5), respectively, so the algorithm is unaffordable for huge data sets.
Results: We find mathematical and programming solutions that lead towards an implementation of the U-BRAIN algorithm on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm; then we minimize the representation of the relevances. When the data are large, we are forced to use mass memory, and access times can differ considerably depending on where the data are actually stored. Following the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the cost of communication between different memories (RAM, cache, mass, virtual) and to achieve efficient I/O performance, we design a mass storage structure whose data can be accessed with a high degree of temporal and spatial locality. We then develop a parallel implementation of the algorithm, modeled as an SPMD (Single Program, Multiple Data) system together with a message-passing programming paradigm. We adopt the high-level message-passing system MPI (Message Passing Interface) in its version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration, and mapping. The decomposition of the U-BRAIN algorithm makes it necessary to design a communication protocol among the processors involved. Efficient synchronization design is also discussed.
Conclusions: In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the Intel Xeon E7xxx and E5xxx processor families of the CRESCO structure of the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize
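As an illustration of what "a DNF formula consistent with instances that may have missing bits" can mean, the following is a minimal sketch, not the U-BRAIN algorithm itself. The encoding (strings over '0', '1', '?'), the names `term_status` and `is_consistent`, the toy data, and in particular the rule used for missing bits are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List

# A term is a conjunction of literals: {attribute index: required bit}.
Term = Dict[int, str]


def term_status(term: Term, instance: str) -> str:
    """Evaluate one conjunctive term on one instance.

    Instances are strings over '0', '1' and '?' ('?' marks a missing bit).
    Returns 'true', 'false' or 'unknown'.
    """
    unknown = False
    for index, bit in term.items():
        value = instance[index]
        if value == "?":
            unknown = True
        elif value != bit:
            # One known mismatching bit falsifies the whole conjunction.
            return "false"
    return "unknown" if unknown else "true"


def is_consistent(dnf: List[Term], positives: List[str], negatives: List[str]) -> bool:
    """Check a DNF formula against positive and negative instances.

    ASSUMPTION (not from the paper): a positive instance counts as covered
    when at least one term is not contradicted by its known bits; a negative
    instance counts as excluded only when every term is definitely false.
    """
    for instance in positives:
        if all(term_status(term, instance) == "false" for term in dnf):
            return False
    for instance in negatives:
        if any(term_status(term, instance) != "false" for term in dnf):
            return False
    return True


# Hypothetical toy formula: (x0 AND NOT x2) OR (x1 AND x2)
dnf = [{0: "1", 2: "0"}, {1: "1", 2: "1"}]
positives = ["1?0", "011"]
negatives = ["00?", "101"]
consistent = is_consistent(dnf, positives, negatives)
```

Under a stricter reading of missing bits (for example, requiring positives to be definitely satisfied), the same structure applies with a different comparison in `is_consistent`.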
Pages: 15
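The SPMD organization mentioned in the abstract (every process runs the same program on its own share of the data, then partial results are combined) can be sketched as follows. This is a single-process simulation under stated assumptions: the ranks are simulated with a loop, no MPI or MPJ calls are made, and `block_range`, `local_work`, and `run_spmd` are hypothetical names. The per-process work here is a plain sum, chosen only as a stand-in for the real computation.

```python
from typing import List, Tuple


def block_range(n_items: int, n_procs: int, rank: int) -> Tuple[int, int]:
    """Return the half-open range [start, stop) of items owned by `rank`.

    Items are split into contiguous blocks whose sizes differ by at most one,
    a common way to balance load in the partitioning and mapping stages.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def local_work(items: List[int], start: int, stop: int) -> int:
    # Stand-in for the per-process computation (hypothetical: a partial sum).
    return sum(items[start:stop])


def run_spmd(items: List[int], n_procs: int) -> int:
    # Each simulated rank runs the same code on its own block of the data.
    partials = [
        local_work(items, *block_range(len(items), n_procs, rank))
        for rank in range(n_procs)
    ]
    # Stand-in for the communication stage: combine the partial results,
    # as a reduce operation towards a root process would.
    return sum(partials)


ranges = [block_range(10, 4, rank) for rank in range(4)]
total = run_spmd(list(range(10)), 4)
```

In an actual message-passing run, the rank and the number of processes would be supplied by the runtime, and the final combination would be a collective operation rather than a local `sum`.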