Reducing data transfer in big-data workflows: the computation-flow delegated approach

Authors
Rickey T. P. Nunes
Santosh L. Deshpande
Affiliations
[1] Government Polytechnic, Department of Computer Engineering
[2] Visvesvaraya Technological University, Centre for Postgraduate Studies
Keywords
Big-data; Bioinformatics; Orchestration; Workflow; Mobile agents; Computation-flow
DOI
10.1007/s42488-019-00012-z
Abstract
Existing orchestrated bioinformatics workflow execution approaches require datasets to be transferred from biological data services to the workflow's analysis tool (computation) services for various data analyses. This model of moving data to computation during workflow execution degrades workflow performance, especially when an orchestrated bioinformatics workflow has to handle big data. Analysis tools are much smaller than the datasets in a workflow. We therefore propose a novel computation-flow delegated (CFD) approach that minimizes data flow and improves workflow performance. The CFD approach lets the workflow's tool services dynamically migrate analysis tools toward the datasets, so that computation is performed on the data side during workflow execution. We use a set of mobile agents to operate the CFD approach and present a mobile agent-based computation-flow delegation framework (MABCFD) to execute the workflow tasks. We implement a prototype of the MABCFD framework. We then analyze the performance of the CFD approach empirically by executing, in isolation, the workflow patterns common to bioinformatics applications (sequence, fan-out and fan-in). The performance analysis shows that the computation-driven CFD approach consistently outperforms the existing data-driven approaches across all patterns and scales favorably with data size.
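The sketch below makes the data-versus-computation trade-off concrete. It is a back-of-the-envelope cost model, not the MABCFD framework itself. It counts the megabytes moved over the network when inputs are shipped to the tool (data-driven) versus when the tool is shipped to the data (computation-flow). The helper names (`Task`, `data_driven_cost`, `computation_flow_cost`), the sizes (5 MB tools, 1000 MB datasets) and the fan-in rule are all hypothetical assumptions for illustration. Under that rule, the tool migrates to the site of its largest input and any remaining inputs are still shipped to it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    """One workflow task: an analysis tool plus the datasets it consumes.

    Sizes are in megabytes; all values are illustrative assumptions.
    """
    name: str
    tool_mb: float
    inputs_mb: list[float] = field(default_factory=list)


def data_driven_cost(task: Task) -> float:
    # Data-to-computation: every input dataset is shipped to the tool service.
    return sum(task.inputs_mb)


def computation_flow_cost(task: Task) -> float:
    # Computation-to-data (assumed rule): the tool migrates to the site holding
    # the largest input; any other inputs (fan-in) are shipped to that site.
    if not task.inputs_mb:
        return 0.0
    return task.tool_mb + sum(task.inputs_mb) - max(task.inputs_mb)


def workflow_cost(tasks: list[Task], cost_fn: Callable[[Task], float]) -> float:
    # Total megabytes moved over the network for the whole pattern.
    return sum(cost_fn(t) for t in tasks)


# Hypothetical instances of the three patterns (sizes in MB).
TOOL_MB = 5.0
DATA_MB = 1000.0

patterns = {
    # sequence: each stage consumes the previous stage's output
    "sequence": [Task(f"t{i}", TOOL_MB, [DATA_MB]) for i in range(3)],
    # fan-out: three tools analyse the same dataset independently
    "fan-out": [Task(f"t{i}", TOOL_MB, [DATA_MB]) for i in range(3)],
    # fan-in: one tool merges three datasets held at different data services
    "fan-in": [Task("merge", TOOL_MB, [DATA_MB, DATA_MB, DATA_MB])],
}

if __name__ == "__main__":
    for name, tasks in patterns.items():
        dd = workflow_cost(tasks, data_driven_cost)
        cf = workflow_cost(tasks, computation_flow_cost)
        print(f"{name:8s} data-driven={dd:7.0f} MB  computation-flow={cf:7.0f} MB")
```

In this toy model, sequence and fan-out move 3000 MB when data flows to the tools but only 15 MB when the tools flow to the data. Fan-in still moves 2005 MB under computation-flow, because the assumed rule can co-locate the tool with only one of the three inputs. These figures follow from the stated assumptions only. They are not the paper's measured results.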
Pages: 129–145
Page count: 16