A Middleware for Managing Big-Data Flows

Cited by: 0
Authors
Gupta, Rajeev [1 ]
Gupta, Himanshu [1 ]
Gupta, Sanjeev [2 ]
Padmanabhan, Sriram [2 ]
Affiliations
[1] IBM Research, New Delhi, India
[2] IBM Software Group, San Jose, CA, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Hadoop is used for a wide variety of applications over many different kinds of data. This makes developing and managing data flows over Hadoop MapReduce a complex task. Scripting languages such as Hive, Pig, and Jaql have been developed to hide the complexity of MapReduce applications from the user. However, even scripts in these high-level query languages can become complex over time, and developing, debugging, and maintaining them is non-trivial even for a user proficient in these languages. This paper presents a middleware for developing and maintaining MapReduce data flows. The middleware can be used to Extract data from diverse data sources, Load it into a distributed file system, and Transform it into a format that subsequent systems can easily analyze, all in a user-friendly manner. MetaOperators are the backbone of our middleware. Using MetaOperators, one can express a data flow by specifying only the relevant inputs, without worrying about the data schema or the query syntax. A data flow written with MetaOperators localizes the schema-specific parts of the query to the MetaOperator parameters, making the flow easier to develop, debug, and maintain. Using these MetaOperators, we show how operations over hierarchical and flat data can be expressed in a similar manner, how the data schema can be tracked as it flows through the operators, and how a drag-and-drop GUI layer can be added on top of the framework. This brings MapReduce application development into the realm of middle management.
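The abstract makes two technical claims. First, schema-specific details live only in operator parameters. Second, the schema is tracked as data passes from operator to operator. The following is a minimal sketch of how those two ideas can fit together. It is not the authors' implementation, and the paper's actual API is not described here. Every name in it (`MetaOperator`, `Filter`, `Project`, `compile_flow`) is hypothetical. Pig Latin is assumed as the generated target language only because Pig is one of the scripting languages the abstract mentions. Each operator maps an input schema to an output schema and renders one script statement. A flow that references a missing field is therefore rejected before any MapReduce job runs.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

# Hypothetical schema representation: an ordered tuple of field names.
Schema = Tuple[str, ...]


class MetaOperator(Protocol):
    """Hypothetical interface (not from the paper): an operator maps an input
    schema to an output schema and renders itself as one Pig Latin statement."""

    def output_schema(self, schema: Schema) -> Schema: ...
    def to_pig(self, src: str, dst: str) -> str: ...


def _require(schema: Schema, fields: Sequence[str]) -> None:
    # Fail while the flow is being built, before any MapReduce job is launched.
    missing = [f for f in fields if f not in schema]
    if missing:
        raise ValueError(f"fields {missing} not in schema {schema}")


@dataclass
class Filter:
    field: str   # the only schema-specific parts are these parameters
    op: str      # a Pig Latin comparison operator, e.g. ">" or "=="
    value: str   # literal rendered as-is into the script

    def output_schema(self, schema: Schema) -> Schema:
        _require(schema, [self.field])
        return schema  # filtering rows never changes the schema

    def to_pig(self, src: str, dst: str) -> str:
        return f"{dst} = FILTER {src} BY {self.field} {self.op} {self.value};"


@dataclass
class Project:
    fields: List[str]

    def output_schema(self, schema: Schema) -> Schema:
        _require(schema, self.fields)
        return tuple(self.fields)  # downstream operators see only these fields

    def to_pig(self, src: str, dst: str) -> str:
        return f"{dst} = FOREACH {src} GENERATE {', '.join(self.fields)};"


def compile_flow(path: str, schema: Schema,
                 operators: Sequence[MetaOperator]) -> Tuple[str, Schema]:
    """Chain the operators, threading the schema through each one, and return
    the generated script together with the final output schema."""
    lines = [f"r0 = LOAD '{path}' USING PigStorage(',') AS ({', '.join(schema)});"]
    for i, op in enumerate(operators, start=1):
        schema = op.output_schema(schema)  # schema tracking step
        lines.append(op.to_pig(f"r{i - 1}", f"r{i}"))
    return "\n".join(lines), schema
```

For example, `compile_flow("/data/users.csv", ("id", "name", "age"), [Filter("age", ">", "30"), Project(["name", "age"])])` produces a three-statement LOAD/FILTER/FOREACH script with output schema `("name", "age")`. Placing a `Filter("id", ...)` after `Project(["age"])` raises `ValueError` at construction time, because the tracked schema no longer contains `id`.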
Pages: 410-424
Page count: 15