A Middleware for Managing Big-Data Flows

Cited by: 0
Authors
Gupta, Rajeev [1]
Gupta, Himanshu [1]
Gupta, Sanjeev [2]
Padmanabhan, Sriram [2]
Affiliations
[1] IBM Res, New Delhi, India
[2] IBM Software Grp, San Jose, CA USA
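Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract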
Hadoop is used for a wide variety of applications over diverse kinds of data, which makes developing and managing data flows over Hadoop MapReduce a complex task. Scripting languages such as Hive, Pig, and Jaql have been developed to hide the complexity of MapReduce applications from the user. But even these high-level query languages can become complex over time, and developing, debugging, and maintaining such scripts is a non-trivial task even for a user proficient in these languages. This paper presents a middleware for developing and maintaining MapReduce data flows. The middleware can be used to Extract data from diverse data sources, Load it into a distributed file system, and Transform it into a format that subsequent systems can easily analyze, all in a user-friendly manner. MetaOperators are the backbone of our middleware. Using MetaOperators, one can express a data flow by specifying only the relevant inputs, without worrying about data schema and query syntax. A data flow written using MetaOperators localizes the schema-specific parts of the query to the MetaOperator parameters, making the flow easier to develop, debug, and maintain. Using these MetaOperators, we show how one can express operations over hierarchical as well as flat data in a similar manner, track the data schema as it flows through the operators, and add a drag-and-drop GUI layer on top of this framework. This brings MapReduce application development into the realm of middle management.
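The abstract does not show what a MetaOperator looks like, so the following is only a minimal Python sketch of the general idea, not the paper's actual API. The names `MetaOperator`, `run_flow`, `_filter`, and `_project`, along with the sample records, are all hypothetical. It illustrates how schema-specific details (field names, values) can live entirely in operator parameters while the operator logic stays generic, and how the schema might be tracked after each step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class MetaOperator:
    """Hypothetical operator: generic logic plus schema-specific parameters."""
    name: str
    params: Dict[str, object]
    apply: Callable[[List[Row], Dict[str, object]], List[Row]]

    def run(self, rows: List[Row]) -> List[Row]:
        return self.apply(rows, self.params)


def _filter(rows: List[Row], p: Dict[str, object]) -> List[Row]:
    # Keep rows whose named field equals the given value.
    return [r for r in rows if r[p["field"]] == p["equals"]]


def _project(rows: List[Row], p: Dict[str, object]) -> List[Row]:
    # Keep only the named fields of each row.
    return [{f: r[f] for f in p["fields"]} for r in rows]


def run_flow(rows: List[Row], flow: List[MetaOperator]) -> Tuple[List[Row], list]:
    """Run operators in order and record the field names after each step."""
    schemas = []
    for op in flow:
        rows = op.run(rows)
        schemas.append((op.name, sorted(rows[0].keys()) if rows else []))
    return rows, schemas


# Illustrative data only; not taken from the paper.
records = [
    {"user": "a", "country": "IN", "bytes": 10},
    {"user": "b", "country": "US", "bytes": 20},
    {"user": "c", "country": "IN", "bytes": 30},
]

# Schema-specific details appear only in the parameters of each operator.
flow = [
    MetaOperator("filter", {"field": "country", "equals": "IN"}, _filter),
    MetaOperator("project", {"fields": ["user", "bytes"]}, _project),
]

result, schemas = run_flow(records, flow)
print(result)
print(schemas)
```

In this sketch, adapting the flow to a different schema means changing only the `params` dictionaries, which is one plausible reading of the localization the abstract describes; a real middleware would generate Hive, Pig, or Jaql scripts rather than run in memory.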
Pages: 410-424
Number of pages: 15
Related papers
50 in total
  • [31] On the Timed Analysis of Big-Data Applications
    Marconi, Francesco
    Quattrocchi, Giovanni
    Baresi, Luciano
    Bersani, Marcello M.
    Rossi, Matteo
    NASA FORMAL METHODS, NFM 2018, 2018, 10811 : 315 - 332
  • [32] Approximate Incremental Big-Data Harmonization
    Agarwal, Puneet
    Shroff, Gautam
    Malhotra, Pankaj
    2013 IEEE INTERNATIONAL CONGRESS ON BIG DATA, 2013, : 118 - 125
  • [33] Mending the Big-Data Missing Information
    Daltrophe, Hadassa
    Dolev, Shlomi
    Lotker, Zvi
    2016 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING (ICSEE), 2016,
  • [34] Data Modifications in Blockchain Architecture for Big-Data Processing
    Tulkinbekov, Khikmatullo
    Kim, Deok-Hwan
    SENSORS, 2023, 23 (21)
  • [35] A big-data processing framework for uncertainties in transportation data
    Yang, Jie
    Ma, Jun
    2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2015), 2015,
  • [36] Becoming data-savvy in a big-data world
    Xu, Meng
    Rhee, Seung Yon
    TRENDS IN PLANT SCIENCE, 2014, 19 (10) : 619 - 622
  • [37] Interpreting big-data analysis of retrospective observational data
    Huizinga, Tom W. J.
    Knevel, Rachel
    LANCET RHEUMATOLOGY, 2020, 2 (11) : E652 - E653
  • [38] Managing confidential data in the gLite middleware
    Scardaci, Diego
    Scuderi, Giordano
    WET ICE 2007: 16TH IEEE INTERNATIONAL WORKSHOPS ON ENABLING TECHNOLOGIES: INFRASTRUCTURE FOR COLLABORATIVE ENTERPRISES, PROCEEDINGS, 2007, : 298 - 299
  • [39] Analysis of Big-Data Based Data Mining Engine
    Huang, Xinxin
    Gong, Shu
    2017 13TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2017, : 164 - 168
  • [40] BBS: A Blockchain Big-Data Sharing System
    Wang, Shan
    Yang, Ming
    Ge, Tingjian
    Luo, Yan
    Fu, Xinwen
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 4205 - 4210