A Middleware for Managing Big-Data Flows

Authors
Gupta, Rajeev [1 ]
Gupta, Himanshu [1 ]
Gupta, Sanjeev [2 ]
Padmanabhan, Sriram [2 ]
Affiliations
[1] IBM Res, New Delhi, India
[2] IBM Software Grp, San Jose, CA USA
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Hadoop is used for a wide variety of applications over diverse kinds of data, which makes developing and managing data flows over Hadoop MapReduce a complex task. Scripting languages such as Hive, Pig, and Jaql have been developed to hide the complexity of MapReduce applications from the user. However, scripts in these high-level query languages can themselves grow complex over time, and developing, debugging, and maintaining them is a non-trivial task even for a user proficient in these languages. This paper presents a middleware for developing and maintaining MapReduce data flows. The middleware can be used, in a user-friendly manner, to Extract data from diverse data sources, Load it into a distributed file system, and Transform it into a format that subsequent systems can easily analyze. MetaOperators are the backbone of our middleware. Using MetaOperators, one can express a data flow by specifying only the relevant inputs, without worrying about the data schema or the query syntax. A data flow written with MetaOperators confines the schema-specific parts of the query to the MetaOperator parameters, making the flow easier to develop, debug, and maintain. Using these MetaOperators, we show how one can express operations over hierarchical as well as flat data in a similar manner, track the data schema as it flows through the operators, and add a drag-and-drop GUI layer on top of this framework. This brings MapReduce application development into the realm of middle management.
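The idea of confining schema-specific details to operator parameters, while tracking the schema as data moves through the flow, can be illustrated with a small sketch. This is not the paper's middleware or its MetaOperator API: the class names Filter and Project, the run_flow helper, and the in-memory list-of-dicts data model are all hypothetical stand-ins chosen for illustration, and nothing here runs on Hadoop.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


@dataclass
class Filter:
    """Hypothetical operator: keep rows whose `field` equals `equals`."""
    field: str
    equals: Any

    def output_schema(self, schema: List[str]) -> List[str]:
        # Filtering does not change the schema, but the field must exist.
        if self.field not in schema:
            raise KeyError(f"unknown field: {self.field}")
        return list(schema)

    def apply(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r[self.field] == self.equals]


@dataclass
class Project:
    """Hypothetical operator: keep only the listed fields."""
    fields: List[str]

    def output_schema(self, schema: List[str]) -> List[str]:
        missing = [f for f in self.fields if f not in schema]
        if missing:
            raise KeyError(f"unknown fields: {missing}")
        return list(self.fields)

    def apply(self, rows: List[Row]) -> List[Row]:
        return [{f: r[f] for f in self.fields} for r in rows]


def run_flow(schema: List[str], rows: List[Row],
             operators: list) -> Tuple[List[Row], List[List[str]]]:
    """Run operators in order and record the schema after each step."""
    trace = [list(schema)]
    for op in operators:
        # Schema is validated and updated before the data is touched,
        # so a mismatch is reported at the operator that caused it.
        schema = op.output_schema(schema)
        rows = op.apply(rows)
        trace.append(schema)
    return rows, trace
```

In this sketch a flow is just a list such as [Filter("country", "IN"), Project(["user", "clicks"])]. The only schema-specific parts are the operator parameters, so a change in field names touches those parameters and nothing else, and the returned trace shows the schema at every stage.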
Pages: 410-424
Number of pages: 15