Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Cited by: 2
Authors
Cheng, Peng [1, 2]
Lu, Yutong [3]
Du, Yunfei [3]
Chen, Zhiguang [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] State Key Lab High Performance Comp, Changsha, Peoples R China
[3] Natl Supercomp Ctr Guangzhou NSCC GZ, Guangzhou, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
High performance computing; Big data; Convergence; File system; Hadoop
DOI
10.1007/978-3-319-69953-0_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, because the workload characteristics and architectural optimization objectives differ, migrating data-intensive applications to HPC systems that are geared toward traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware. We also characterize the impact of storage architecture on data-intensive MapReduce workloads when Lustre is used as the underlying file system. Based on this characterization and our findings on performance behavior, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the impact of the metadata bottleneck. The evaluation of these optimization techniques shows up to a 17% performance benefit for data-intensive workloads.
Pages: 90-106
Page count: 17