Experiences of Converging Big Data Analytics Frameworks with High Performance Computing Systems

Cited by: 2
Authors
Cheng, Peng [1, 2]
Lu, Yutong [3]
Du, Yunfei [3]
Chen, Zhiguang [1, 2]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha, Peoples R China
[2] State Key Lab High Performance Comp, Changsha, Peoples R China
[3] Natl Supercomp Ctr Guangzhou NSCC GZ, Guangzhou, Peoples R China
Source
Funding
National Key Research and Development Program;
Keywords
High performance computing; Big data; Convergence; File system; Hadoop;
DOI
10.1007/978-3-319-69953-0_6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With the rapid development of big data analytics frameworks, many existing high performance computing (HPC) facilities are evolving new capabilities to support big data analytics workloads. However, due to differing workload characteristics and architectural optimization objectives, migrating data-intensive applications to HPC systems geared toward traditional compute-intensive applications presents a new challenge. In this paper, we address the critical question of how to accelerate complex applications that contain both data-intensive and compute-intensive workloads on the Tianhe-2 system by deploying an in-memory file system as data access middleware; we also characterize the impact of storage architecture on data-intensive MapReduce workloads when Lustre is the underlying file system. Based on this characterization and our findings on performance behavior, we propose a shared map output shuffle strategy and a file metadata cache layer to alleviate the metadata bottleneck. Evaluation of these optimization techniques shows up to a 17% performance benefit for data-intensive workloads.
Pages: 90-106
Number of pages: 17