A Structured Approach Towards Big Data Identification

Cited by: 2
Authors
Ahmed, Hameeza [1]
Ismail, Muhammad Ali [1]
Affiliations
[1] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi 75270, Sindh, Pakistan
Keywords
Big Data; Hardware; Complexity theory; Real-time systems; Personnel; Optimization; Mathematical models; Big data; identification; 3Vs; offloading; mathematical equations; DATA ANALYTICS; BENCHMARK SUITE; DATA CHALLENGES; MAPREDUCE; INTERNET; IOT; FRAMEWORK; SYSTEMS; THINGS; TECHNOLOGIES;
DOI
10.1109/TBDATA.2021.3139069
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Big data is a "relative" concept. It is the combination of data, application, and platform properties. The term big data has been used with almost every problem involving large, real-time, and heterogeneous data. However, these data attributes alone are not enough to identify big data, because they ignore the application and platform properties that determine processing thresholds. Ambiguous identification of big data can lead to inefficient use of optimization techniques, resulting in global inefficiency, reduced system performance, increased power consumption, greater effort on the part of the programming team, and misallocation of the hardware resources required for the task. In this regard, a structured approach is presented for the identification of big data. The approach is based on three equations that categorize the Volume, Velocity, and Variety characteristics by relating data, application, and platform properties. The 3Vs identification is necessary for enabling the relevant optimization techniques. In addition to 3Vs identification, it is necessary to discriminate whether the big data is due to 1V, 2Vs, or 3Vs, as the involvement of more Vs increases the problem complexity. To this end, a classification of big data into strong, moderate, or weak levels is proposed. To evaluate the proposed methods, a set of well-known applications was tested and categorized, showing savings of up to 58% in main memory and 44% in disk reads, and prescribing a lower clock rate, fewer cores, sequential programming, and non-adaptive processing and storage formats. Moreover, four case studies reported as big data were analyzed according to the proposed system. The proposed method categorizes two of the case studies as weak-low big data exhibiting only volume and the third as weak-medium owing to velocity, whereas no V is involved in the fourth. The proposed equations also reduce computation and human resources by up to 75% relative to Spark cluster execution. In this manner, the proposed work can avoid unnecessary investments through relevant prescriptions. Furthermore, the proposed equations can be integrated into different tools to assist selective offloading of big data workloads to appropriate software and hardware solutions.
Pages: 147-159
Number of pages: 13