A Structured Approach Towards Big Data Identification

Cited by: 2
Authors
Ahmed, Hameeza [1]
Ismail, Muhammad Ali [1]
Affiliations
[1] NED Univ Engn & Technol, Dept Comp & Informat Syst Engn, Karachi 75270, Sindh, Pakistan
Keywords
Big Data; Hardware; Complexity theory; Real-time systems; Personnel; Optimization; Mathematical models; Big data; identification; 3Vs; offloading; mathematical equations; DATA ANALYTICS; BENCHMARK SUITE; DATA CHALLENGES; MAPREDUCE; INTERNET; IOT; FRAMEWORK; SYSTEMS; THINGS; TECHNOLOGIES;
DOI
10.1109/TBDATA.2021.3139069
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Big data is a "relative" concept: it arises from the combination of data, application, and platform properties. The term has been applied to almost every problem involving large, real-time, or heterogeneous data. However, these data attributes alone are not enough to identify big data, because they ignore the application and platform properties that determine processing thresholds. Ambiguous identification of big data can lead to inefficient use of optimization techniques, resulting in global inefficiency, reduced system performance, increased power consumption, greater effort from the programming team, and misallocation of the hardware resources required for the task. To address this, a structured approach for identifying big data is presented. The approach is based on three equations that categorize the Volume, Velocity, and Variety characteristics by relating data, application, and platform properties. Identifying the 3Vs is necessary for enabling the relevant optimization techniques. Beyond identifying the 3Vs, it is also necessary to determine whether a big data problem stems from 1V, 2Vs, or 3Vs, since the involvement of more Vs increases problem complexity. Accordingly, a classification of big data into strong, moderate, or weak levels is proposed. To evaluate the proposed methods, a set of well-known applications has been tested and categorized, showing savings of up to 58% in main memory and 44% in disk reads, and prescribing lower clock rates, fewer cores, sequential programming, and non-adaptive processing and storage formats. Moreover, four case studies reported as big data have been analyzed with the proposed system. The method categorizes two case studies as weak-low big data exhibiting only volume and the third as weak-medium due to velocity, whereas the fourth involves no V. The proposed equations also reduce computation and human resources by up to 75% compared with Spark cluster execution. In this way, the proposed work can prevent unnecessary investment by prescribing only the relevant resources. Furthermore, the proposed equations can be integrated into different tools to assist selective offloading of big data workloads to appropriate software and hardware solutions.
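The level classification described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give the three equations, so the per-V results are taken here as already-computed boolean inputs. The mapping of one, two, and three Vs to weak, moderate, and strong is an assumption inferred from the statement that more Vs increase complexity, and the sub-levels mentioned in the case studies (weak-low, weak-medium) are not modeled. The names `VFlags`, `LEVELS`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VFlags:
    """Outcome of the three identification equations (assumed precomputed)."""
    volume: bool
    velocity: bool
    variety: bool


# Assumed mapping from the number of Vs present to a big data level.
LEVELS = {0: "not big data", 1: "weak", 2: "moderate", 3: "strong"}


def classify(flags: VFlags) -> str:
    """Return the assumed level for the given combination of Vs."""
    count = sum((flags.volume, flags.velocity, flags.variety))
    return LEVELS[count]


# Mirrors the case studies in the abstract: volume only, velocity only, no V.
print(classify(VFlags(volume=True, velocity=False, variety=False)))
print(classify(VFlags(volume=False, velocity=True, variety=False)))
print(classify(VFlags(volume=False, velocity=False, variety=False)))
```

Under these assumptions, the volume-only and velocity-only cases both fall into the weak level, and the case with no V is not treated as big data.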
Pages: 147-159
Page count: 13
Related papers
50 in total
  • [1] An Approach Towards Big Data-A Review
    Gupta, Palak
    Tyagi, Nidhi
    2015 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION & AUTOMATION (ICCCA), 2015, : 118 - 123
  • [2] Towards a Set Theoretical Approach to Big Data Analytics
    Mukkamala, Raghava Rao
    Hussain, Abid
    Vatrapu, Ravi
    2014 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS), 2014, : 629 - 636
  • [3] Reconfigurable Manufacturing: Towards an industrial Big Data approach
    Arnarson, Halldor
    Bremdal, Bernt Arild
    Solvang, Bjorn
    2022 IEEE/ASME INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS (AIM), 2022, : 632 - 637
  • [4] A Structured Approach towards Robust Database Collection for Language Identification
    Deshwal, Deepti
    Sangwan, Pardeep
    Kumar, Divya
    2020 21ST INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2020,
  • [5] Towards Federated Learning Approach to Determine Data Relevance in Big Data
    Doku, Ronald
    Rawat, Danda B.
    Liu, Chunmei
    2019 IEEE 20TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2019), 2019, : 184 - 192
  • [6] A Transformation Approach Towards Big Data Multilabel Decision Trees
    Rivera Rivas, Antonio Jesus
    Charte Ojeda, Francisco
    Javier Pulgar, Francisco
    Jose del Jesus, Maria
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2017, PT I, 2017, 10305 : 73 - 84
  • [7] Scalable fuzzy multivariate outliers identification towards big data applications
    Touny, Huda Mohammed
    Moussa, Ahmed Shawky
    Hadi, Ali S.
    APPLIED SOFT COMPUTING, 2024, 155
  • [8] Vehicle Incident Hot Spots Identification: An Approach for Big Data
    Triguero, Isaac
    Figueredo, Grazziela P.
    Mesgarpour, Mohammad
    Garibaldi, Jonathan M.
    John, Robert I.
    2017 16TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS / 11TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING / 14TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS, 2017, : 901 - 908
  • [9] Structured and Unstructured Big Data Analytics
    Misluu, Suyash
    Misra, Anuranjan
    2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 740 - 746
  • [10] Megastore: structured storage for Big Data
    Moscoso Zea, Oswaldo
    ENFOQUE UTE, 2012, 3 (02): : 1 - 12