Big Data Quality Scoring for Structured Data Using MapReduce

Authors
Wu, Yalong [1]
Dhamodharan, Shalini [1]
Ghattamaneni, Vinuthna [1]
Kokila, Narmada [1]
Pathakamuri, Chandrika [1]
Carter, Timothy [1]
Tian, Pu [2]
Sha, Kewei [3]
Affiliations
[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA
[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA
[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA
Keywords
Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score
DOI
10.1109/NANO61778.2024.10637520
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.
Pages: 6
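The abstract describes scoring each data quality dimension on a 0 to 100 scale within MapReduce, but this record does not give the formulas. As a rough illustration only, the sketch below scores one dimension, completeness, per column using a map step and a reduce step in plain Python. The function names, the record layout, and the rule "share of non-empty values times 100" are assumptions made for this example, not the BDQS definitions, and the binomial testing and standard normalization used for the aggregate score are not reproduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: one dict per row, column name -> raw string value.
Record = Dict[str, Optional[str]]


def map_completeness(record: Record) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Map step: emit (column, (present, seen)) for every field of one record."""
    for column, value in record.items():
        # Assumption: None and blank strings both count as missing.
        present = 0 if value is None or value.strip() == "" else 1
        yield column, (present, 1)


def reduce_completeness(
    pairs: Iterable[Tuple[str, Tuple[int, int]]]
) -> Dict[str, float]:
    """Reduce step: sum the counts per column and scale the ratio to 0-100."""
    totals = defaultdict(lambda: [0, 0])
    for column, (present, seen) in pairs:
        totals[column][0] += present
        totals[column][1] += seen
    return {c: 100.0 * p / n for c, (p, n) in totals.items()}


def completeness_scores(records: Iterable[Record]) -> Dict[str, float]:
    """Run the map step over all records, then the reduce step, in one process."""
    pairs = (pair for record in records for pair in map_completeness(record))
    return reduce_completeness(pairs)


# Illustrative data, invented for this example.
records = [
    {"id": "1", "city": "Houston"},
    {"id": "2", "city": ""},
    {"id": "3", "city": None},
    {"id": "4", "city": "Denton"},
]
scores = completeness_scores(records)
print(scores)
```

In a real Hadoop job the map and reduce functions would run as separate tasks over input splits; here they are chained in one process so the data flow is easy to follow.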