Big Data Quality Scoring for Structured Data Using MapReduce

Authors
Wu, Yalong [1]
Dhamodharan, Shalini [1]
Ghattamaneni, Vinuthna [1]
Kokila, Narmada [1]
Pathakamuri, Chandrika [1]
Carter, Timothy [1]
Tian, Pu [2]
Sha, Kewei [3]
Affiliations
[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA
[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA
[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA
Keywords
Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score
DOI
10.1109/NANO61778.2024.10637520
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.
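The abstract does not say how each dimension score is computed, so the following is only a minimal sketch of how one dimension, completeness, might be scored on a 0 to 100 scale in a map/reduce style. The function names, the set of values treated as missing, and the definition of completeness as the share of non-missing cells are all assumptions made for illustration; the sketch does not reproduce the BDQS model, its binomial testing, or its standard normalization step.

```python
from collections import defaultdict

# Assumption: these values count as "missing". The paper's rule is not given.
MISSING = {"", "NA", "null", None}


def map_record(record):
    """Map step: emit (column, (present, seen)) for every field of one record."""
    for column, value in record.items():
        yield column, (0 if value in MISSING else 1, 1)


def reduce_counts(pairs):
    """Reduce step: sum the (present, seen) counters per column."""
    totals = defaultdict(lambda: (0, 0))
    for column, (present, seen) in pairs:
        p, s = totals[column]
        totals[column] = (p + present, s + seen)
    return dict(totals)


def completeness_scores(records):
    """Per-column completeness on a 0-100 scale (hypothetical definition)."""
    pairs = (pair for record in records for pair in map_record(record))
    totals = reduce_counts(pairs)
    return {column: 100.0 * p / s for column, (p, s) in totals.items()}


def overall_completeness(records):
    """Dataset-level completeness: non-missing cells over all cells, times 100."""
    pairs = (pair for record in records for pair in map_record(record))
    totals = reduce_counts(pairs)
    present = sum(p for p, _ in totals.values())
    seen = sum(s for _, s in totals.values())
    return 100.0 * present / seen if seen else 0.0


if __name__ == "__main__":
    sample = [
        {"id": "1", "age": "30"},
        {"id": "2", "age": ""},
        {"id": "3", "age": "NA"},
        {"id": "4", "age": "41"},
    ]
    print(completeness_scores(sample))
    print(overall_completeness(sample))
```

Because the per-column counters are plain sums, they can be combined in any order, which is the property that would let the same counting run as a combiner and reducer over partitioned input in a real MapReduce job.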
Pages: 6