Big Data Quality Scoring for Structured Data Using MapReduce

Authors
Wu, Yalong [1]
Dhamodharan, Shalini [1]
Ghattamaneni, Vinuthna [1]
Kokila, Narmada [1]
Pathakamuri, Chandrika [1]
Carter, Timothy [1]
Tian, Pu [2]
Sha, Kewei [3]
Affiliations
[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA
[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA
[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA
Keywords
Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score
DOI
10.1109/NANO61778.2024.10637520
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.
Pages: 6
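Illustrative sketch
The abstract describes scoring each quality dimension on a 0 to 100 scale within MapReduce but gives no implementation details. The following is a minimal pure-Python sketch of how one dimension, completeness, might be scored in a map/shuffle/reduce style. Everything in it is an assumption made for illustration: the function names, the set of missing-value markers, the sample records, and the unweighted mean used as a placeholder aggregate. It does not reproduce the binomial testing or standard normalization step of BDQS, which the abstract does not specify.

```python
from collections import defaultdict

# Assumed markers for a missing value; a real dataset would define its own.
MISSING = {"", "NA", "null", None}


def map_completeness(record):
    """Map step: emit (column, (filled, total)) for every field of one record."""
    for column, value in record.items():
        filled = 0 if value in MISSING else 1
        yield column, (filled, 1)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_completeness(values):
    """Reduce step: sum the counts and convert the ratio to a 0-100 score."""
    filled = sum(f for f, _ in values)
    total = sum(t for _, t in values)
    return 100.0 * filled / total


def completeness_scores(records):
    """Run map, shuffle, and reduce in-process and return one score per column."""
    pairs = (pair for record in records for pair in map_completeness(record))
    return {col: reduce_completeness(vals) for col, vals in shuffle(pairs).items()}


def mean_score(scores):
    """Placeholder aggregate: unweighted mean (NOT the paper's aggregation)."""
    return sum(scores.values()) / len(scores)


# Hypothetical sample records, invented for this sketch.
records = [
    {"id": "1", "city": "Houston", "temp": "31.5"},
    {"id": "2", "city": "", "temp": "29.0"},
    {"id": "3", "city": "Denton", "temp": "NA"},
    {"id": "4", "city": "Galloway", "temp": "NA"},
]

scores = completeness_scores(records)
overall = mean_score(scores)
```

Because the mapper emits only small (filled, total) count pairs, the reducer's sums are associative, so the same logic could in principle also serve as a combiner on a real Hadoop cluster; that deployment detail is likewise an assumption, not something the abstract states.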