Big Data Quality Scoring for Structured Data Using MapReduce

Authors
Wu, Yalong [1]
Dhamodharan, Shalini [1]
Ghattamaneni, Vinuthna [1]
Kokila, Narmada [1]
Pathakamuri, Chandrika [1]
Carter, Timothy [1]
Tian, Pu [2]
Sha, Kewei [3]
Affiliations
[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA
[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA
[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA
Keywords
Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score
DOI
10.1109/NANO61778.2024.10637520
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.
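The per-dimension scoring idea in the abstract can be illustrated with a minimal, single-process sketch of a map step and a reduce step. This is not the BDQS implementation: the schema in `REQUIRED_FIELDS`, the sample records, and all function names are hypothetical, only the completeness dimension is shown, and the `aggregate` function uses a plain unweighted mean as a stand-in, since this record does not describe how the paper applies binomial testing and standard normalization.

```python
from collections import defaultdict

# Hypothetical schema; the record does not list the paper's datasets or fields.
REQUIRED_FIELDS = ("id", "name", "email")


def map_completeness(record):
    """Map step: emit ("completeness", (filled, expected)) for one record."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return [("completeness", (filled, len(REQUIRED_FIELDS)))]


def reduce_scores(pairs):
    """Reduce step: sum the counts per dimension and scale to 0-100."""
    totals = defaultdict(lambda: [0, 0])
    for dimension, (passed, expected) in pairs:
        totals[dimension][0] += passed
        totals[dimension][1] += expected
    return {d: 100.0 * p / e if e else 0.0 for d, (p, e) in totals.items()}


def aggregate(scores):
    """Assumption: unweighted mean of dimension scores, NOT the paper's method."""
    return sum(scores.values()) / len(scores)


# Hypothetical sample data with two missing values out of twelve fields.
records = [
    {"id": 1, "name": "a", "email": "a@x.org"},
    {"id": 2, "name": "", "email": "b@x.org"},
    {"id": 3, "name": "c", "email": None},
    {"id": 4, "name": "d", "email": "d@x.org"},
]

pairs = [pair for r in records for pair in map_completeness(r)]
scores = reduce_scores(pairs)
print(scores, aggregate(scores))
```

In a real Hadoop job the map and reduce functions would run on separate nodes over input splits; because the reduce step only sums counts, it could also serve as a combiner.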
Pages: 6