Big Data Quality Scoring for Structured Data Using MapReduce

Cited by: 0

Authors:

Wu, Yalong [1]

Dhamodharan, Shalini [1]

Ghattamaneni, Vinuthna [1]

Kokila, Narmada [1]

Pathakamuri, Chandrika [1]

Carter, Timothy [1]

Tian, Pu [2]

Sha, Kewei [3]

Affiliations:

[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA

[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA

[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA

Source:

2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024 | 2024

Keywords:

Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score;

DOI:

10.1109/NANO61778.2024.10637520

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.
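The abstract states that each dimension is scored from 0 to 100 but does not give the scoring formulas. As a rough illustration only, the sketch below scores one dimension (completeness) in a map/reduce style and then combines dimension scores. Every name here (`map_completeness`, `reduce_completeness`, `completeness_score`, `aggregate_score`) is hypothetical, the per-field ratio is an assumed definition of completeness, and the equal-weight mean is a stand-in: it is not the binomial testing and standard normalization that BDQS uses.

```python
from collections import defaultdict
from statistics import mean


def map_completeness(record):
    """Map step: emit (field, (present, seen)) for every field of one record.

    Assumption: a value counts as missing when it is None or an empty string.
    """
    for field, value in record.items():
        present = 0 if value is None or value == "" else 1
        yield field, (present, 1)


def reduce_completeness(pairs):
    """Reduce step: sum the counts per field and scale each ratio to 0-100."""
    totals = defaultdict(lambda: [0, 0])
    for field, (present, seen) in pairs:
        totals[field][0] += present
        totals[field][1] += seen
    return {field: 100.0 * p / n for field, (p, n) in totals.items()}


def completeness_score(records):
    """Average the per-field scores into one 0-100 completeness score."""
    pairs = (pair for record in records for pair in map_completeness(record))
    return mean(reduce_completeness(pairs).values())


def aggregate_score(dimension_scores):
    """Placeholder aggregate: equal-weight mean of the dimension scores.

    This is an assumption for illustration, not the aggregation in the paper.
    """
    return mean(dimension_scores.values())


# Toy records (made up for this sketch).
records = [
    {"id": 1, "name": "a", "ts": "2024-01-01"},
    {"id": 2, "name": "", "ts": "2024-01-02"},
    {"id": 3, "name": "c", "ts": None},
    {"id": 4, "name": "d", "ts": "2024-01-04"},
]

if __name__ == "__main__":
    print(completeness_score(records))
```

In an actual Hadoop job the map and reduce functions would run as separate tasks over input splits; here they are chained in one process only to show the data flow.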


Pages: 6
