Big Data Quality Scoring for Structured Data Using MapReduce

Cited by: 0

Authors:

Wu, Yalong [1]

Dhamodharan, Shalini [1]

Ghattamaneni, Vinuthna [1]

Kokila, Narmada [1]

Pathakamuri, Chandrika [1]

Carter, Timothy [1]

Tian, Pu [2]

Sha, Kewei [3]

Affiliations:

[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA

[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA

[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA

Source:

2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024 | 2024

Keywords:

Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score;

DOI:

10.1109/NANO61778.2024.10637520

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

In the current big data landscape, where data forms the cornerstone of myriad applications, it is crucial to establish the reliability and credibility of application outcomes through the prism of high-quality data. Nonetheless, data quality has been facing significant evaluation challenges due to the exponential increase in data volume and diversity. This paper introduces a novel big data quality scoring (BDQS) model, which is particularly designed for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike other models that either focus on smaller datasets or rely on sampling techniques, BDQS excels in providing comprehensive data quality assessment for substantial data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score through binomial testing and standard normalization of these scores. This research advances a potent model for big data quality assessment and offers valuable insights for enhancing the reliability and applicability of large-scale datasets across various sectors.

Pages: 6
