Big Data Quality Scoring for Structured Data Using MapReduce

Times cited: 0

Authors:

Wu, Yalong [1]

Dhamodharan, Shalini [1]

Ghattamaneni, Vinuthna [1]

Kokila, Narmada [1]

Pathakamuri, Chandrika [1]

Carter, Timothy [1]

Tian, Pu [2]

Sha, Kewei [3]

Affiliations:

[1] Univ Houston Clear Lake, Dept Comp Sci, Houston, TX 77058 USA

[2] Stockton Univ, Sch Business, Galloway, NJ 08205 USA

[3] Univ North Texas, Dept Informat Sci, Denton, TX 76203 USA

Source:

2024 33RD INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, ICCCN 2024 | 2024

Keywords:

Big data quality scoring; MapReduce; data quality dimensions; standard normalization; aggregate data quality score

DOI:

10.1109/NANO61778.2024.10637520

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

In today's big data landscape, where data underpins countless applications, the reliability and credibility of application outcomes depend on high-quality data. However, evaluating data quality has become increasingly difficult as data volume and diversity grow exponentially. This paper introduces a novel big data quality scoring (BDQS) model designed specifically for assessing the quality of large-scale datasets within the Hadoop MapReduce ecosystem. Unlike models that either target smaller datasets or rely on sampling techniques, BDQS provides comprehensive quality assessment of large data sources. Specifically, BDQS identifies accuracy, completeness, consistency, timeliness, and correlation as the critical dimensions of data quality, scores each dimension on a scale of 0 to 100, and derives an aggregate data quality score by applying binomial testing and standard normalization to these dimension scores. This research contributes an effective model for big data quality assessment and offers insights for improving the reliability and applicability of large-scale datasets across various sectors.
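The abstract does not give the per-dimension formulas, so the following is only a rough illustration of how one dimension might be scored on the 0 to 100 scale in a map/reduce style. It assumes completeness is the share of non-missing values, that missing values are stored explicitly as None, and it simulates the map and reduce phases in plain Python rather than on Hadoop. The sample records, function names, and formula are hypothetical and are not taken from the paper; the BDQS aggregation step (binomial testing and standard normalization) is not reproduced.

```python
from functools import reduce

# Hypothetical sample data: each record is a dict; None marks a missing value.
RECORDS = [
    {"id": 1, "name": "alice", "age": 34},
    {"id": 2, "name": None, "age": 29},
    {"id": 3, "name": "carol", "age": None},
    {"id": 4, "name": "dave", "age": 41},
]

def map_record(record):
    """Map phase (hypothetical): emit (column, (present, total)) for each field."""
    return [(col, (0 if val is None else 1, 1)) for col, val in record.items()]

def reduce_counts(acc, pair):
    """Reduce phase (hypothetical): sum present and total counts per column key."""
    col, (present, total) = pair
    p, t = acc.get(col, (0, 0))
    acc[col] = (p + present, t + total)
    return acc

def completeness_scores(records):
    """Return assumed 0-100 completeness scores per column and overall."""
    # On Hadoop the framework would shuffle pairs by key between the phases;
    # here a flat list of pairs stands in for that shuffle.
    pairs = [pair for rec in records for pair in map_record(rec)]
    counts = reduce(reduce_counts, pairs, {})
    per_column = {col: 100.0 * p / t for col, (p, t) in counts.items()}
    present = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return per_column, 100.0 * present / total

per_column, overall = completeness_scores(RECORDS)
print(per_column)  # {'id': 100.0, 'name': 75.0, 'age': 75.0}
print(overall)
```

Because each map call touches only one record and the reduce step is an associative sum, the same per-key counting could in principle be split across many mappers and reducers, which is the property that lets a full-dataset (non-sampled) score scale in a MapReduce setting.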

Pages: 6