Design of SPRINT Parallelization of Data Mining Algorithms Based on Cloud Computing

Authors
Song, Lei [1]
Zhang, Huajie [2]
Feng, Dongdong [3]
Affiliations
[1] Kaifeng Vocat Coll Culture & Arts, Modern Educ Ctr, Kaifeng 475004, Peoples R China
[2] Zhengzhou Univ Technol, Engn Training Ctr, Zhengzhou 450044, Peoples R China
[3] Henan Univ, Sch Software, Kaifeng 475004, Peoples R China
Keywords
data mining; cloud computing; SPRINT algorithm; parallel design; ENERGY; PREDICTION;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
Traditional data mining loads all data into memory for analysis and computation. This stand-alone computing mode has low computational efficiency and a high mining failure rate. With the rapid development of data storage and computer technology, storing and processing big data effectively has become an important problem. Cloud computing can quickly obtain resources from a computing resource pool and supports the parallelization of data mining algorithms; combining a cloud computing platform with data mining in this way helps overcome the bottlenecks of traditional data mining. Based on the Hadoop cloud computing platform, this paper therefore exploits the characteristics of the MapReduce programming framework and proposes a parallel design of decision tree nodes, node attribute metrics, and Gini index ranking for the SPRINT decision tree algorithm. The parallelized SPRINT algorithm is tested for classification accuracy, scalability, and speedup ratio. The results indicate that the parallel design achieves good scalability and parallel speedup while preserving classification accuracy, which verifies the feasibility of parallelizing data mining algorithms on a cloud computing platform.
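The abstract does not give the paper's implementation, so the following is only a minimal single-process sketch of how a Gini-index split search for SPRINT could be arranged in a map/reduce style. It assumes numeric attributes and binary splits of the form "value <= threshold". All function names (`gini`, `map_records`, `reduce_attribute`, `best_split`) are hypothetical, and plain Python stands in for Hadoop: the "map" step emits per-attribute (value, label) pairs, and the "reduce" step scans each sorted attribute list while maintaining class histograms below and above the candidate split point.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini index of a class histogram: 1 - sum(p_j ** 2)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def map_records(records):
    """Map step (illustrative): emit (attribute, (value, label)) pairs."""
    for features, label in records:
        for attr, value in features.items():
            yield attr, (value, label)


def reduce_attribute(pairs):
    """Reduce step for one attribute: return (best_gini_split, threshold)."""
    pairs = sorted(pairs, key=lambda p: p[0])  # SPRINT-style sorted attribute list
    n = len(pairs)
    below = Counter()                          # class histogram left of the split
    above = Counter(label for _, label in pairs)  # class histogram right of it
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = pairs[i]
        below[label] += 1
        above[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no split point between equal values
        n_below = i + 1
        # Weighted Gini index of the two partitions.
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        if score < best[0]:
            best = (score, (value + next_value) / 2)
    return best


def best_split(records):
    """Group map output by attribute, reduce each group, pick the lowest Gini."""
    grouped = defaultdict(list)
    for attr, pair in map_records(records):
        grouped[attr].append(pair)
    results = {attr: reduce_attribute(pairs) for attr, pairs in grouped.items()}
    attr = min(results, key=lambda a: results[a][0])
    score, threshold = results[attr]
    return attr, threshold, score
```

Because each attribute list is reduced independently, the per-attribute scans are the part that could in principle run on separate workers; how the paper actually distributes this work on Hadoop is not stated in the abstract.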
Pages: 399-405
Number of pages: 7