An Efficient Distributed Database Clustering Algorithm for Big Data Processing

Cited by: 0
Authors
Sun, Qiao [1]
Fu, Lan-mei [1]
Deng, Bu-qiao [1]
Pei, Xu-bin [2]
Sun, Jia-song [3]
Affiliations
[1] Beijing GuoDianTong Network Technology Co., Ltd., Beijing, People's Republic of China
[2] State Grid Zhejiang Electric Power Co., Ltd., Hangzhou, Zhejiang, People's Republic of China
[3] Tsinghua University, EE Dept., Beijing, People's Republic of China
Keywords
Distributed big data processing; Distributed database; Data clustering; Deep neural network; K-means
DOI
10.23977/iccsc.2017.1012
CLC classification number
TP301 [Theory, Methods]
Discipline code
081202
Abstract
This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is taken as an input vector; its features are extracted and fed to the input layer of the deep neural network. The connection weights are trained with the backpropagation (BP) algorithm, and the network is trained to produce the desired output by adjusting these weights. Finally, the clustering result is determined by the similarity of the output vector corresponding to the current input. Experimental results on a small-scale distributed system show that this method achieves higher test-set accuracy than the traditional k-means clustering method and is better suited to large-scale data clustering in distributed environments.
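The abstract does not specify the network architecture, the feature extraction, or the similarity measure, so the following is only a minimal sketch of the described pipeline under stated assumptions, not the authors' implementation. It assumes that records have already been encoded as numeric feature vectors (hypothetical toy 2-D data here), that cluster labels are available for training (the abstract reports test-set accuracy), that each cluster is represented by a one-hot reference vector, and that a record is assigned to the cluster whose reference vector has the highest cosine similarity with the network output. The names `MLP`, `assign_cluster`, `PROTOTYPES`, and `TRAIN` are hypothetical.

```python
import math
import random

# Hypothetical toy feature vectors standing in for encoded database records;
# the paper's actual feature extraction is not described in the abstract.
TRAIN = [
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.0, 0.1], 0), ([0.15, 0.05], 0),
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([1.0, 0.95], 1), ([0.85, 1.0], 1),
]
N_CLUSTERS = 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MLP:
    """One-hidden-layer sigmoid network trained with backpropagation (BP)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight (paired with a constant 1.0 input).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = x + [1.0]  # input plus bias term
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]  # hidden activations plus bias term
        o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, o

    def train_step(self, x, target, lr):
        xb, hb, o = self.forward(x)
        # Output deltas for squared error with sigmoid outputs.
        d_out = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, target)]
        # Hidden deltas: output error propagated back through w2 (before updating it).
        d_hid = [
            hb[j] * (1 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
            for j in range(len(self.w1))
        ]
        # Gradient-descent updates of the connection weights.
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def one_hot(k, n):
    return [1.0 if i == k else 0.0 for i in range(n)]


# Assumption: each cluster's reference vector is its one-hot training target.
PROTOTYPES = [one_hot(k, N_CLUSTERS) for k in range(N_CLUSTERS)]


def assign_cluster(net, x):
    """Assign x to the cluster whose reference vector is most similar to the output."""
    _, _, o = net.forward(x)
    sims = [cosine(o, p) for p in PROTOTYPES]
    return max(range(N_CLUSTERS), key=lambda k: sims[k])


net = MLP(n_in=2, n_hidden=4, n_out=N_CLUSTERS)
for _ in range(3000):
    for x, label in TRAIN:
        net.train_step(x, one_hot(label, N_CLUSTERS), lr=0.5)

print(assign_cluster(net, [0.05, 0.05]), assign_cluster(net, [0.95, 0.9]))
```

With one-hot reference vectors, cosine similarity simply selects the largest output unit; a richer reference vector per cluster (for example, the mean network output over that cluster's training records) would give the similarity step more work to do. In a distributed setting, one plausible arrangement (an assumption, not a detail from the abstract) is to broadcast the trained weights so that each node runs `assign_cluster` on its local records.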
Pages: 70-74
Page count: 5