Euclidean distance stratified random sampling based clustering model for big data mining

Cited by: 1
Authors
Pandey, Kamlesh Kumar [1]
Shukla, Diwakar [1]
Affiliations
[1] Dr Hari Singh Gour Vishwavidyalaya, Dept Comp Sci & Applicat, Sagar, Madhya Pradesh, India
Keywords
big data mining; big data sampling; big data clustering; Euclidean distance based stratum; random sampling; sample extension; SSK-Means; stratified sampling; framework; algorithm
DOI
10.1002/cmm4.1206
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Big data mining concerns large-scale data analysis and faces computational cost challenges driven by the exponential growth of digital technologies. Classical data mining algorithms struggle with computational efficiency, memory utilization, resource optimization, scale-up, and speed-up when applied to big data. Sampling is one of the most effective data reduction techniques: it reduces computational cost and improves the scalability and speed of any data mining algorithm in both single-machine and multi-machine execution environments. This study proposes a Euclidean distance-based method for stratum creation and a stratified random sampling-based big data mining model built on the K-Means clustering algorithm (SSK-Means) in a single-machine execution environment. SSK-Means achieved better cluster quality, speed-up, scale-up, and memory utilization than random sampling-based K-Means and classical K-Means, as measured by the silhouette coefficient, Davies-Bouldin index, and Calinski-Harabasz index (internal validity measures), together with execution time and speedup ratio.
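The pipeline described in the abstract (build strata from Euclidean distances, draw a random sample from each stratum, cluster the sample with K-Means, then extend the result to the full data) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how strata are formed, so the stratum rule (rank by distance to the global mean, cut into near-equal groups), the proportional allocation, the nearest-centroid sample extension, and all function and parameter names here are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def distance_strata(points, n_strata):
    """Assumed stratum rule: rank points by distance to the global mean
    and cut the ranking into groups of near-equal size."""
    center = mean_point(points)
    order = sorted(range(len(points)), key=lambda i: euclidean(points[i], center))
    size = math.ceil(len(order) / n_strata)
    return [order[i:i + size] for i in range(0, len(order), size)]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction (at least one index) from every stratum."""
    sample = []
    for stratum in strata:
        take = max(1, round(fraction * len(stratum)))
        sample.extend(rng.sample(stratum, take))
    return sample


def kmeans(points, k, rng, iterations=50):
    """Plain Lloyd's K-Means; returns the k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            groups[nearest].append(p)
        # Keep the old centroid when a cluster receives no points.
        updated = [mean_point(g) if g else centroids[c] for c, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def ssk_means_sketch(points, k, n_strata=4, fraction=0.2, seed=0):
    """Hypothetical end-to-end pipeline: strata -> sample -> K-Means -> extension."""
    rng = random.Random(seed)
    strata = distance_strata(points, n_strata)
    sample_idx = stratified_sample(strata, fraction, rng)
    centroids = kmeans([points[i] for i in sample_idx], k, rng)
    # Assumed sample extension: label every point by its nearest sample centroid.
    labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
    return centroids, labels, sample_idx


# Usage on synthetic data: two Gaussian blobs of 500 points each.
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
data += [[data_rng.gauss(8, 1), data_rng.gauss(8, 1)] for _ in range(500)]
centroids, labels, sample_idx = ssk_means_sketch(data, k=2)
```

Only the sampled points enter the iterative K-Means loop, which is where the cost reduction claimed for sampling would come from; the full data set is touched once for stratum construction and once for labeling.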
Pages: 14