Euclidean distance stratified random sampling based clustering model for big data mining

Times cited: 1
Authors
Pandey, Kamlesh Kumar [1]
Shukla, Diwakar [1]
Affiliation
[1] Department of Computer Science and Applications, Dr. Hari Singh Gour Vishwavidyalaya, Sagar, Madhya Pradesh, India
Keywords
big data mining; big data sampling; big data clustering; Euclidean distance-based stratum; random sampling; sample extension; SSK-Means; stratified sampling; framework; algorithm
DOI
10.1002/cmm4.1206
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Big data mining deals with large-scale data analysis and, owing to the exponential growth of digital technologies, faces challenges related to computational cost. In big data mining, classical data mining algorithms struggle with computational efficiency, memory utilization, resource optimization, scale-up, and speed-up. Sampling is one of the most effective data reduction techniques: it reduces computational cost and improves the scalability and speed of data mining algorithms in both single-machine and multi-machine execution environments. This study proposes a Euclidean distance-based method for creating strata, together with a stratified random sampling-based big data mining model that uses the K-Means clustering algorithm (SSK-Means) in a single-machine execution environment. Compared with random sampling-based K-Means and classical K-Means, SSK-Means achieved better cluster quality, speed-up, scale-up, and memory utilization, as measured by internal validity indices (silhouette coefficient, Davies-Bouldin index, and Calinski-Harabasz index), execution time, and speed-up ratio.
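The abstract does not specify how strata are built, sampled, or clustered. The following is therefore only a minimal sketch of the general idea and should not be read as the authors' SSK-Means. It rests on three hypothetical assumptions. First, strata are equal-width bands of each point's Euclidean distance from the data mean. Second, the sample is drawn by simple random sampling within each stratum using proportional allocation. Third, K-Means (Lloyd's algorithm with farthest-first initialization) runs on the sample, and every point in the full data is then assigned to its nearest sample centroid. The helper names `distance_strata`, `stratified_sample`, `kmeans`, and `ssk_means_sketch`, along with the defaults `n_strata=5` and `fraction=0.1`, are illustrative only.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_strata(data, n_strata):
    """Hypothetical stratum rule: put each point into one of n_strata
    equal-width bands of its Euclidean distance from the data mean."""
    dim = len(data[0])
    mean = [sum(p[j] for p in data) / len(data) for j in range(dim)]
    dists = [euclid(p, mean) for p in data]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_strata or 1.0  # avoid zero width if all distances are equal
    strata = [[] for _ in range(n_strata)]
    for p, d in zip(data, dists):
        idx = min(int((d - lo) / width), n_strata - 1)  # clamp the maximum distance
        strata[idx].append(p)
    return [s for s in strata if s]  # drop empty bands


def stratified_sample(strata, fraction, rng):
    """Proportional allocation: draw the same fraction (at least one point)
    from every stratum by simple random sampling without replacement."""
    sample = []
    for s in strata:
        k = max(1, round(fraction * len(s)))
        sample.extend(rng.sample(s, k))
    return sample


def kmeans(points, k, rng, iters=100):
    """Lloyd's K-Means with farthest-first initialization (an assumption)."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # next centroid: the point farthest from its nearest chosen centroid
        far = max(points, key=lambda p: min(euclid(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: euclid(p, centroids[c]))
            clusters[nearest].append(p)
        # recompute each centroid as its cluster mean; keep it if the cluster is empty
        new = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids


def ssk_means_sketch(data, k, n_strata=5, fraction=0.1, seed=0):
    """Cluster a stratified sample, then label every point of the full data."""
    rng = random.Random(seed)
    sample = stratified_sample(distance_strata(data, n_strata), fraction, rng)
    centroids = kmeans(sample, k, rng)
    labels = [min(range(k), key=lambda c: euclid(p, centroids[c])) for p in data]
    return centroids, labels
```

K-Means sees only about `fraction` of the data, which is where any speed-up or memory savings would come from. The final labelling pass is a single linear scan over the full data. In comparisons like those in the abstract, the speed-up ratio is commonly computed as the run time of classical K-Means on the full data divided by the run time of the sampling-based variant. For cluster quality, the returned `labels` could be passed, together with the data, to `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` from scikit-learn's `sklearn.metrics`.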
Pages: 14
Related papers
  • [1] Stratified linear systematic sampling based clustering approach for detection of financial risk group by mining of big data
    Pandey, Kamlesh Kumar
    Shukla, Diwakar
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022, 13(03): 1239-1253
  • [2] Stratified Sampling Design Based on Data Mining
    Kim, Yeonkook J.
    Oh, Yoonhwan
    Park, Sunghoon
    Cho, Sungzoon
    Park, Hayoung
    HEALTHCARE INFORMATICS RESEARCH, 2013, 19(03): 186-195
  • [3] A Euclidean Distance Matrix Model for Convex Clustering
    Wang, Z. W.
    Liu, X. W.
    Li, Q. N.
    JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 2025, 205(01)
  • [4] Improving the Efficiency of Image Clustering using Modified Non Euclidean Distance Measures in Data Mining
    Santhi, P.
    Bhaskaran, V. Murali
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2014, 9(01): 56-61
  • [5] Sampling-Based Consensus Fuzzy Clustering on Big Data
    Zoghlami, Mohamed Ali
    Sassi Hidri, Minyar
    Ben Ayed, Rahma
    2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016: 1501-1508
  • [6] Centrality Clustering-Based Sampling for Big Data Visualization
    Nguyen, Tam Thanh
    Song, Insu
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016: 1911-1917
  • [7] A stratified sampling based clustering algorithm for large-scale data
    Zhao, Xingwang
    Liang, Jiye
    Dang, Chuangyin
    KNOWLEDGE-BASED SYSTEMS, 2019, 163: 416-428
  • [8] Stratified sampling for data mining on the deep web
    Liu, Tantan
    Wang, Fan
    Agrawal, Gagan
    FRONTIERS OF COMPUTER SCIENCE, 2012, 6(02): 179-196