New diagonal bundle method for clustering problems in large data sets

Cited by: 13
Authors
Karmitsa, Napsu [1]
Bagirov, Adil M. [2]
Taheri, Sona [2]
Affiliations
[1] Univ Turku, Dept Math & Stat, FI-20014 Turku, Finland
[2] Federat Univ Australia, Fac Sci & Technol, Ballarat, Vic, Australia
Funding
Academy of Finland
Keywords
Data mining; Nonsmooth optimization; Nonconvex optimization; DC function; Bundle methods; SCALE NONSMOOTH OPTIMIZATION; VARIABLE-METRIC METHOD; UNCONSTRAINED MINIMIZATION; BOUND ALGORITHM; K-MEANS; SEARCH; MODEL; DCA;
DOI
10.1016/j.ejor.2017.06.010
Chinese Library Classification (CLC) number
C93 [Management science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
Clustering is one of the most important tasks in data mining. Recent developments in computer hardware allow us to store data sets with hundreds of thousands or even millions of data points in random access memory (RAM) and to read them repeatedly. This makes it possible to apply conventional clustering algorithms to such data sets. However, these algorithms may need prohibitively long computational times and may fail to produce accurate solutions. Therefore, it is important to develop clustering algorithms that are accurate and can provide real-time clustering in large data sets. This paper introduces one such algorithm. Using a nonsmooth optimization formulation of the clustering problem, the objective function is represented as a difference of two convex (DC) functions. A new diagonal bundle algorithm that explicitly uses this structure is then designed and combined with an incremental approach to solve the problem. The method is evaluated on real-world data sets with both a large number of attributes and a large number of data points, and it is compared numerically with two other clustering algorithms. © 2017 Elsevier B.V. All rights reserved.
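The abstract does not spell out the DC representation, so the following is only a minimal sketch under an assumption: it uses the decomposition commonly applied to the minimum sum-of-squares clustering objective, where the mean squared distance to the nearest center equals f1 - f2, with f1 the mean of the summed squared distances to all centers and f2 the mean of the largest "all but one center" sum. The function names and the toy data are hypothetical, and the code illustrates the objective only, not the diagonal bundle method itself.

```python
def sq_dist(x, a):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def cluster_objective(centers, points):
    """Nonsmooth clustering objective: mean squared distance to the nearest center."""
    return sum(min(sq_dist(c, a) for c in centers) for a in points) / len(points)


def dc_components(centers, points):
    """Return (f1, f2), both convex in the centers, with objective = f1 - f2.

    Assumed decomposition (not quoted from the abstract):
      f1 = mean over points of the sum of squared distances to all centers
      f2 = mean over points of max_j (sum of squared distances to all centers except j)
    """
    f1 = 0.0
    f2 = 0.0
    for a in points:
        d = [sq_dist(c, a) for c in centers]
        total = sum(d)
        f1 += total
        # Dropping the nearest center's term leaves the largest partial sum.
        f2 += max(total - dj for dj in d)
    m = len(points)
    return f1 / m, f2 / m


# Toy data: two well-separated groups and one center per group.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centers = [(0.5, 0.0), (10.5, 10.0)]

f = cluster_objective(centers, points)
f1, f2 = dc_components(centers, points)
print(f, f1 - f2)  # the two values agree up to rounding
```

The identity holds because, for each data point, the minimum of the distances equals their sum minus the largest sum that leaves one distance out. A method that exploits DC structure can then treat the convex pieces f1 and f2 separately instead of working with the nonconvex minimum directly.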
Pages: 367-379
Page count: 13
Related papers
50 in total
  • [1] Clustering in large data sets with the limited memory bundle method
    Karmitsa, Napsu
    Bagirov, Adil M.
    Taheri, Sona
    PATTERN RECOGNITION, 2018, 83 : 245 - 259
  • [2] An investigation of mountain method clustering for large data sets
    Velthuizen, RP
    Hall, LO
    Clarke, LP
    Silbiger, ML
    PATTERN RECOGNITION, 1997, 30 (07) : 1121 - 1135
  • [3] A clustering method for very large mixed data sets
    Sánchez-Díaz, G
    Ruiz-Shulcloper, J
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 643 - 644
  • [4] Determination of similarity threshold in clustering problems for large data sets
    Sánchez-Díaz, G
    Martínez-Trinidad, JF
    PROGRESS IN PATTERN RECOGNITION, SPEECH AND IMAGE ANALYSIS, 2003, 2905 : 611 - 618
  • [5] A visual and interactive data exploration method for large data sets and clustering
    Da Costa, David
    Venturini, Gilles
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2007, 4632 : 553 - +
  • [6] A projection method for robust estimation and clustering in large data sets
    Pena, Daniel
    Prieto, Francisco J.
    DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006, : 209 - +
  • [7] A Fast Method of Coarse Density Clustering for Large Data Sets
    Zhao, Lei
    Yang, Jiwen
    Fan, Jianxi
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS, VOLS 1-4, 2009, : 1941 - 1945
  • [8] Efficient clustering of large data sets
    Ananthanarayana, VS
    Murty, MN
    Subramanian, DK
    PATTERN RECOGNITION, 2001, 34 (12) : 2561 - 2563
  • [9] A New Clustering Method Suitable for Large Scale Data
    Xu Yin
    Hong Xingyong
    Zhou Wenjiang
    Wang Lunwen
    Zhang Ling
    Tan Ying
    2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 6277 - +
  • [10] Clustering Analysis for Large Scale Data Sets
    Singh, Sachin
    Mishra, Ashish
    2015 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION & AUTOMATION (ICCCA), 2015, : 1 - 4