Enhanced Distributed Document Clustering Algorithm Using Different Similarity Measures

Cited by: 0
Authors
Narayanan, Neethi [1]
Judith, J. E. [1]
Jayakumari, J. [1]
Affiliation
[1] Noorul Islam Ctr Higher Educ Kumaracoil, Kumaracoil, Tamil Nadu, India
Keywords
Distributed document clustering; Similarity measures; Cosine similarity; Jaccard coefficient; Pearson coefficient
DOI
Not available
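Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809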
Abstract
Many distributed environments, such as the Internet, intranets, local area networks and wireless networks, contain multiple distributed data sources. Analyzing and monitoring these sources requires data mining techniques designed specifically for distributed applications, and a variety of distributed document clustering algorithms exist for this purpose. This paper presents an Enhanced Distributed Algorithm (EDA) for document clustering and analyzes its performance using several similarity measures: cosine similarity, the Jaccard coefficient and the Pearson coefficient. Tests were performed on standard document corpora, including 20NG (20 Newsgroups), Reuters and Web. The proposed EDA algorithm is also evaluated against several performance factors to determine its accuracy and clustering quality.
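The abstract names the three similarity measures but does not define them. The following is a minimal sketch of how they are commonly computed between two documents, under stated assumptions. Documents are represented as raw term-frequency vectors over their shared vocabulary; the paper may instead use TF-IDF weighting. "Jaccard" is computed in its extended (Tanimoto) form for weighted vectors, which reduces to the set-based Jaccard coefficient when the vectors are binary. The function names `term_vectors`, `cosine`, `extended_jaccard` and `pearson` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def term_vectors(doc_a, doc_b):
    """Build aligned term-frequency vectors over the union of both vocabularies.

    Assumption: plain whitespace tokenization and raw counts (no TF-IDF).
    """
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def extended_jaccard(a, b):
    """Extended (Tanimoto) Jaccard: dot / (|a|^2 + |b|^2 - dot).

    For 0/1 vectors this equals |A intersect B| / |A union B|.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def pearson(a, b):
    """Pearson correlation of the two term-weight vectors (range -1..1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


if __name__ == "__main__":
    a, b = term_vectors("the cat sat", "the cat ran")
    print("cosine:", cosine(a, b))
    print("jaccard:", extended_jaccard(a, b))
    print("pearson:", pearson(a, b))
```

On this toy pair, cosine and Jaccard are positive, while Pearson comes out negative. Pearson centers each vector on its mean, so on very short vectors a differing term can dominate the result. This is one reason the choice of measure can change clustering quality.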
Pages: 545-550
Page count: 6
Related papers
50 in total
  • [41] Clustering Blogs Using Document Context Similarity and Spectral Graph Partitioning
    Ayyasamy, Ramesh Kumar
    Alhashmi, Saadat M.
    Eu-Gene, Siew
    Tahayna, Bashar
    KNOWLEDGE ENGINEERING AND MANAGEMENT, 2011, 123 : 475 - +
  • [42] An Empirical Evaluation of K-Means Clustering Algorithm Using Different Distance/Similarity Metrics
    Gupta, Manoj Kumar
    Chandra, Pravin
    PROCEEDINGS OF ICETIT 2019: EMERGING TRENDS IN INFORMATION TECHNOLOGY, 2020, 605 : 884 - 892
  • [43] A Similarity Rough Set Model for Document Representation and Document Clustering
    Nguyen Chi Thanh
    Yamada, Koichi
    Unehara, Muneyuki
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2011, 15 (02) : 125 - 133
  • [44] Efficient Pre-Processing for Enhanced Semantics Based Distributed Document Clustering
    Shah, Neepa
    Mahajan, Sunita
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 338 - 343
  • [45] Distributed Clustering Algorithm in Sensor Networks via Normalized Information Measures
    Qin, Jiahu
    Zhu, Yingda
    Fu, Weiming
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 3266 - 3279
  • [46] Clustering of documents via similarity measures
    Rezanková, H
    Húsek, D
    Smid, J
    Snásel, V
    CIC'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN COMPUTING, 2003, : 292 - 299
  • [47] SIMILARITY MEASURES FOR NOMINAL VARIABLE CLUSTERING
    Sulc, Zdenek
    8TH INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2014, : 1536 - 1545
  • [48] Improved Similarity Measures For Software Clustering
    Naseem, Rashid
    Maqbool, Onaiza
    Muhammad, Siraj
    2011 15TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING (CSMR), 2011, : 45 - 54
  • [49] Text document clustering using Spectral Clustering algorithm with Particle Swarm Optimization
    Janani, R.
    Vijayarani, S.
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 134 : 192 - 200
  • [50] Document Clustering in Correlation Similarity Measure Space
    Zhang, Taiping
    Tang, Yuan Yan
    Fang, Bin
    Xiang, Yong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (06) : 1002 - 1013