Enhanced Distributed Document Clustering Algorithm Using Different Similarity Measures

Authors
Narayanan, Neethi [1]
Judith, J. E. [1]
Jayakumari, J. [1]
Affiliation
[1] Noorul Islam Ctr Higher Educ Kumaracoil, Kumaracoil, Tamil Nadu, India
Keywords
Distributed document clustering similarity measures; Cosine similarity; Jaccard coefficient; Pearson coefficient
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Distributed environments such as the Internet, intranets, local area networks, and wireless networks contain many distributed data sources. Analyzing and monitoring these sources requires data mining technologies specialized for distributed applications, and a variety of distributed document clustering algorithms exist for this purpose. This paper presents an Enhanced Distributed Algorithm (EDA) for document clustering and analyzes its performance under different similarity measures: cosine similarity, the Jaccard coefficient, and the Pearson coefficient. The tests were performed on standard document corpora such as 20NG (20 Newsgroups), Reuters, Web. The performance of the proposed EDA algorithm is also evaluated using different performance factors in order to determine its accuracy and clustering quality.
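For orientation, the three similarity measures named in the abstract can be sketched on term-frequency vectors as follows. This is a minimal illustration using the standard textbook definitions, not the authors' implementation. The function names are hypothetical, and the Jaccard coefficient is assumed here to be the extended (Tanimoto) form commonly applied to real-valued document vectors; the record does not say which variant the paper uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Extended Jaccard (Tanimoto) coefficient; an assumed variant."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # both vectors are all zeros
    return dot / denom


def pearson_coefficient(a, b):
    """Pearson correlation between two equal-length term vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical term-frequency vectors over a shared vocabulary.
    doc1 = [1, 0, 2]
    doc2 = [1, 1, 0]
    print(cosine_similarity(doc1, doc2))
    print(jaccard_coefficient(doc1, doc2))
    print(pearson_coefficient(doc1, doc2))
```

Cosine and extended Jaccard both range from 0 to 1 for non-negative term counts, while Pearson ranges from -1 to 1, so a clustering routine that compares them would typically need to account for the different scales.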
Pages: 545-550
Number of pages: 6