Enhanced Distributed Document Clustering Algorithm Using Different Similarity Measures

Cited by: 0
Authors
Narayanan, Neethi [1 ]
Judith, J. E. [1 ]
Jayakumari, J. [1 ]
Affiliations
[1] Noorul Islam Ctr Higher Educ Kumaracoil, Kumaracoil, Tamil Nadu, India
Keywords
Distributed document clustering; Similarity measures; Cosine similarity; Jaccard coefficient; Pearson coefficient
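DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract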
Many distributed environments, such as the Internet, intranets, local area networks, and wireless networks, contain multiple distributed data sources. Analyzing and monitoring these sources requires data mining technologies specialized for distributed applications, and a variety of distributed document clustering algorithms exist for this purpose. This paper presents an Enhanced Distributed Algorithm (EDA) for document clustering and analyzes its performance using different similarity measures: cosine similarity, the Jaccard coefficient, and the Pearson coefficient. Tests were performed on standard document corpora such as 20NG (20 Newsgroups), Reuters, Web. The performance of the proposed EDA algorithm is also evaluated using different performance factors in order to determine its accuracy and clustering quality.
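The three similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes documents are already represented as equal-length term-frequency vectors, it uses the extended (Tanimoto) form of the Jaccard coefficient that is common for weighted vectors, and all function names and sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Extended Jaccard (Tanimoto) coefficient for weighted vectors.

    Assumption: the abstract does not say which Jaccard variant is used.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom != 0 else 0.0


def pearson_coefficient(a, b):
    """Pearson correlation between two term-frequency vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_a * var_b)


# Hypothetical term counts for two short documents over a 4-term vocabulary.
doc1 = [2, 1, 0, 1]
doc2 = [1, 1, 1, 0]

for name, measure in [
    ("cosine", cosine_similarity),
    ("jaccard", jaccard_coefficient),
    ("pearson", pearson_coefficient),
]:
    print(name, measure(doc1, doc2))
```

The measures can disagree noticeably on the same pair of vectors, which is one reason a clustering algorithm's quality may depend on which measure is chosen.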
Pages: 545 - 550
Number of pages: 6
Related papers
50 in total
  • [21] Fuzzy Ontology for Distributed Document Clustering based on Genetic Algorithm
    Thangamani, M.
    Thangaraj, P.
    APPLIED MATHEMATICS & INFORMATION SCIENCES, 2013, 7 (04) : 1563 - 1574
  • [22] Effect of Dimensionality Reduction on Different Distance Measures in Document Clustering
    Paukkeri, Mari-Sanna
    Kivimaki, Ilkka
    Tirunagari, Santosh
    Oja, Erkki
    Honkela, Timo
    NEURAL INFORMATION PROCESSING, PT III, 2011, 7064 : 167 - +
  • [23] Sentence Clustering in Text Document Using Fuzzy Clustering Algorithm
    Sruthi, S.
    Shalini, L.
    2014 INTERNATIONAL CONFERENCE ON CONTROL, INSTRUMENTATION, COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES (ICCICCT), 2014, : 1473 - 1476
  • [24] Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures
    Song, Wei
    Li, Cheng Hua
    Park, Soon Cheol
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (05) : 9095 - 9104
  • [25] Clustering of dissimilar perception phase constructed for similarity measures using k-Means Algorithm
    Bindiya, M. K.
    RaviKumar, G. K.
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON APPLIED AND THEORETICAL COMPUTING AND COMMUNICATION TECHNOLOGY (ICATCCT), 2016, : 618 - 622
  • [26] Clustering of Argument Graphs Using Semantic Similarity Measures
    Block, Karsten
    Trumm, Simon
    Sahitaj, Premtim
    Ollinger, Stefan
    Bergmann, Ralph
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 101 - 114
  • [27] DOCUMENT CLUSTERING USING ANT COLONY ALGORITHM
    Nagarajan, E.
    Saritha, Keshetty
    MadhuGayathri, G.
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON BIG DATA ANALYTICS AND COMPUTATIONAL INTELLIGENCE (ICBDAC), 2017, : 459 - 463
  • [28] Document Clustering using Concept Space and Cosine Similarity Measurement
    Muflikhah, Lailil
    Baharudin, Baharum
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT, VOL 1, 2009, : 58 - 62
  • [29] Distributed document clustering using word-clusters
    Deb, Debzani
    Angryk, Rafal A.
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2, 2007, : 376 - 383
  • [30] The Effect of Different Similarity Distance Measures in Detecting Outliers Using Single-Linkage Clustering Algorithm for Univariate Circular Biological Data
    Zulkipli, Nur Syahirah
    Satari, Siti Zanariah
    Yusoff, Wan Nur Syahidah Wan
    PAKISTAN JOURNAL OF STATISTICS AND OPERATION RESEARCH, 2022, 18 (03) : 561 - 573