Hierarchical Document Clustering based on Cosine Similarity measure

Cited by: 0
Authors
Popat, Shraddha K. [1 ]
Deshmukh, Pramod B. [1 ]
Metre, Vishakha A. [1 ]
Affiliations
[1] DY Patil Coll Engn, Pune, Maharashtra, India
Keywords
Cluster; Document cluster; Similarity;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is one of the prime topics in data mining. It partitions data into meaningful subgroups. Document clustering divides a set of documents into groups such that different groups show different characteristics with respect to similarity. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring the similarity between data objects, particularly text documents. It also provides an incremental algorithm that evaluates cluster similarity between documents, which the authors report leads to much improved results over other traditional methods. Finally, it focuses on selecting an appropriate similarity measure for analyzing the similarity between documents.
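The record does not describe the HSC algorithm itself, so the following is only a minimal sketch of the general idea named in the title: hierarchical (agglomerative) clustering of text documents using cosine similarity. The function names `cosine_similarity` and `agglomerate`, the raw term-frequency vectors, the average-link merge rule, and the `threshold` stopping parameter are all assumptions made for illustration, not the authors' method.

```python
from __future__ import annotations

import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def agglomerate(docs: list[str], threshold: float) -> list[list[int]]:
    """Average-link agglomerative clustering (illustrative, not HSC).

    Starts with one cluster per document and repeatedly merges the most
    similar pair until the best pair falls below `threshold`.
    Returns clusters as lists of document indices.
    """
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def link(c1: list[int], c2: list[int]) -> float:
        # Average pairwise cosine similarity between two clusters.
        total = sum(cosine_similarity(vectors[i], vectors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = link(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical toy corpus: two documents about clustering, two about football.
docs = [
    "data mining clustering documents",
    "clustering documents with cosine similarity",
    "football match final score",
    "football league match result",
]
clusters = agglomerate(docs, threshold=0.2)
```

On this toy corpus the two clustering documents share terms with each other but none with the football documents, so merging stops once only cross-topic pairs remain. A practical system would more likely use TF-IDF weighting and a library implementation rather than this quadratic loop.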
Pages: 153 - 159
Number of pages: 7
Related papers
50 in total
  • [21] Hierarchical Clustering based on IndoorGML Document
    Tamas, Judit
    2019 IEEE 15TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATICS (INFORMATICS 2019), 2019, : 177 - 182
  • [22] Document Similarity Measure Based on Topic Model
    He, Ming
    Wang, Zhen-zhen
    Du, Yong-ping
    APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1280 - 1284
  • [23] Similarity Measures of Pythagorean Fuzzy Sets Based on Combination of Cosine Similarity Measure and Euclidean Distance Measure
    Mohd, Wan Rosanisah Wan
    Abdullah, Lazim
    PROCEEDING OF THE 25TH NATIONAL SYMPOSIUM ON MATHEMATICAL SCIENCES (SKSM25): MATHEMATICAL SCIENCES AS THE CORE OF INTELLECTUAL EXCELLENCE, 2018, 1974
  • [24] Scalable spectral clustering with cosine similarity
    Chen, Guangliang
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 314 - 319
  • [25] A Taxonomy based Semantic Similarity of Documents using the Cosine Measure
    Madylova, Ainura
    Oguducu, Sule Guenduez
    2009 24TH INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2009, : 129 - 134
  • [26] Content based Document Classification using Soft Cosine Measure
    Hasan, Md Zahid
    Hossain, Shakhawat
    Rizvee, Md Arif
    Rana, Md Shohel
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (04) : 522 - 528
  • [27] A Document Clustering Method based on Hierarchical Algorithm with Model Clustering
    Sun, Haojun
    Liu, Zhihui
    Kong, Lingjun
    2008 22ND INTERNATIONAL WORKSHOPS ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOLS 1-3, 2008, : 1229 - +
  • [28] Similarity Measure of the Visual Features Using the Constrained Hierarchical Clustering for Content Based Image Retrieval
    Yoon, Sang Min
    Graf, Holger
    ADVANCES IN VISUAL COMPUTING, PT II, PROCEEDINGS, 2008, 5359 : 860 - +
  • [29] A measure of DNA sequence similarity by Fourier Transform with applications on hierarchical clustering
    Yin, Changchuan
    Chen, Ying
    Yau, Stephen S. -T.
    JOURNAL OF THEORETICAL BIOLOGY, 2014, 359 : 18 - 28
  • [30] A New Similarity Measure and Hierarchical Clustering Approach to Color Image Segmentation
    Gherbaoui, Radhwane
    Benamrane, Nacera
    Ouali, Mohammed
    2023 IEEE 24TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE, IRI, 2023, : 34 - 39