Hierarchical Document Clustering based on Cosine Similarity measure

Cited by: 0
Authors
Popat, Shraddha K. [1]
Deshmukh, Pramod B. [1]
Metre, Vishakha A. [1]
Affiliation
[1] DY Patil College of Engineering, Pune, Maharashtra, India
Keywords
Cluster; Document cluster; Similarity
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is one of the prime topics in data mining. Clustering partitions data into meaningful subgroups. Document clustering divides a set of documents into groups such that documents within a group are similar to one another, while documents in different groups show different characteristics. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring the similarity between data objects, particularly text documents. It also provides an incremental algorithm that evaluates cluster likeness between documents and leads to much improved results over other traditional methods. Finally, it addresses the selection of an appropriate similarity measure for analyzing the similarity between documents.
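The title and abstract describe grouping text documents by cosine similarity in a hierarchical way. The sketch below illustrates only that general idea and is not the authors' HSC method or their incremental algorithm, which the record does not describe. Assumptions: documents are turned into raw term-frequency vectors by lowercase whitespace tokenization, similarity is cos(a, b) = (a · b) / (|a| |b|), and clusters are merged bottom-up with average linkage until the best pair falls below a stopping threshold. The function names and the `threshold` value are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Sparse term-frequency vector (assumed: lowercase whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter gives 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_link(c1, c2, vectors):
    """Average pairwise cosine similarity between two clusters of document indices."""
    total = sum(cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_cluster(docs, threshold=0.3):
    """Repeatedly merge the most similar pair of clusters (average linkage).

    Stops when no pair reaches `threshold` (a hypothetical illustration value).
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with singleton clusters
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = average_link(clusters[x], clusters[y], vectors)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters


docs = [
    "cosine similarity of text documents",
    "document clustering with cosine similarity",
    "football match results",
    "match results of the football league",
]
print(agglomerative_cluster(docs))  # [[0, 1], [2, 3]]
```

Here the two football documents merge first (cosine ≈ 0.71), then the two similarity documents (cosine 0.4), and the remaining pair of clusters averages well below the threshold, so merging stops. This naive loop recomputes all pair similarities on every merge; practical implementations cache a similarity matrix instead.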
Pages: 153-159
Page count: 7
Related papers
  • [1] A hierarchical clustering based on overlap similarity measure
    Qu, Jun
    Jiang, Qingshan
    Weng, Fangfei
    Hong, Zhiling
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 3, PROCEEDINGS, 2007: 905+
  • [2] Hierarchical Clustering Algorithm for Binary Data Based on Cosine Similarity
    Gao, Xiaonan
    Wu, Sen
    2018 8TH INTERNATIONAL CONFERENCE ON LOGISTICS, INFORMATICS AND SERVICE SCIENCES (LISS), 2018
  • [3] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007: 92+
  • [4] Affinity-based similarity measure for web document clustering
    Shyu, ML
    Chen, SC
    Chen, M
    Rubin, SH
    PROCEEDINGS OF THE 2004 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI-2004), 2004: 247-252
  • [5] A Spatial Overlapping Based Similarity Measure Applied to Hierarchical Clustering
    Chen, Hong
    Guo, Gongde
    Huang, Yu
    Huang, Tianqiang
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008: 371-375
  • [6] A hierarchical clustering method for random intervals based on a similarity measure
    Ramos-Guajardo, Ana Belén
    COMPUTATIONAL STATISTICS, 2022, 37 (01): 229-261
  • [7] An Improved Cosine Similarity Algorithm Based on Document Similarity
    Lee, Ming
    Zhao, Heji
    INTERNATIONAL SYMPOSIUM ON FUZZY SYSTEMS, KNOWLEDGE DISCOVERY AND NATURAL COMPUTATION (FSKDNC 2014), 2014: 196-204
  • [8] Document Clustering using Concept Space and Cosine Similarity Measurement
    Muflikhah, Lailil
    Baharudin, Baharum
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON COMPUTER TECHNOLOGY AND DEVELOPMENT, VOL 1, 2009: 58-62
  • [9] Document Clustering in Correlation Similarity Measure Space
    Zhang, Taiping
    Tang, Yuan Yan
    Fang, Bin
    Xiang, Yong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (06): 1002-1013