Hierarchical Document Clustering based on Cosine Similarity measure

Times cited: 0
Authors
Popat, Shraddha K. [1]
Deshmukh, Pramod B. [1]
Metre, Vishakha A. [1]
Affiliations
[1] DY Patil Coll Engn, Pune, Maharashtra, India
Keywords
Cluster; Document cluster; Similarity
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is one of the prime topics in data mining. It partitions data into meaningful subgroups. Document clustering divides a set of documents into groups such that different groups show different characteristics with respect to likeness. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring the similarity between data objects, particularly text documents. It also provides an algorithm that takes an incremental approach and evaluates cluster likeness between documents, which the authors report leads to much improved results over traditional methods. The paper also addresses the selection of an appropriate similarity measure for analyzing similarity between documents.
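As an illustration of the general idea named in the title, the following is a minimal sketch of hierarchical (agglomerative) document clustering driven by cosine similarity. It is not the paper's HSC algorithm and does not reproduce its incremental approach, neither of which is specified in the abstract. The function names, the whitespace tokenizer, the average-linkage rule, and the `threshold` stopping parameter are all assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerative_clusters(docs, threshold):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity falls below `threshold`.
    Returns clusters as sorted lists of document indices.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1, c2):
        # Mean pairwise cosine similarity between the two clusters.
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best_sim:
                    best_sim, best_pair = s, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical toy corpus, used only to exercise the sketch.
docs = [
    "data mining clustering methods",
    "clustering methods for data mining",
    "football match results",
    "football results and match reports",
]
clusters = agglomerative_clusters(docs, threshold=0.3)
```

On this toy corpus the two data-mining documents share most of their terms, as do the two football documents, while the cross-topic pairs share none, so the merge loop halts with two clusters. Raw term frequencies are used here for brevity; a weighting scheme such as TF-IDF is a common alternative.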
Pages: 153-159
Number of pages: 7
Related papers
50 in total
  • [31] Hierarchical Clustering Using Homogeneity as Similarity Measure for Big Data Analytics
    Zhao, Yunwei
    Chi, Chi-Hung
    Ding, Chen
    Wong, Raymond
    Zhou, Wei
    Wang, Can
    2015 IEEE 12TH INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2015), 2015, : 348 - 354
  • [32] Web User Profiling using Hierarchical Clustering with Improved Similarity Measure
    Algiriyage, Nilani
    Jayasena, Sanath
    Dias, Gihan
    2015 MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON), 2015, : 295 - 300
  • [33] Clustering with Multiviewpoint-Based Similarity Measure
    Duc Thang Nguyen
    Chen, Lihui
    Chan, Chee Keong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2012, 24 (06) : 988 - 1001
  • [34] Ontology-based structured cosine similarity in speech document summarization
    Yuan, ST
    Sun, J
    IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 508 - 513
  • [35] A fast text similarity measure for large document collections using multireference cosine and genetic algorithm
    Mohammadi, Hamid
    Khasteh, Seyed Hossein
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2020, 28 (02) : 999 - 1013
  • [36] The Document Vectors Using Cosine Similarity Revisited
    Zhang Bingyu
    Arefyev, Nikolay
    PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022, : 129 - 133
  • [37] An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering
    Li, Meijing
    Chen, Tianjie
    Ryu, Keun Ho
    Jin, Cheng Hao
    COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE, 2021, 2021 (2021)
  • [38] Adaptive document clustering based on query-based similarity
    Na, Seung-Hoon
    Kang, In-Su
    Lee, Jong-Hyeok
    INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (04) : 887 - 901
  • [39] WordNet and Semantic Similarity based Approach for Document Clustering
    Desai, Sneha S.
    Laxminarayana, J. A.
    2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016, : 312 - 317
  • [40] Efficient phrase-based document similarity for clustering
    Chim, Hung
    Deng, Xiaotie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (09) : 1217 - 1229