Hierarchical Document Clustering based on Cosine Similarity measure

Cited by: 0

Authors:

Popat, Shraddha K. [1]

Deshmukh, Pramod B. [1]

Metre, Vishakha A. [1]

Affiliation:

[1] DY Patil Coll Engn, Pune, Maharashtra, India

Source:

2017 1ST INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND INFORMATION MANAGEMENT (ICISIM) | 2017

Keywords:

Cluster; Document cluster; Similarity;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering is one of the prime topics in data mining. It partitions data into meaningful subgroups. Document clustering divides a set of documents into groups such that different groups show different characteristics with respect to similarity. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring the similarity between data objects, particularly text documents. It also provides an algorithm that takes an incremental approach and evaluates cluster similarity between documents, which leads to much improved results over other traditional methods. The paper further focuses on selecting an appropriate similarity measure for analyzing the similarity between documents.
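
The abstract does not describe the HSC algorithm in detail, so the following is only a minimal sketch of the general idea named in the title: cosine similarity between term-count vectors, combined with a plain average-linkage agglomerative merge. It is not the authors' method. The function names, the whitespace tokenizer, and the `threshold` stopping rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count terms (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def agglomerative_clusters(docs, threshold):
    """Merge the most similar pair of clusters (average linkage) until the
    best available similarity falls below `threshold` (hypothetical parameter)."""
    vectors = [term_frequencies(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]  # start with one cluster per document

    def linkage(c1, c2):
        # Average pairwise cosine similarity between members of two clusters.
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best_sim:
                    best_sim, best_pair = s, (x, y)
        if best_sim < threshold:
            break  # no remaining pair is similar enough to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters
```

Calling `agglomerative_clusters(docs, 0.3)` on a small list of strings returns lists of document indices, one list per cluster. Raw term counts are used here for brevity; a weighting scheme such as TF-IDF would be a common substitute, but whether the paper uses one is not stated in the abstract.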


Pages: 153-159

Number of pages: 7

