Sampling and feature selection in a genetic algorithm for document clustering

Cited by: 0

Authors:

Casillas, A [1]

de Lena, MTG

Martínez, R

Affiliations:

[1] Univ Basque Country, Dpt Elect & Elect, E-48080 Bilbao, Spain

[2] Univ Rey Juan Carlos, Dpt Informat Estadist & Telemat, Madrid, Spain

Source:

COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING | 2004 / Vol. 2945

Keywords:

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper we describe a Genetic Algorithm (GA) for document clustering that includes a sampling technique to reduce computation time. This algorithm calculates an approximation of the optimum k value, and finds the best grouping of the documents into these k clusters. We evaluate this algorithm with sets of documents that are the output of a query in a search engine. Two types of experiment are carried out to determine: (1) how the genetic algorithm works with a sample of documents, (2) which document features lead to the best clustering according to an external evaluation. On the one hand, our GA with sampling performs the clustering in a time that makes interaction with a search engine viable. On the other hand, our GA approach with the representation of the documents by means of entities leads to better results than representation by lemmas only.
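
The abstract does not give the chromosome encoding, the fitness function, or the sampling scheme, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes documents are already numeric feature vectors, encodes a chromosome as a variable-length list of sampled documents used as cluster centers (so k is searched along with the grouping), and scores it with the Calinski-Harabasz index. The names `ga_cluster`, `fitness`, and `assign`, and all parameter defaults, are hypothetical. It also assumes at least three sampled documents.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(docs, centers):
    """Label each document with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: dist2(d, centers[c]))
            for d in docs]

def fitness(docs, centers):
    """Calinski-Harabasz index (assumed fitness): higher is better."""
    n, k, dim = len(docs), len(centers), len(docs[0])
    labels = assign(docs, centers)
    overall = [sum(d[j] for d in docs) / n for j in range(dim)]
    within = between = 0.0
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        if not members:
            return float("-inf")  # reject chromosomes with an empty cluster
        mean = [sum(m[j] for m in members) / len(members) for j in range(dim)]
        within += sum(dist2(m, mean) for m in members)
        between += len(members) * dist2(mean, overall)
    if within == 0.0:
        return float("inf")  # every cluster is a single repeated point
    return (between / (k - 1)) / (within / (n - k))

def ga_cluster(docs, sample_size, k_max=6, pop_size=20, generations=30, seed=0):
    """Run the GA on a random sample, then label all documents."""
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    k_max = min(k_max, len(sample) - 1)  # keep n - k > 0 in the fitness
    idx = range(len(sample))

    def score(chrom):
        return fitness(sample, [sample[i] for i in chrom])

    def crossover(a, b):
        # Child draws its centers (and its k) from the union of both parents.
        pool = list(set(a) | set(b))
        return rng.sample(pool, rng.randint(2, min(k_max, len(pool))))

    def mutate(chrom):
        # Swap one center for a sampled document not yet used as a center.
        chrom = list(chrom)
        free = [i for i in idx if i not in chrom]
        if free:
            chrom[rng.randrange(len(chrom))] = rng.choice(free)
        return chrom

    pop = [rng.sample(idx, rng.randint(2, k_max)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2))
            if rng.random() < 0.2:
                child = mutate(child)
            children.append(child)
        pop = parents + children

    best = max(pop, key=score)
    centers = [sample[i] for i in best]
    # Documents outside the sample are assigned to the nearest evolved center.
    return len(centers), assign(docs, centers)
```

Calling `ga_cluster(vectors, sample_size=30)` returns the estimated k and one cluster label per document. Only the sample is evaluated inside the evolutionary loop, which is where the time saving from sampling would come from in a sketch like this one.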

Pages: 601-612

Page count: 12

50 items in total

[1] A Clustering Based Genetic Algorithm for Feature Selection
Rostami, Mehrdad
Moradi, Parham
2014 6TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT), 2014, : 112 - 116
[2] Feature selection and document clustering
Dhillon, I
Kogan, J
Nicholas, C
SURVEY OF TEXT MINING: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2004, : 73 - 100
[3] A feature selection Bayesian approach for a clustering genetic algorithm
Hruschka, ER
Hruschka, ER
Ebecken, NFF
DATA MINING IV, 2004, 7 : 181 - 192
[4] Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification
Endalie, Demeke
Haile, Getamesay
Abebe, Wondmagegn Taye
PEERJ COMPUTER SCIENCE, 2022, 8
[5] A Feature Selection for Korean Web Document Clustering
Park, Heum
Kim, Young-Gi
Kwon, Hyuk-Chul
IECON 2004: 30TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS SOCIETY, VOL 3, 2004, : 2650 - 2654
[6] LDA Based Feature Selection for Document Clustering
Kumar, B. Shravan
Ravi, Vadlamani
COMPUTE'17: PROCEEDINGS OF THE 10TH ANNUAL ACM INDIA COMPUTE CONFERENCE, 2017, : 125 - 130
[7] Application of Genetic Algorithm in Document Clustering
Wei Jian-Xiang
Liu Huai
Sun Yue-hong
Su Xin-Ning
2009 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER SCIENCE, VOL 1, PROCEEDINGS, 2009, : 145 - +
[8] A feature selection algorithm for document clustering based on word co-occurence frequency
Liu, YC
Wang, XL
Liu, BQ
PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 2963 - 2968
[9] Empirical Study on Unsupervised Feature Selection for Document Clustering
Mackute-Varoneckiene, Ausra
Krilavicius, Tomas
HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, BALTIC HLT 2014, 2014, 268 : 107 - +
[10] Unsupervised Feature Selection Technique Based on Genetic Algorithm for Improving the Text Clustering
Abualigah, Laith Mohammad
Khader, Ahamad Tajudin
Al-Betar, Mohammed Azmi
2016 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (CSIT), 2016,
