Automatic documents classification

Cited by: 1
Authors
Mohamed, Hoda K. [1]
Affiliations
[1] Ain Shams U, Fac Engn, Comp & Syst Engn Dept, Cairo, Egypt
Keywords
text classification; information retrieval; stemmer algorithm; natural language processing and neural networks
DOI
10.1109/ICCES.2007.4447022
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic document classification is of paramount importance to knowledge management in the information age. It poses many challenges for learning systems, since the feature vector used to represent a document must capture some of the complex semantics of natural language. In this paper, we design an automatic document classification system and investigate the parameters and design decisions that affect the building of automatic classifiers. The system creates an item vector for each retrieved document and assigns a weight to each item. The vectors are selected using a combination of techniques from a stemmer algorithm and natural language processing. Several weighting schemes are used. Documents are classified using a neural network (NN). We investigate different cases applied to the NN classifier, distinguished by weighting scheme, the effect of weighting words in the title, and the number of inputs to the classifier. We analyze the performance of the classifier across these cases.
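The item-vector step described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption, since the abstract does not give details: the stop-word list, the crude suffix-stripping rule (a stand-in for whatever stemmer algorithm the paper uses), TF-IDF as one possible weighting scheme, and the `title_boost` parameter as one way to give title words extra weight. The resulting vectors are the kind of input a neural network classifier could consume; the classifier itself is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and suffix rules; the abstract does not specify them.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def term_counts(title, body, title_boost=2.0):
    """Count stemmed terms; each title occurrence counts title_boost times."""
    counts = Counter()
    for t in terms(body):
        counts[t] += 1.0
    for t in terms(title):
        counts[t] += title_boost
    return counts


def tfidf_vectors(docs, title_boost=2.0):
    """Return (vocabulary, vectors): one TF-IDF vector per (title, body) pair."""
    counts = [term_counts(title, body, title_boost) for title, body in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    vectors = [[c[t] * math.log(n / df[t]) for t in vocab] for c in counts]
    return vocab, vectors
```

Calling `tfidf_vectors` on a list of `(title, body)` pairs yields a shared vocabulary and one weight vector per document. Varying `title_boost` or truncating the vocabulary would correspond loosely to the title-weighting and number-of-inputs cases the abstract mentions.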
Pages: 33-37
Page count: 5
Related papers
50 in total
  • [1] Individualized Automatic Classification of Web Documents
    Tsai, Yihjia
    Chen, Kaun-Yu
    PROCEEDINGS OF 2010 CROSS-STRAIT CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY, 2010, : 410 - 412
  • [2] Chinese Automatic Documents Classification System
    Li, Li-Rui
    Yang, Kai
    PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (ICCSIT 2010), VOL 5, 2010, : 324 - 327
  • [3] Automatic classification method for XML documents
    Haitao W.
    Zhenmin T.
    International Journal of Digital Content Technology and its Applications, 2011, 5 (12) : 153 - 161
  • [4] Automatic classification of journalistic documents on the Internet
    Oliveira, Elias
    Branquinho Filho, Delermando
    TRANSINFORMACAO, 2017, 29 (03): : 245 - 255
  • [5] STEINADLER - SYSTEM OF AUTOMATIC DESCRIPTION AND CLASSIFICATION OF DOCUMENTS
    PANYR, J
    NACHRICHTEN FUR DOKUMENTATION, 1978, 29 (4-5): : 184 - 191
  • [6] Study for Automatic Classification of Arabic Spoken Documents
    Labidi, Mohamed
    Maraoui, Mohsen
    Zrigui, Mounir
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2017, PT II, 2017, 10449 : 459 - 468
  • [7] Automatic classification of patent documents for TRIZ users
    Tong, Loh Han
    Cong, He
    Lixiang, Shen
    WORLD PATENT INFORMATION, 2006, 28 (01) : 6 - 13
  • [8] An adaptive system for automatic invoice-documents classification
    Alippi, C
    Pessina, F
    Roveri, M
    2005 International Conference on Image Processing (ICIP), Vols 1-5, 2005, : 1225 - 1228
  • [9] DAN: An automatic segmentation and classification engine for paper documents
    Dept. of Information Science, Universita La Sapienza di Roma, Via Salaria 113, 00198 Roma, Italy
    Lect. Notes Comput. Sci., 1600, (491-502):
  • [10] Use of noun phrases in automatic classification of electronic documents
    Maia, Luiz Claudio
    Souza, Renato Rocha
    PERSPECTIVAS EM CIENCIA DA INFORMACAO, 2010, 15 (01): : 154 - 172