Automatic documents classification

Cited by: 1
Author
Mohamed, Hoda K. [1]
Affiliation
[1] Ain Shams University, Faculty of Engineering, Computer and Systems Engineering Department, Cairo, Egypt
Keywords
text classification; information retrieval; Stemmer algorithm; natural language processing and neural networks;
DOI
10.1109/ICCES.2007.4447022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic document classification is of paramount importance to knowledge management in the information age. Document classification poses many challenges for learning systems, since the feature vector used to represent a document must capture some of the complex semantics of natural language. In this paper, we design an automatic document classification system and investigate the parameters and design decisions that affect the building of automatic classifiers. The system creates an item vector for each retrieved document and assigns a weight to each item. The vector items are selected using a combination of a stemmer algorithm and natural language processing techniques. Several weighting schemes are used. Documents are classified using a neural network (NN), and we investigate different cases applied to the NN classifier. The cases are distinguished by the weighting scheme, the effect of weighting words in the title, and the number of inputs to the classifier. The performance of the classifier is analyzed for each case.
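The pipeline the abstract describes (stemmed term vectors, a choice of weighting scheme, extra weight for title words, a limited number of classifier inputs, and a neural-network classifier) can be illustrated with the minimal Python sketch that follows. It is not the author's implementation: the crude suffix-stripping stemmer, the stop-word list, the binary/tf/tf-idf schemes, the title boost factor of 2, the top-k document-frequency feature selection, and the single-layer softmax network (no hidden layer) are all assumptions chosen for illustration, and the four toy documents are invented.

```python
import math
from collections import Counter

# Assumption: a crude suffix-stripping stemmer stands in for a real stemming
# algorithm (e.g. Porter's); the suffix and stop-word lists are illustrative.
SUFFIXES = ("ing", "ed", "ly", "s")
STOPWORDS = {"the", "a", "an", "and", "as", "for", "in", "is", "of", "on", "to"}

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def terms(text):
    """Lower-case, strip punctuation, drop stop words, then stem."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    return [stem(w) for w in words if w.isalpha() and w not in STOPWORDS]

def term_counts(title, body, title_weight=2.0):
    """Raw term counts; each title occurrence counts `title_weight` times."""
    counts = Counter()
    for t in terms(body):
        counts[t] += 1.0
    for t in terms(title):
        counts[t] += title_weight
    return counts

def weight(counts, df, n_docs, scheme):
    """Apply one of three assumed weighting schemes to a document's counts."""
    if scheme == "binary":
        return {t: 1.0 for t in counts}
    if scheme == "tf":
        return dict(counts)
    if scheme == "tfidf":  # term frequency times log inverse document frequency
        return {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
    raise ValueError(f"unknown scheme: {scheme}")

def build_vectors(docs, k=20, scheme="tfidf", title_weight=2.0):
    """Keep the k most document-frequent stems as the classifier inputs."""
    counts = [term_counts(title, body, title_weight) for title, body in docs]
    df = Counter(t for c in counts for t in c)
    vocab = [t for t, _ in df.most_common(k)]
    vecs = []
    for c in counts:
        w = weight(c, df, len(docs), scheme)
        row = [w.get(t, 0.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # scale to unit length
        vecs.append([x / norm for x in row])
    return vocab, vecs

def scores(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def train(vecs, labels, n_classes, epochs=200, lr=0.5):
    """Single-layer softmax network trained with per-example gradient descent."""
    n_in = len(vecs[0])
    W = [[0.0] * n_in for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(vecs, labels):
            p = softmax(scores(W, b, x))
            for k in range(n_classes):
                g = p[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient wrt logit k
                b[k] -= lr * g
                for i in range(n_in):
                    W[k][i] -= lr * g * x[i]
    return W, b

def predict(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)

docs = [
    ("Football league results", "The team scored goals in the league match"),
    ("Tennis final", "The player won the match and the final set"),
    ("Stock markets fall", "Shares and markets dropped as investors sold stocks"),
    ("Bank rates rise", "The bank raised interest rates for investors"),
]
labels = [0, 0, 1, 1]  # toy labels: 0 = sport, 1 = finance
vocab, vecs = build_vectors(docs, k=20, scheme="tfidf", title_weight=2.0)
W, b = train(vecs, labels, n_classes=2)
print([predict(W, b, v) for v in vecs])
```

Changing `scheme`, `title_weight`, or `k` and retraining mirrors the three kinds of cases the abstract compares (weighting scheme, title weighting, number of inputs). On real data one would measure accuracy on held-out documents rather than on the training set used in this toy run.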
Pages: 33-37
Number of pages: 5
Related papers (50 in total)
  • [31] Automatic Text Classification of PDF Documents using NLP Techniques
    Abdoun, Nabil
    Chami, Mohammad
    INCOSE International Symposium, 2022, 32 (01) : 1320 - 1331
  • [32] A system for the automatic layout segmentation and classification of digital documents.
    Cinque, L
    Levialdi, S
    Malizia, A
    12TH INTERNATIONAL CONFERENCE ON IMAGE ANALYSIS AND PROCESSING, PROCEEDINGS, 2003, : 201 - 206
  • [33] Multiple sets of features for automatic genre classification of web documents
    Lim, CS
    Lee, KJ
    Kim, GC
    INFORMATION PROCESSING & MANAGEMENT, 2005, 41 (05) : 1263 - 1276
  • [34] Automatic Documents Counterfeit Classification Using Image Processing and Analysis
    Vieira, Rafael
    Antunes, Mario
    Silva, Catarina
    Assis, Ana
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2017), 2017, 10255 : 400 - 407
  • [35] Variable Global Feature Selection Scheme for automatic classification of text documents
    Agnihotri, Deepak
    Verma, Kesari
    Tripathi, Priyanka
    EXPERT SYSTEMS WITH APPLICATIONS, 2017, 81 : 268 - 281
  • [36] INTERPRETATION OF AUTOMATIC CLASSIFICATION IN INFORMATION-RETRIEVAL (ROUGH SEARCH OF DOCUMENTS)
    PANYR, J
    INTERNATIONAL CLASSIFICATION, 1982, 9 (01) : 11 - 18
  • [37] Automatic Annotation Extension and Classification of Documents Using a Probabilistic Graphical Model
    Bouzaieni, Abdessalem
    Barrat, Sabine
    Tabbone, Salvatore
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 316 - 320
  • [38] AUTOMATIC CLASSIFICATION AND RETRIEVAL OF DOCUMENTS BY MEANS OF A BIBLIOGRAPHIC PATTERN DISCOVERY ALGORITHM
    SCHIMINOVICH, S
    INFORMATION STORAGE AND RETRIEVAL, 1971, 6 (06) : 417 - +
  • [39] Using Automatic Features for Text-image Classification in Amharic Documents
    Belay, Birhanu
    Habtegebrial, Tewodros
    Belay, Gebeyehu
    Stricker, Didier
    ICPRAM: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS, 2020, : 440 - 445
  • [40] Automatic classification and categorization: Application for identifying and thematically analysing unstructured textual documents
    Forest, D
    CANADIAN JOURNAL OF INFORMATION AND LIBRARY SCIENCE-REVUE CANADIENNE DES SCIENCES DE L INFORMATION ET DE BIBLIOTHECONOMIE, 2005, 29 (03) : 356 - 356