Towards an intelligent text categorization for web resources: An implementation

Cited by: 0
Authors
Zadrozny, S [1 ]
Lawcewicz, K [1 ]
Kacprzyk, J [1 ]
Affiliations
[1] Polish Acad Sci, Syst Res Inst, PL-01447 Warsaw, Poland
Keywords
automatic classification of documents; Internet; linguistic terms
DOI
10.1016/B978-044451379-3/50012-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose the concept and implementation of a software system, TCAT (Text CATegorization), for automatically recognizing the topic of an Internet document. In training mode, the user provides the system with a list of topics and a set of documents representing each topic (supervised learning). In recognition mode, the system automatically assigns a previously unseen document to a topic category. A simple learning algorithm is devised and implemented. The classification results are presented to the user as a set of linguistic terms. Some new measures of classification correctness are proposed. The implemented system processes documents in several popular Internet-related formats.
Pages: 153-164
Page count: 12
Related papers
50 in total
  • [21] Web search with text categorization using Probabilistic Framework of SVM
    Lim, B. P. C.
    Tsui, M. H.
    Charastrakul, V.
    Shi, D.
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 2950 - +
  • [22] Text Categorization Models for Identifying Unproven Cancer Treatments on the Web
    Aphinyanaphongs, Yin
    Aliferis, Constantin
    MEDINFO 2007: PROCEEDINGS OF THE 12TH WORLD CONGRESS ON HEALTH (MEDICAL) INFORMATICS, PTS 1 AND 2: BUILDING SUSTAINABLE HEALTH SYSTEMS, 2007, 129 : 968 - 972
  • [23] Dictionary-based text categorization of chemical web pages
    Liang, CY
    Guo, L
    Xia, ZH
    Nie, FG
    Li, XX
    Su, LA
    Yang, ZY
    INFORMATION PROCESSING & MANAGEMENT, 2006, 42 (04) : 1017 - 1029
  • [24] Using the Web as corpus for self-training text categorization
    Rafael Guzmán-Cabrera
    Manuel Montes-y-Gómez
    Paolo Rosso
    Luis Villaseñor-Pineda
    Information Retrieval, 2009, 12 : 400 - 415
  • [25] Application for Web Text Categorization Based on Support Vector Machine
    Pan Hao
    Duan Ying
    Tan Longyuan
    2009 INTERNATIONAL FORUM ON COMPUTER SCIENCE-TECHNOLOGY AND APPLICATIONS, VOL 2, PROCEEDINGS, 2009, : 42 - 45
  • [26] Using the Web as corpus for self-training text categorization
    Guzman-Cabrera, Rafael
    Montes-y-Gomez, Manuel
    Rosso, Paolo
    Villasenor-Pineda, Luis
    INFORMATION RETRIEVAL, 2009, 12 (03): : 400 - 415
  • [27] IntelliSearch: Intelligent search for images and text on the web
    Voutsakis, Epimenides
    Petrakis, Euripides G. M.
    Milios, Evangelos
    IMAGE ANALYSIS AND RECOGNITION, PT 1, 2006, 4141 : 697 - 708
  • [28] Implementation of unsupervised and supervised learning systems for multilingual text categorization
    Lee, Chung-Hong
    Yang, Hsin-Chang
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 377 - +
  • [29] Towards machine learning based text categorization in the financial domain
    Voigt, Frederic
    Calero, Jose Alcarez
    Dahal, Keshav
    Wang, Qi
    von Luck, Kai
    Stelldinger, Peer
    2024 IEEE 3RD CONFERENCE ON INFORMATION TECHNOLOGY AND DATA SCIENCE, CITDS 2024, 2024, : 235 - 240
  • [30] Towards content trust of web resources
    Gil, Yolanda
    Artz, Donovan
    JOURNAL OF WEB SEMANTICS, 2007, 5 (04): : 227 - 239