Enhanced information retrieval by using HTML tags

Cited by: 0
Authors
Werner, L [1]
Böttcher, S [1]
Beckmann, R [1]
Affiliations
[1] Univ Gesamthsch Paderborn, C LAB, D-4790 Paderborn, Germany
Keywords
typographical information; text classification; HTML tags
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Whenever digital libraries or knowledge management systems are to be automatically filled with web pages from the internet, document classification of the web pages is one of the major challenges. We present an approach which uses HTML tags in order to improve the quality of the hypertext document classification. Our approach uses weighting of HTML tags for separating relevant information in hypertext documents from the noise. We have evaluated our approach on the basis of a document classification algorithm. The results show that our weighting approach yields a classification which is approximately 35% better than a classification without the use of the HTML tagging information.
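The tag-weighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weight values, the `TAG_WEIGHTS` table, the `WeightedTermCounter` class, and the `weighted_terms` function are all hypothetical names and numbers chosen for illustration, and the record does not state which tags or weights the paper actually uses. The sketch assumes that a term's contribution is scaled by the highest-weighted tag enclosing it, and that text inside `script` and `style` elements counts as noise.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights for illustration only; not taken from the paper.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "b": 2.0, "strong": 2.0}
DEFAULT_WEIGHT = 1.0
IGNORED_TAGS = {"script", "style"}  # assumed to carry no classifiable text


class WeightedTermCounter(HTMLParser):
    """Sum term occurrences, scaling each by the weight of its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []            # stack of currently open tags
        self.term_weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating
        # unclosed elements such as <br> or <img>.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in IGNORED_TAGS for tag in self.open_tags):
            return
        # Use the highest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT) for tag in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        # Naive whitespace tokenization; punctuation is not stripped.
        for term in data.lower().split():
            self.term_weights[term] += weight


def weighted_terms(html):
    """Return a dict mapping each term to its tag-weighted frequency."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.term_weights)
```

The resulting weighted frequencies could then replace raw term counts as the input features of a document classifier, so that terms in titles and headings influence the decision more than terms in body text.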
Pages: 24-29
Number of pages: 6