Enhanced information retrieval by using HTML tags

Authors
Werner, L [1]
Böttcher, S [1]
Beckmann, R [1]
Affiliation
[1] Univ Gesamthsch Paderborn, C LAB, D-4790 Paderborn, Germany
Keywords
typographical information; text classification; HTML tags
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Whenever digital libraries or knowledge management systems are to be filled automatically with web pages from the Internet, classifying those web pages is one of the major challenges. We present an approach that uses HTML tags to improve the quality of hypertext document classification. Our approach weights HTML tags in order to separate relevant information in hypertext documents from noise. We evaluated the approach using a document classification algorithm. The results show that our weighting approach yields a classification approximately 35% better than a classification that does not use the HTML tagging information.
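The abstract does not state which tags are weighted or by how much. As a hedged illustration only, the Python sketch below shows one common way to turn HTML markup into tag-weighted term frequencies that a classifier could consume. The weights in `TAG_WEIGHTS`, the names `WeightedTermCounter` and `weighted_terms`, the choice to skip `script`/`style` content as noise, and the rule of using the strongest enclosing tag's weight are all assumptions for illustration, not the authors' method.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights, chosen for illustration only; the paper's actual
# weighting scheme is not given in the abstract.
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.5, "b": 1.5, "strong": 1.5, "em": 1.2}
DEFAULT_WEIGHT = 1.0
IGNORED_TAGS = {"script", "style"}  # assumed noise: their text is skipped entirely
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # never closed, so never pushed


class WeightedTermCounter(HTMLParser):
    """Counts terms, scaling each occurrence by the largest weight among its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open (non-void) tags
        self.weights = Counter()  # term -> tag-weighted frequency

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in IGNORED_TAGS for t in self.stack):
            return
        weight = max((TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
                     default=DEFAULT_WEIGHT)
        for token in data.lower().split():
            term = token.strip(".,;:!?()\"'")
            if term:
                self.weights[term] += weight


def weighted_terms(html: str) -> Counter:
    """Return a Counter of tag-weighted term frequencies for one HTML document."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.weights


if __name__ == "__main__":
    doc = ("<html><head><title>Neural networks</title><style>p {color: red}</style></head>"
           "<body><h1>Neural networks</h1><p>Networks learn from data.</p></body></html>")
    print(weighted_terms(doc).most_common(3))
```

Taking the maximum weight of the enclosing tags (rather than multiplying weights) keeps deeply nested markup from inflating scores; the resulting weighted term vector could replace plain term frequencies as input to any text classifier.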
Pages: 24-29
Number of pages: 6