Enhanced information retrieval by using HTML tags

Cited by: 0
Authors
Werner, L [1]
Böttcher, S [1]
Beckmann, R [1]
Affiliations
[1] Univ Gesamthsch Paderborn, C LAB, D-4790 Paderborn, Germany
Keywords
typographical information; text classification; HTML tags
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Whenever digital libraries or knowledge management systems are to be automatically filled with web pages from the internet, document classification of the web pages is one of the major challenges. We present an approach which uses HTML tags in order to improve the quality of the hypertext document classification. Our approach uses weighting of HTML tags for separating relevant information in hypertext documents from the noise. We have evaluated our approach on the basis of a document classification algorithm. The results show that our weighting approach yields a classification which is approximately 35% better than a classification without the use of the HTML tagging information.
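The tag-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the weights nor the classifier, so `TAG_WEIGHTS`, `DEFAULT_WEIGHT`, `IGNORED_TAGS`, and the rule of taking the highest weight among enclosing tags are all assumptions chosen for illustration. The sketch only builds weighted term counts, which could then be passed to any document classifier.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: the abstract does not state the values the authors used.
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0
# Text inside these tags is treated as noise and skipped (an assumption).
IGNORED_TAGS = {"script", "style"}


class WeightedTermCounter(HTMLParser):
    """Accumulates term counts, scaling each occurrence by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []            # stack of currently open tags
        self.term_weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this tolerates unclosed tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(tag in IGNORED_TAGS for tag in self.open_tags):
            return
        # Use the highest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT) for tag in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.term_weights[term] += weight


def weighted_terms(html):
    """Return a Counter mapping each term to its tag-weighted frequency."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.term_weights
```

With these assumed weights, a term that appears in the page title contributes four times as much to the document's feature vector as the same term in ordinary paragraph text, while script and style content contributes nothing.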
Pages: 24 - 29
Page count: 6
Related papers
50 in total
  • [21] Genetic Algorithm Based to Improve HTML Document Retrieval
    Al-Dallal, Ammar
    Abdul-Wahab, Rasha S.
    2009 SECOND INTERNATIONAL CONFERENCE ON DEVELOPMENTS IN ESYSTEMS ENGINEERING (DESE 2009), 2009, : 343 - +
  • [22] Improving Accessibility of HTML Documents by Generating Image-Tags in a Proxy
    Keysers, Daniel
    Renn, Marius
    Breuel, Thomas M.
    ASSETS'07: PROCEEDINGS OF THE NINTH INTERNATIONAL ACM SIGACCESS CONFERENCE ON COMPUTERS AND ACCESSIBILITY, 2007, : 249 - 250
  • [23] WEB ACCESSIBILITY TOOL FOR VISUALLY IMPAIRED ACTIVATED THROUGH HTML TAGS
    Peraza Garzon, Juan Francisco
    Estrada Lizarraga, Rogelio
    Olivarria Gonzalez, Monica del Carmen
    Zaragoza Gonzalez, Jose Nicolas
    Mendoza Zatarain, Rafael
    Ortega Carrillo, Jose Antonio
    Cobian Campos, Jose Alfredo
    INTED2015: 9TH INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE, 2015, : 7549 - 7552
  • [24] The discovery laboratory using HTML
    Lamba, RS
    DelaCuetara, R
    Sharma, SP
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1996, 211 : 102 - CHED
  • [25] Contextual weighted representations and indexing models for the retrieval of HTML documents
    Pereira, RAM
    Molinari, A
    Pasi, G
    SOFT COMPUTING, 2005, 9 (07) : 481 - 492
  • [26] An approach to measuring extent of use of Web functionalities: A content analysis of HTML tags
    Pardue, JH
    Chatterjee, D
    ASSOCIATION FOR INFORMATION SYSTEMS PROCEEDING OF THE AMERICAS CONFERENCE ON INFORMATION SYSTEMS, 1997, : 233 - 235
  • [27] Webpage stegano compression approach using attributes in html tags
    Al-Rababaa, M.S.
    Al-Nihoud, J.Q.
    International Review on Computers and Software, 2010, 5 (02) : 181 - 185
  • [28] Rec.HTML: Declarative HTML
    Reynders, Bob
    Choi, Kwanghoon
    COMPANION PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON THE ART, SCIENCE, AND ENGINEERING OF PROGRAMMING (PROGRAMMING 2021 COMPANION), 2021, : 1 - 5
  • [29] Transcoding HTML to VoiceXML using annotation
    Shao, ZY
    Capra, R
    Pérez-Quiñones, MA
    15TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, : 249 - 258
  • [30] HTML document broadcast method for disaster information systems
    Ishikawa, Y.
    Kosugi, Y.
    10TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, VOLS I-III: INNOVATIONS TOWARD FUTURE NETWORKS AND SERVICES, 2008, : 1858 - 1863