Research of Information Retrieval Based on Web Page Segmentation

Cited by: 0
Author
Yu, Yangxin [1]
Affiliation
[1] Huaiyin Inst Technol, Fac Comp Engn, Huaian 223003, Peoples R China
Keywords
Page Segment; Information Retrieval; HTML Tag; Similarity
DOI
10.4028/www.scientific.net/AMM.204-208.4928
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper presents a Web information retrieval algorithm based on Web page segmentation. Because Web pages are semi-structured, the key idea is to segment each page into distinct topic areas (segments) according to its HTML tags and content. The algorithm first builds an HTML tag tree and then merges nodes in the tree according to content similarity and visual similarity. During retrieval and ranking, it uses the segmentation information to rank the relevant pages. Experimental results show that the method significantly improves search precision, and it may serve as a useful reference for the design of future search engines.
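The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the block tag set, the bag-of-words cosine measure, the merge threshold, and the "best matching segment" page score are all assumptions made for illustration. Visual similarity is omitted because it needs rendering information that plain HTML parsing does not provide, and the input is assumed to be reasonably well-formed HTML.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit candidate content blocks.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "h1", "h2", "h3"}


class Node:
    """One block-level element in the (simplified) HTML tag tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # text fragments held directly by this node


class TagTreeBuilder(HTMLParser):
    """Builds a tree of block-level tags; inline tags are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            node = Node(tag, self.current)
            self.current.children.append(node)
            self.current = node

    def handle_endtag(self, tag):
        # Only close the node if the end tag matches the open block.
        if tag in BLOCK_TAGS and self.current.tag == tag:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def text_blocks(node):
    """Yield the text of every node that directly holds text, in tree order."""
    if node.text:
        yield " ".join(node.text)
    for child in node.children:
        yield from text_blocks(child)


def cosine(a, b):
    """Cosine similarity between bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def segment(html, threshold=0.3):
    """Merge consecutive blocks whose content similarity reaches the threshold."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    segments = []
    for text in text_blocks(builder.root):
        if segments and cosine(segments[-1], text) >= threshold:
            segments[-1] += " " + text
        else:
            segments.append(text)
    return segments


def score(query, segments):
    """Hypothetical page score: similarity of the best-matching segment."""
    return max((cosine(query, s) for s in segments), default=0.0)


if __name__ == "__main__":
    page = (
        "<div><p>cheap flights to paris</p>"
        "<p>flights to paris from london</p></div>"
        "<div><p>contact our sales team</p></div>"
    )
    segs = segment(page)
    print(segs)
    print(score("flights paris", segs))
```

Scoring a page by its best segment rather than by its full text is one plausible way to use segmentation during ranking: a query that matches a single coherent topic area is not diluted by navigation bars or unrelated blocks elsewhere on the page.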
Pages: 4928-4931
Page count: 4