An Improved HITS Algorithm Based on Analysis of Web Page Links and Web Content Similarity

Cited by: 6
Author
Yang, Weiming [1]
Affiliation
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
Keywords
HITS algorithm; Web content similarity; Authority page; Hub page;
DOI
10.1109/CW.2016.30
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
HITS (Hyperlink-Induced Topic Search) is a classical link analysis algorithm in Web structure mining (WSM). The algorithm takes the structural information of links into account but ignores the correlation between pages and topics. In some cases this leads to "topic drift", a deviation of the search results from the query topic. To address this problem, the paper presents an improved algorithm that takes into account both web content similarity and link analysis. The experiment shows that the improved algorithm enhances the relevance of search results and limits the occurrence of topic drift to some degree.
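
The abstract does not give the paper's actual weighting formula, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes each link u -> v is weighted by the cosine similarity between the query and the text of the target page v. The function names, the toy pages, and the bag-of-words similarity are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def _normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
    return {p: x / norm for p, x in scores.items()}


def hits(links, texts, query, use_similarity=True, iterations=50):
    """HITS power iteration; returns (authority, hub) score dicts.

    Assumption for illustration: with use_similarity=True, a link u -> v
    is weighted by the query-to-page similarity of v. With
    use_similarity=False every link has weight 1 (classical HITS).
    """
    q = Counter(query.lower().split())
    if use_similarity:
        rel = {p: cosine(q, Counter(t.lower().split())) for p, t in texts.items()}
    else:
        rel = {p: 1.0 for p in texts}

    auth = {p: 1.0 for p in texts}
    hub = {p: 1.0 for p in texts}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of in-linking pages.
        new_auth = {p: 0.0 for p in texts}
        for u, v in links:
            new_auth[v] += rel[v] * hub[u]
        auth = _normalize(new_auth)
        # Hub update: weighted sum of authority scores of linked-to pages.
        new_hub = {p: 0.0 for p in texts}
        for u, v in links:
            new_hub[u] += rel[v] * auth[v]
        hub = _normalize(new_hub)
    return auth, hub


# Toy graph: page "b" has more in-links but is off-topic for the query.
texts = {
    "a": "jaguar car review engine",
    "b": "jaguar cat habitat jungle",
    "h": "jaguar car links",
    "x": "jungle cat photos",
    "y": "wild cat facts",
}
links = [("h", "a"), ("h", "b"), ("x", "b"), ("y", "b")]

plain_auth, _ = hits(links, texts, "jaguar car", use_similarity=False)
weighted_auth, _ = hits(links, texts, "jaguar car", use_similarity=True)
```

On this toy graph, classical HITS gives the heavily linked but off-topic page "b" the higher authority score, while the similarity-weighted variant ranks the on-topic page "a" above it. This is the kind of topic drift reduction the abstract describes, though the paper's own weighting may differ.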
Pages: 147-150
Page count: 4