An Improved HITS Algorithm Based on Analysis of Web Page Links and Web Content Similarity

Cited by: 6
Authors
Yang, Weiming [1]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing, Peoples R China
Keywords
HITS algorithm; Web content similarity; Authority page; Hub page
DOI
10.1109/CW.2016.30
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
HITS (Hyperlink-Induced Topic Search) is a classical link analysis algorithm used in Web structure mining (WSM). The algorithm takes the structural information of links into account but ignores the correlation between pages and topics. As a result, "topic drift", a deviation of the search results from the query topic, can occur in some cases. To address this problem, this paper presents an improved algorithm that takes into account both web content similarity and link analysis. Our experiment shows that the improved algorithm strengthens the correlation between search results and the topic and limits the occurrence of topic drift to some degree.
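The abstract does not state the paper's actual formula, so the following is only a minimal sketch of the general idea, under stated assumptions: classical HITS hub and authority updates in which each link is weighted by the bag-of-words cosine similarity between the query and the target page's text. The weighting scheme, the function names (`cosine_similarity`, `normalize`, `weighted_hits`), and the toy graph are all hypothetical illustrations, not the author's method.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words term counts of two strings."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(scores):
    """Scale a score dict to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return {k: v / norm for k, v in scores.items()} if norm else dict(scores)


def weighted_hits(links, weights=None, iterations=50):
    """Run HITS on a directed graph given as {source: [targets]}.

    weights maps (source, target) to a non-negative edge weight. Edges that
    are missing from weights default to 1.0, which gives classical HITS.
    """
    weights = weights or {}
    nodes = set(links) | {t for targets in links.values() for t in targets}
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += weights.get((src, dst), 1.0) * hub[src]
        # Hub update: weighted sum of the new authority scores of linked pages.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += weights.get((src, dst), 1.0) * new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


# Toy data (hypothetical): the off-topic page has more in-links than the
# on-topic one, which is the kind of situation where topic drift can arise.
query = "jaguar cat habitat"
page_text = {
    "jaguar_cat": "jaguar big cat habitat rainforest",
    "jaguar_car": "jaguar car dealer price",
}
links = {
    "hub1": ["jaguar_cat", "jaguar_car"],
    "hub2": ["jaguar_cat", "jaguar_car"],
    "hub3": ["jaguar_car"],
}

# Assumed weighting: similarity between the query and the target page's text.
weights = {
    (src, dst): cosine_similarity(query, page_text[dst])
    for src, targets in links.items()
    for dst in targets
}

plain_auth, _ = weighted_hits(links)
weighted_auth, _ = weighted_hits(links, weights)
print("plain:   ", sorted(plain_auth, key=plain_auth.get, reverse=True)[:2])
print("weighted:", sorted(weighted_auth, key=weighted_auth.get, reverse=True)[:2])
```

In this toy graph, link structure alone favors the page with more in-links, while the similarity weights shift authority toward the page whose text matches the query. Other ways of combining content and link evidence are equally plausible; this sketch only fixes one simple choice for illustration.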
Pages: 147-150
Page count: 4