A Focused Crawler Based on Correlation Analysis

Cited by: 0
Authors
Qin, Qiuli [1]
Peng, Xin [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Econ & Management, Logist Technol & Management Lab, Beijing 100044, Peoples R China
Keywords
Focused Crawler; web crawler; VSM; TF-IDF
DOI
10.14257/ijfgcn.2014.7.6.02
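Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract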
With the rapid development of network and information technology, a vast amount of data is now available on the internet. A major problem faced by many researchers is how to effectively filter information on a particular subject or field out of this data. In this paper, we try to build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. We take the seed URL as the collection entry point and fetch web pages from the internet. We then analyze the page information using techniques such as web content extraction and page link analysis to obtain the main content of each page. Using the correlation analysis method based on VSM and TF-IDF, we calculate the correlation between pages and the predefined topics, so that the crawl stays focused on the topic-relevant areas of the web.
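The correlation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact weighting formula, so the smoothed IDF used here, the simple tokenizer, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `relevance` are all assumptions made for illustration. The sketch represents the topic description and each fetched page as TF-IDF vectors in a shared vector space and scores each page by its cosine similarity to the topic.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    # Smoothed IDF (an assumption): terms present in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevance(topic_text, page_texts):
    """Score each page against the topic description; higher means more relevant."""
    docs = [tokenize(topic_text)] + [tokenize(p) for p in page_texts]
    vecs = tfidf_vectors(docs)
    return [cosine(vecs[0], v) for v in vecs[1:]]


if __name__ == "__main__":
    topic = "focused web crawler topic relevance"
    pages = [
        "a focused crawler fetches web pages relevant to a topic",
        "recipe for chocolate cake with butter and sugar",
    ]
    for page, score in zip(pages, relevance(topic, pages)):
        print(f"{score:.3f}  {page}")
```

In a crawler built along these lines, pages scoring above some chosen threshold would be kept and their outgoing links queued for fetching; the threshold value is a tuning choice that the abstract does not specify.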
Pages: 13-20
Number of pages: 8
Related papers
50 in total
  • [21] Centroid-based focused crawler with incremental ability
    Wang, Hui
    Zuo, Wanli
    Wang, Huiyu
    Ning, Aijun
    Sun, Zhiwei
    Man, Chunlei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2009, 46 (02) : 217 - 224
  • [22] Research on Text Mining Algorithm Based on Focused Crawler
    Zhang, Qiusheng
    Lin, Mingyu
    Jun, Jianping
    Zhang, Xingyun
    2017 12TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2017), 2017, : 454 - 457
  • [23] Designing Focused Crawler Based On Improved Genetic Algorithm
    Yan, Wei
    Pan, Li
    PROCEEDINGS OF 2018 TENTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2018, : 319 - 323
  • [24] Focused Crawler Framework Based on Open Search Engine
    Liu, Jiawei
    Huang, Yongfeng
    CLOUD COMPUTING AND SECURITY, PT III, 2018, 11065 : 56 - 68
  • [25] An improved focused web crawler based on hybrid similarity
    Shang S.
    Wu H.
    Ma J.
    International Journal of Performability Engineering, 2019, 15 (10) : 2645 - 2656
  • [26] Keyword Focused Web Crawler
    Agre, Gunjan H.
    Mahajan, Nikita V.
    2015 2ND INTERNATIONAL CONFERENCE ON ELECTRONICS AND COMMUNICATION SYSTEMS (ICECS), 2015, : 1089 - 1092
  • [27] Improvement of PageRank for focused crawler
    Yuan, Fuyong
    Yin, Chunxia
    Jian, Liu
    SNPD 2007: EIGHTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, VOL 2, PROCEEDINGS, 2007, : 797 - +
  • [28] A focused crawler based on semantic disambiguation vector space model
    Liu, Wenjun
    He, Yu
    Wu, Jing
    Du, Yajun
    Liu, Xing
    Xi, Tiejun
    Gan, Zurui
    Jiang, Pengjun
    Huang, Xiaoping
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (01) : 345 - 366
  • [29] A focused crawler based on semantic disambiguation vector space model
    Wenjun Liu
    Yu He
    Jing Wu
    Yajun Du
    Xing Liu
    Tiejun Xi
    Zurui Gan
    Pengjun Jiang
    Xiaoping Huang
    Complex & Intelligent Systems, 2023, 9 : 345 - 366
  • [30] Focused Crawler for Finding Professional Events Based on User Interests
    Ozel, Selma Ayse
    Sarac, Esra
    23RD INTERNATIONAL SYMPOSIUM ON COMPUTER AND INFORMATION SCIENCES, 2008, : 441 - 444