A Focused Crawler Based on Correlation Analysis

Authors
Qin, Qiuli [1]
Peng, Xin [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Econ & Management, Logist Technol & Management Lab, Beijing 100044, Peoples R China
Keywords
Focused crawler; web crawler; VSM; TF-IDF
DOI
10.14257/ijfgcn.2014.7.6.02
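Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract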
With the rapid development of network and information technology, a vast amount of data is now available on the internet. A major problem facing most researchers is how to effectively filter information on a particular subject or field out of these data. In this paper, we try to build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. We take the seed URL as the collection entry point and fetch web pages from the internet. We then analyze the page information through techniques such as web content extraction and page link analysis to obtain the main content of each page. Using the correlation analysis method based on VSM and TF-IDF, we calculate the correlation between pages and the predefined topics, so that the crawl concentrates on the focus areas of the web.
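The correlation step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the tokenizer, the smoothed IDF formula, and the function names (`tokenize`, `tf_idf_vectors`, `cosine`, `relevance`) are all assumptions chosen for illustration. Each text becomes a TF-IDF weighted vector in the vector space model, and each page is scored by its cosine similarity to the topic description.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(docs):
    # Build one sparse TF-IDF vector (term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency of each term
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Assumed smoothed IDF, log((1 + n) / (1 + df)) + 1, so weights stay positive.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance(topic, pages):
    # Score each page's extracted main content against the topic description.
    vectors = tf_idf_vectors([topic] + pages)
    return [cosine(vectors[0], v) for v in vectors[1:]]


if __name__ == "__main__":
    topic = "focused web crawler topic relevance"
    pages = [
        "a focused crawler fetches web pages relevant to a topic",
        "recipe for chocolate cake with butter and sugar",
    ]
    for page, score in zip(pages, relevance(topic, pages)):
        print(f"{score:.3f}  {page}")
```

In a crawler built along these lines, pages scoring above some chosen threshold would be kept and their outgoing links queued; the threshold value is a tuning decision that the abstract does not specify.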
Pages: 13-20
Number of pages: 8