A Focused Crawler Based on Correlation Analysis

Times cited: 0
Authors
Qin, Qiuli [1]
Peng, Xin [1]
Affiliation
[1] Beijing Jiaotong Univ, Sch Econ & Management, Logist Technol & Management Lab, Beijing 100044, Peoples R China
Keywords
Focused Crawler; web crawler; VSM; TF-IDF
DOI
10.14257/ijfgcn.2014.7.6.02
CLC number (Chinese Library Classification)
TN [Electronic technology, communication technology]
Discipline code
0809
Abstract
With the rapid development of network and information technology, a huge amount of data is available on the Internet. However, effectively filtering information on a particular subject or field out of these data is a major problem faced by most researchers. In this paper, we attempt to build a focused crawler based on the vector space model (VSM) and TF-IDF text correlation analysis. Taking seed URLs as the entry points for collection, the crawler fetches web pages from the Internet. It then analyzes the page information with techniques such as web content extraction and page link analysis to obtain the main content of each page. Using a correlation analysis method based on VSM and TF-IDF, we calculate the correlation between each page and the predefined topics, so that crawling stays focused on web content in the target areas.
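The abstract names the pipeline (seed URLs, fetching, content extraction, link analysis, VSM/TF-IDF correlation) but gives no formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: terms are weighted with TF-IDF using a smoothed idf, log((1 + N) / (1 + df)) + 1 (chosen here so that terms shared only by the topic and a page are not weighted to zero), correlation is the VSM cosine similarity, and the crawl is best-first, with each outlink inheriting its parent page's score as priority. The names `relevance_scores`, `focused_crawl`, `fetch`, and `threshold` are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document.

    Assumed weighting: raw term frequency times smoothed idf
    log((1 + N) / (1 + df)) + 1; the paper's exact scheme is not stated.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def relevance_scores(topic, pages):
    """Score each page's main text against the topic in one shared TF-IDF space."""
    vectors = tfidf_vectors([topic] + pages)
    topic_vec = vectors[0]
    return [cosine(topic_vec, pv) for pv in vectors[1:]]


def focused_crawl(seeds, fetch, topic, threshold=0.1, max_pages=50):
    """Best-first focused crawl (hypothetical sketch).

    fetch(url) must return (main_text, outlinks); it stands in for the
    download, content-extraction and link-analysis steps.
    """
    frontier = [(-1.0, url) for url in seeds]  # negated priority: heapq is a min-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        fetched += 1
        score = relevance_scores(topic, [text])[0]
        if score < threshold:
            continue  # off-topic page: keep it out of the results and do not expand its links
        relevant.append((url, score))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # child inherits parent's score
    return relevant
```

For a quick offline check, `fetch` can be a dictionary lookup over a toy link graph, e.g. `focused_crawl(["seed"], lambda u: web[u], "logistics supply chain management")`; in a real crawler it would wrap an HTTP client plus HTML main-content and link extraction. Pruning the links of off-topic pages is what keeps the crawl focused, at the cost of missing relevant pages reachable only through irrelevant ones.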
Pages: 13-20
Number of pages: 8
Related papers (50 in total)
  • [41] An application of improved PageRank in focused crawler
    Zhang, Yulian
    Yin, Chunxia
    Yuan, Fuyong
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2007, : 331 - 335
  • [42] A Focused Crawler for Dark Web Forums
    Fu, Tianjun
    Abbasi, Ahmed
    Chen, Hsinchun
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2010, 61 (06): : 1213 - 1231
  • [43] An architecture for a focused trend parallel Web crawler with the application of clickstream analysis
    Ahmadi-Abkenari, Fatemeh
    Selamat, Ali
    INFORMATION SCIENCES, 2012, 184 (01) : 266 - 281
  • [44] An algorithm OFC for the focused web crawler
    Zhu, Qiang
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 4059 - 4063
  • [45] FCHC: A Social Semantic Focused Crawler
    Thukral, Anjali
    Mendiratta, Varun
    Behl, Abhishek
    Banati, Hema
    Bedi, Punam
    ADVANCES IN COMPUTING AND COMMUNICATIONS, PT 2, 2011, 191 : 273 - +
  • [46] A rule-based obfuscating focused crawler in the audio retrieval domain
    Montanaro, Marco
    Rinaldi, Antonio Maria
    Russo, Cristiano
    Tommasino, Cristian
    Multimedia Tools and Applications, 2024, 83 : 25231 - 25260
  • [47] An improved focused crawler based on Semantic Similarity Vector Space Model
    Du, Yajun
    Liu, Wenjun
    Lv, Xianjing
    Peng, Guoli
    APPLIED SOFT COMPUTING, 2015, 36 : 392 - 407
  • [48] Focused Crawler Strategy Based on Improved Energy Landscape Paving Algorithm
    Liu, Jingfa
    Zhang, Wei
    Yang, Zhihe
    Liu, Ziang
    FUZZY SYSTEMS AND DATA MINING VI, 2020, 331 : 536 - 545
  • [49] LSCrawler: A framework for an enhanced focused web crawler based on link semantics
    Yuvarani, M.
    Iyengar, N. Ch. S. N.
    Kannan, A.
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 794 - 797
  • [50] A Focused Event Crawler with Temporal Intent
    Wu, Hao
    Hou, Dongyang
    APPLIED SCIENCES-BASEL, 2023, 13 (07):