A New Hidden Web Crawling Approach

Cited by: 0
Authors
Saoudi, L. [1]
Boukerram, A. [2]
Mhamedi, S. [1]
Affiliations
[1] Mohammed Boudiaf Univ, Dept Comp Sci, Msila, Algeria
[2] Abderrahmane Mira Univ, Dept Comp Sci, Bejaia, Algeria
Keywords
Deep crawler; Hidden Web crawler; SQLI query; form submission; searchable forms
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Traditional search engines deal with the Surface Web, the set of Web pages directly accessible through hyperlinks, and ignore a large part of the Web called the hidden Web: a great amount of valuable information in online databases that is "hidden" behind query forms. To access this information, a crawler has to fill in the forms with valid data. For this reason, we propose a new approach that uses the SQLI technique to find the most promising keywords of a specific domain for automatic form submission. The effectiveness of the proposed framework has been evaluated through experiments on real web sites, and encouraging preliminary results were obtained.
Pages: 293-297
Page count: 5
Related papers
50 in total
  • [21] An automatic label extraction technique for domain-specific hidden web crawling (LEHW)
    El-Desouky, Ali I.
    Ali, Hesham A.
    El-Ghamrawy, Sally M.
    2006 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS, 2006, : 454 - +
  • [22] A Novel Approach for Crawling the Opinions from World Wide Web
    Bhatia, Surbhi
    Sharma, Manisha
    Bhatia, Komal Kumar
    INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2016, 6 (02) : 1 - 23
  • [23] Crawling the web with OntoDir
    Picariello, Antonio
    Rinaldi, Antonio M.
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 730 - +
  • [24] Crawling the infinite web
    Baeza-Yates, Ricardo
    Castillo, Carlos
    JOURNAL OF WEB ENGINEERING, 2007, 6 (01) : 49 - 72
  • [25] Crawling toward the Web
    Sinclair, Ken
    Engineered Systems, 2002, 19 (11):
  • [26] On the Stability of Web Crawling and Web Search
    Anderson, Reid
    Borgs, Christian
    Chayes, Jennifer
    Hopcroft, John
    Mirrokni, Vahab
    Teng, Shang-Hua
    ALGORITHMS AND COMPUTATION, PROCEEDINGS, 2008, 5369 : 680 - 691
  • [27] A TNATS approach to hidden web documents
    Hedley, Yih-Ling
    Younas, Muhammad
    James, Anne
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2004, 3347 : 158 - 167
  • [28] A TNATS approach to hidden web documents
    Hedley, YL
    Younas, M
    James, A
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, PROCEEDINGS, 2004, 3347 : 158 - 167
  • [29] A Memory Efficient Approach for Crawling Language Specific Web: The Arabic Web as a Case Study
    Ezzat, D.
    Abdeen, M.
    Tolba, M. F.
    2009 INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 584 - 587
  • [30] Crawling Deep Web Using a New Set Covering Algorithm
    Wang, Yan
    Lu, Jianguo
    Chen, Jessica
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 326 - 337