A method for indexing web pages using web bots

Cited by: 0
Authors
Szymanski, BK [1 ]
Chung, MS [1 ]
Affiliations
[1] Rensselaer Polytech Inst, Dept Comp Sci, Troy, NY 12180 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Exploring the content of web pages for automatic indexing is of fundamental importance for efficient e-commerce and other applications of the Web. It enables users, including customers and businesses, to locate the best sources for their use. Today's search engines use one of two approaches to indexing web pages. They either (i) analyze the frequency of the words (after filtering out common or meaningless words) appearing in all or part (typically a title, an abstract, or the first 300 words) of the text of the target web page, or (ii) use sophisticated algorithms to take into account associations of words in the indexed web page. In both cases, only words appearing in the web page in question are used in the analysis. Often, to increase the relevance of the selected terms to potential searches, the indexing is refined by human processing. To identify so-called "authority" or "expert" pages, some search engines use the structure of the links between pages to identify pages that are often referenced by other pages. By analyzing the density, direction, and clustering of links, this method is capable of identifying the pages that are likely to contain valuable information. It is analogous to a well-known citation analysis method developed in library science and used by such publications as the Science Citation Index. A slightly different approach is used in the Google search engine implementation, which assigns to each page a score that depends on the frequency with which the page is visited by web surfers. The basic difference between the existing methods and the one discussed here is that the existing methods rely on the structure of the web page linkages that lead from or to the indexed page. In contrast, our method uses the content of the pages linked to or from the indexed page for indexing. Thus, our method uses the structure of the words used by the linked pages, whereas the current methods use the structure of the connections between linked pages.
In this paper, we propose and demonstrate the use of a new method based on bots that analyze the content of the pages linked to or from the page of interest. We analyze the similarity of word usage at different link distances from the page of interest and demonstrate that the structure of the words used by the linked pages enables more efficient indexing and search.
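The idea of comparing word usage at different link distances can be illustrated with a minimal Python sketch. This record contains only the abstract, so the code below is not the authors' implementation. It assumes a small in-memory collection of page texts and links instead of a real web bot, a tiny hypothetical stop-word list, and cosine similarity as the similarity measure (the abstract does not name one). All function and variable names are illustrative.

```python
import math
import re
from collections import Counter, deque

# Hypothetical stop-word list; a real indexer would use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def term_frequencies(text, max_words=300):
    """Count words in the first `max_words` words, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())[:max_words]
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency Counters (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_distances(links, start, max_distance):
    """Breadth-first search that follows links in both directions
    (pages linked to or from a page) up to `max_distance` hops."""
    neighbors = {}
    for src, targets in links.items():
        for dst in targets:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if dist[page] == max_distance:
            continue
        for nxt in neighbors.get(page, ()):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


def similarity_by_distance(pages, links, start, max_distance=2):
    """Average similarity between `start` and the pages at each link distance."""
    target = term_frequencies(pages[start])
    scores = {}
    for page, d in link_distances(links, start, max_distance).items():
        if d == 0 or page not in pages:
            continue
        sim = cosine_similarity(target, term_frequencies(pages[page]))
        scores.setdefault(d, []).append(sim)
    return {d: sum(s) / len(s) for d, s in sorted(scores.items())}
```

A call such as `similarity_by_distance(pages, links, "a")`, where `pages` maps page identifiers to text and `links` maps each page to the pages it links to, returns one average similarity per link distance. How those per-distance values would feed into indexing is described in the paper itself, not in this sketch.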
Pages: C1-C6
Number of pages: 6