Clustering and searching WWW images using link and page layout analysis

Cited by: 31
Authors
He, Xiaofei
Cai, Deng
Wen, Ji-Rong
Ma, Wei-Ying
Zhang, Hong-Jiang
Affiliations
[1] Yahoo Res Labs, Burbank, CA 91504 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] Microsoft Res Asia, Beijing, Peoples R China
Keywords
algorithms; management; performance; experimentation; web mining; image search; image clustering; link analysis
DOI
10.1145/1230812.1230816
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Due to the rapid growth of the number of digital images on the Web, there is an increasing demand for an effective and efficient method for organizing and retrieving the available images. This article describes iFind, a system for clustering and searching WWW images. By using a vision-based page segmentation algorithm, a Web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image indexing. By extracting the page-to-block, block-to-image, block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods like PageRank, HITS, and PicASHOW, and hence the image graph can better reflect the semantic relationship between images. Using the notion of Markov Chain, we can compute the limiting probability distributions of the images, ImageRanks, which characterize the importance of the images. The ImageRanks are combined with the relevance scores to produce the final ranking for image search. With the graph models, we can also use techniques from spectral graph theory for image clustering and embedding, or 2-D visualization. Some experimental results on 11.6 million images downloaded from the Web are provided in the article.
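The ranking step in the abstract (limiting probabilities of a Markov chain on the image graph, then combination with relevance scores) can be illustrated with a minimal sketch. This is not the paper's implementation: the `weights` matrix, the `damping` teleportation term (borrowed from PageRank so that the chain has a unique limiting distribution), and the linear blend with weight `alpha` are all assumptions made for illustration.

```python
def image_rank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate the limiting distribution of a random walk on an image graph.

    weights[i][j] is a hypothetical non-negative image-to-image weight.
    damping is an assumed teleportation parameter, not taken from the paper.
    """
    n = len(weights)
    # Row-normalize the weights into a transition matrix;
    # a row with no outgoing weight jumps uniformly to any image.
    transition = []
    for row in weights:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # One power-iteration step with uniform teleportation.
        new_rank = [
            (1 - damping) / n
            + damping * sum(rank[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


def final_scores(relevance, rank, alpha=0.7):
    """Blend text relevance with rank; the linear form and alpha are assumptions."""
    top = max(rank)
    return [alpha * r + (1 - alpha) * (k / top) for r, k in zip(relevance, rank)]


# Toy graph: image 0 is linked with images 1 and 2, which are not linked to each other.
toy_weights = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
toy_rank = image_rank(toy_weights)
toy_scores = final_scores([0.5, 0.5, 0.5], toy_rank)
```

In this toy graph the better-connected image 0 receives the largest rank, so with equal relevance scores it also comes first in the blended ranking.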
Pages: 25