Text extraction from web images based on a split-and-merge segmentation method using colour perception

Cited by: 16
Authors
Karatzas, D [1]
Antonacopoulos, A [1]
Affiliation
[1] Univ Liverpool, Dept Comp Sci, Pattern Recognit & Image Anal Grp, Liverpool L69 3BX, Merseyside, England
Source
PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2 | 2004
DOI
10.1109/ICPR.2004.1334328
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes a complete approach to the segmentation and extraction of text from Web images for subsequent recognition, with the ultimate aim of enabling both effective indexing and presentation by non-visual means (e.g., audio). The method described here (the first step in the authors' systematic approach to exploiting human colour perception) enables the extraction of text in complex situations, such as when the colour of the characters and of the background varies. More precisely, in addition to using structural features, the segmentation follows a split-and-merge strategy based on the Hue-Lightness-Saturation (HLS) representation of colour, as a first approximation of an anthropocentric expression of differences in chromaticity and lightness. Character-like components are then extracted as textlines, which may run in a number of orientations and along curves.
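The abstract names a split-and-merge segmentation over HLS colour but gives neither the homogeneity criterion nor the colour-difference measure. As a rough illustration only, the following is a generic quadtree split-and-merge in Python built on the standard-library `colorsys.rgb_to_hls`; it is not the authors' method. The function names (`to_hls`, `hls_distance`, `region_colour`, `split`, `split_and_merge`), the threshold `thr=0.15`, the distance formula and the merge rule (joining 4-adjacent leaf regions by their original representative colours, without recomputing merged means) are all hypothetical choices made for this sketch. Images are plain lists of rows of (r, g, b) tuples with 0-255 channels.

```python
import colorsys

# Hypothetical sketch of a generic HLS split-and-merge, not the paper's algorithm.

def to_hls(rgb):
    """Convert an (r, g, b) tuple with 0-255 channels to (h, l, s), each in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hls(r, g, b)

def hls_distance(a, b):
    """Hypothetical HLS colour difference (not the paper's metric).

    Hue is circular, and hue differences are scaled by the smaller saturation,
    so hue contributes little when one of the colours is close to grey.
    """
    (h1, l1, s1), (h2, l2, s2) = a, b
    dh = abs(h1 - h2)
    dh = min(dh, 1.0 - dh)  # circular hue difference, in [0, 0.5]
    return abs(l1 - l2) + abs(s1 - s2) + 2.0 * min(s1, s2) * dh

def region_colour(img, pixels):
    """Representative colour of a pixel set: the HLS value of its mean RGB."""
    n = len(pixels)
    mean = tuple(sum(img[y][x][c] for y, x in pixels) / n for c in range(3))
    return to_hls(mean)

def split(img, y0, x0, h, w, thr, leaves):
    """Split phase: recursively quarter a rectangle until it is homogeneous."""
    pixels = [(y, x) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    rep = region_colour(img, pixels)
    if (h == 1 and w == 1) or all(
        hls_distance(to_hls(img[y][x]), rep) <= thr for y, x in pixels
    ):
        leaves.append(pixels)
        return
    h1, w1 = max(h // 2, 1), max(w // 2, 1)
    for qy, qx, qh, qw in ((y0, x0, h1, w1), (y0, x0 + w1, h1, w - w1),
                           (y0 + h1, x0, h - h1, w1), (y0 + h1, x0 + w1, h - h1, w - w1)):
        if qh > 0 and qw > 0:  # skip empty quadrants of 1-pixel-wide strips
            split(img, qy, qx, qh, qw, thr, leaves)

def split_and_merge(img, thr=0.15):
    """Return a label map (list of rows) assigning a region id to every pixel."""
    height, width = len(img), len(img[0])
    leaves = []
    split(img, 0, 0, height, width, thr, leaves)
    reps = [region_colour(img, pixels) for pixels in leaves]

    leaf_of = [[0] * width for _ in range(height)]
    for i, pixels in enumerate(leaves):
        for y, x in pixels:
            leaf_of[y][x] = i

    parent = list(range(len(leaves)))  # union-find over leaf regions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge phase: join 4-adjacent leaves whose representative colours are close.
    for y in range(height):
        for x in range(width):
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width:
                    a, b = leaf_of[y][x], leaf_of[ny][nx]
                    if a != b and hls_distance(reps[a], reps[b]) <= thr:
                        parent[find(a)] = find(b)

    # Relabel merged regions as 0, 1, 2, ... in raster-scan order.
    ids = {}
    return [[ids.setdefault(find(leaf_of[y][x]), len(ids)) for x in range(width)]
            for y in range(height)]
```

For example, in a 2x4 image whose left half is pure red (255, 0, 0) and right half pure blue (0, 0, 255), both colours have lightness 0.5 and saturation 1, so only the hue term separates them; `split_and_merge` returns `[[0, 0, 1, 1], [0, 0, 1, 1]]`. This is the kind of case where a lightness-only (greyscale) segmentation would fail and a hue-aware representation helps.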
Pages: 634-637
Number of pages: 4