Object detection using the statistics of parts

Cited by: 174
Authors
Schneiderman, H [1]
Kanade, T [1]
Affiliation
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
Keywords
object recognition; object detection; face detection; car detection; pattern recognition; machine learning; statistics; computer vision; wavelets; classification
DOI
10.1023/B:VISI.0000011202.85607.00
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using AdaBoost with Confidence-Weighted Predictions (Schapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and decides whether to cease evaluation, labeling the input as non-object, or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
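The staged likelihood ratio test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one part per stage, that per-part ratios are combined by summing their logarithms (that is, the parts are treated as independent), and that probabilities are stored in small lookup tables. The function name `staged_llr`, the table values, and the thresholds are all hypothetical.

```python
import math


def staged_llr(part_values, tables, thresholds):
    """Evaluate a log-likelihood ratio one stage at a time.

    part_values: one discrete part value per stage (assumed already computed
        from the image window).
    tables: one (p_object, p_non_object) pair of lookup tables per stage,
        each indexed by the part value. Values here are made up.
    thresholds: one rejection threshold per stage (hypothetical).
    Returns (is_object, partial_llr, stages_evaluated).
    """
    llr = 0.0
    for stage, (value, (p_obj, p_non)) in enumerate(zip(part_values, tables)):
        # Look up the class-conditional probabilities of this part value.
        llr += math.log(p_obj[value] / p_non[value])
        # Stop early and label the window non-object if the partial
        # ratio falls below this stage's threshold.
        if llr < thresholds[stage]:
            return False, llr, stage + 1
    return True, llr, len(tables)


# Hypothetical two-stage classifier with binary part values.
TABLES = [
    ([0.2, 0.8], [0.7, 0.3]),  # stage 1: P(value | object), P(value | non-object)
    ([0.1, 0.9], [0.6, 0.4]),  # stage 2
]
THRESHOLDS = [0.0, 0.5]

accepted = staged_llr([1, 1], TABLES, THRESHOLDS)
rejected = staged_llr([0, 1], TABLES, THRESHOLDS)
print(accepted[0], accepted[2])
print(rejected[0], rejected[2])
```

In this toy setup the second window is rejected after the first stage, so its second part is never looked up. That early exit is the efficiency gain the abstract refers to when the classifier scans every window of an image.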
Pages: 151-177
Page count: 27