Rapid object recognition from a large model database


June Ho Yi and David M. Chelberg
Geometric Modeling and Perceptual Processing Laboratory, School of Electrical Engineering, Purdue University, West Lafayette, Indiana 47907-1285

Abstract

A design for a system to perform object recognition from a large model database is presented, focusing on efficient indexing. We propose a decision-theoretic approach using a Bayesian framework to achieve efficient indexing of model objects. A decision-theoretic measure of the discriminatory power of a feature for a model object is defined in terms of posterior probability. Domain-specific knowledge compiled off-line from CAD model data is used to estimate the posterior probabilities that define the discriminatory power of features for model objects. The detectability of a feature, defined as a function of the feature itself, the viewpoint, the sensor characteristics, and the feature detection algorithm(s), is also considered in the computation of discriminatory power. In order to speed up the indexing or selection of the correct objects, we generate and verify object hypotheses for the features detected in the scene in the order of the discriminatory power of these features for model objects.

1 Introduction

The problem of rapid object recognition with a large model database is very difficult. The methods used by most model-based object recognition systems, while often reasonably successful within their own terms, have not proven themselves capable of dealing with a large model database containing perhaps thousands of objects. The term "object recognition" implies two different recognition problems. We distinguish them first as follows:

P1: Given a known object, find the object in the scene.

P2: Given an unknown object in the scene, identify the object.

(This work was supported by a Digital Equipment Corporation Faculty Incentives for Excellence Grant and by NSF Grant number IRI-9011421.)

The P2 type recognition problem is more difficult than the P1 type, especially in the case of a large model database. Model-driven matching should be used for the P1 type recognition problem because the features to look for in the scene are known in advance. In the P2 type recognition problem, on the other hand, a data-driven matching strategy should be used in order to index model objects for a feature detected in the scene. Indexing becomes increasingly difficult as the number of model objects increases. In this paper, we present an approach that explicitly addresses P2 type recognition from range data, focusing on efficient indexing to handle the complexity posed by a large model database. The proposed method will also work well in the case of a small number of model objects. We use the CAD model data of an object to extract the model information because inferring the model information from CAD model data is well suited to the design of object recognition systems that can handle a large number of objects. Our approach to overcoming the problems inherent in a large model database is to employ off-line processing of model information instead of complicated inference during the recognition phase of processing.

We define a decision-theoretic measure of the discriminatory power of a feature for a model object in terms of posterior probability. A measure of the detectability of a feature is considered in the computation of the discriminatory power of a feature for a model object. The detectability of a feature is represented as a function of the feature itself, the viewpoint, the sensor characteristics, and the feature detection algorithm(s) used. The domain-specific knowledge compiled off-line from CAD model data is used to estimate the posterior probabilities that define the discriminatory power of features for model objects. The matching strategy is thus a hybrid strategy that combines both model-driven and data-driven strategies. Object hypotheses for features detected in the scene are ordered by the information (discriminatory power) compiled on the model side. In order to speed up the indexing or selection of the correct objects, we generate and verify the object hypotheses for the features detected in the scene in the order of the discriminatory power of these features for model objects. By generating and verifying the object hypotheses in this order, we expect to generate only a few correct hypotheses of the scene objects, resulting in a speed-up of the indexing and matching of correct objects. The major contribution of this work is the design of a decision-theoretic approach using a Bayesian framework to achieve efficient indexing of model objects.

This paper is organized as follows. The following section briefly reviews related approaches and compares them with ours. In Section 3, we define a decision-theoretic measure of the discriminatory power of a feature for a model object and describe how it is computed. Section 4 presents how object hypotheses are generated and verified in the order of discriminatory power, which achieves efficient indexing of the correct model objects, for a simple example feature.

2 Related work

Most object recognition work reported in the computer vision literature is aimed at the P2 type problem, although it is not explicitly stated as such. However, 3DPO [2] and Ikeuchi's work [7] are tuned to locate a given object for tasks such as bin-picking of one type of object rather than to solve the recognition problem itself. In this sense, they are attacking a problem close to the P1 type. As pointed out at the beginning of the previous section, few existing object recognition systems have been evaluated in the situation of a large model database. Most object recognition systems naturally lack explicit indexing strategies because the problems of a large model database are not attacked. There has been some recent research activity among several researchers [3] [8] [5] dealing with large model databases.

Burns and Kitchen [3] proposed a method for rapid object recognition for a large model database. By predicting the two-dimensional appearances of three-dimensional structures and merging similar structures arising from different models, a prediction hierarchy is structured off-line. This prediction hierarchy is traversed for recognition, resulting in a speed-up of the indexing of correct objects. The actual recognition is a two-dimensional matching process. Swain [8] applied a decision-tree approach that had been used in the pattern recognition domain to object recognition. Given a probability distribution on the frequency of objects, the viewpoints from which they are seen, and the errors which occur in the low-level system, a decision tree is built that optimizes the expected cost of recognition. It employs topological, relational, and viewpoint dependent information in its decision rules. Flynn [5] improved the object recognition system BONSAI [6] to deal with a large model database. He addresses two difficulties that arise when a large model database is used. Invariant feature indexing of interpretation tables is employed; interpretation tables summarize candidate bindings for scene feature groups at recognition time. The problem of redundant feature groups arising from object symmetries is resolved by collecting these groups into equivalence classes. A population-based measure of saliency is assigned to each feature group retrieved from the scene. Feature groups with a saliency measure below a predetermined threshold are ignored in order to employ only the most salient feature groups in object recognition. This is risky because the ignored feature groups may contain the correct object hypotheses.

Our work is most similar to this last approach. We have a data structure called the "Feature Indexing Tree" (similar to the interpretation tables in Flynn's work) that lists candidate bindings for scene features at recognition time. We define the discriminatory power of a feature for a model object, which is similar to the saliency measure in Flynn's work. The discriminatory power defined in our work is a decision-theoretic measure of the saliency of a feature for a model object under a Bayesian framework, whereas a saliency measure based on population is somewhat ad hoc. Moreover, the detectability of a feature group is not incorporated into the computation of its saliency measure in Flynn's work.

3 Discriminatory power of a feature for a model object

In order to define the discriminatory power of a feature for a model object in terms of posterior probability, we start with the joint probability P(mk, Mi, viewpointj), k = 1, ..., fm, i = 1, ..., N, and j = 1, ..., v, where mk, Mi, and viewpointj denote a feature for indexing, the i-th model object, and the j-th viewpoint of a set of viewpoint samplings, respectively. This joint probability encodes the information conveyed by a feature of a model object. If a feature to be used is viewpoint independent, we can ignore viewpointj in P(mk, Mi, viewpointj).

3.1 Definitions and notations

P(mk, Mi, viewpointj) : the joint probability of mk (a feature for indexing), Mi (the i-th model object), and viewpointj (the j-th viewpoint of a set of viewpoint samplings), where k = 1, ..., fm, i = 1, ..., N, and j = 1, ..., v.

P(Mi) : the probability that the model Mi appears in the scene. Then we have

$$\sum_{i=1}^{N} P(M_i) = 1.$$

P(mk | Mi) : a likelihood function. P(mk | Mi) > P(mk | Mj) means that the model Mi is more "likely" than the model Mj to be the model object that the feature mk belongs to, in that mk would be a more plausible instance of the features of the model Mi than of the model Mj.

P(Mi | mk) : this posterior probability reflects the updated belief that model Mi appears in the scene after the feature mk is observed.

Dmk : the detectability of a feature mk, which measures how well the feature mk can be detected.

Estimates of these quantities will be denoted by a hat above the symbol.

Definition: P(Mi | mk) is the discriminatory power of the feature mk for a particular model object Mi.

Given this definition, the feature ml such that P(Mj | ml) = max_k { max_i P(Mi | mk) } has the best discriminatory power. In addition, the following is always true for all mk:

$$\frac{1}{N} \;\le\; \max_i P(M_i \mid m_k) \;\le\; 1 \qquad (1)$$

The detectability of a feature is considered in the computation of the discriminatory power of a feature for a model object. The detectability of a feature, Dmk, depends first on the feature mk itself. We will call this the robustness of a feature; by the robustness of a feature, we mean the detectability of a feature depending on its feature class. For example, a vertex feature may be less reliably detectable than a surface feature. Furthermore, the robustness of a feature should depend on the probability of artifact features in the scene. For example, concave dihedral edges can occur whenever a polyhedron is placed on another polyhedron; moreover, this is likely to occur due to occlusion in polyhedral scenes. On the other hand, the likelihood of a convex edge being formed as an artifact of occlusion is very low. Dmk also changes as the viewpoint varies. For example, a planar surface is more difficult to detect from a viewpoint in which it appears steeply sloped than from a viewpoint that gives a flat, frontal appearance of the surface. The sensor's capability is also important for a feature to be reliably detectable. Finally, the detectability of a feature can vary according to the feature detection algorithm used. Therefore we represent the detectability of a feature mk, Dmk, as a function of the above four factors:

$$0 \;\le\; D_{m_k} = f(m_k,\ viewpoint,\ sensor,\ feature\ detection\ algorithm) \;\le\; 1 \qquad (2)$$
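The paper only constrains Dmk to lie in [0, 1] and to depend on these four factors; the sketch below assumes, purely for illustration, a multiplicative combination of a per-feature-class robustness term, a viewpoint foreshortening term, a sensor noise term, and a detector reliability term. All constants and names (ROBUSTNESS, DETECTOR_RELIABILITY, detectability) are hypothetical.

```python
import math

# Hypothetical per-feature-class robustness values (not from the paper):
# surfaces are assumed more reliably detectable than edges or vertices.
ROBUSTNESS = {"surface": 0.95, "edge": 0.8, "vertex": 0.6}

# Hypothetical reliability of each feature detection algorithm.
DETECTOR_RELIABILITY = {"region_growing": 0.9, "sobel": 0.85, "roberts": 0.7}

def detectability(feature_class, slant_deg, sensor_noise_sigma, detector):
    """D_mk in [0, 1] as a function of the feature itself, the viewpoint,
    the sensor characteristics, and the detection algorithm (eq. 2).
    The multiplicative combination below is an illustrative assumption."""
    # Viewpoint term: a surface seen at a high slant (near 90 deg) is harder
    # to detect than one seen nearly frontally.
    viewpoint_term = max(0.0, math.cos(math.radians(slant_deg)))
    # Sensor term: detectability decays with sensor noise (arbitrary model).
    sensor_term = 1.0 / (1.0 + sensor_noise_sigma)
    d = (ROBUSTNESS[feature_class] * viewpoint_term * sensor_term
         * DETECTOR_RELIABILITY[detector])
    return min(1.0, max(0.0, d))

# A planar patch viewed at a 75 deg slant is less detectable than at 10 deg.
print(detectability("surface", 75.0, 0.05, "region_growing"))
print(detectability("surface", 10.0, 0.05, "region_growing"))
```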

3.2 Computation of discriminatory power

In the following, we describe how to estimate the posterior probability P(Mi | mk). Estimates of P(Mi) and P(mk, viewpointj | Mi), denoted P̂(Mi) and P̂(mk, viewpointj | Mi) respectively, can be calculated once we know the specific application domain and determine which feature to use. P̂(Mi) can be computed by observing the frequency of appearance of the model object Mi in the scene and normalizing it by the total number of observations of all models. Once a certain feature mk is chosen for indexing of model objects, we compute P̂(mk, viewpointj | Mi) by counting the number of appearances of the feature mk in model object Mi for viewpointj. However, the feature mk is often not perfectly detectable, i.e., Dmk is not 1.0. We make the assumption of conditional independence of the two terms P(mk, viewpointj | Mi) and Dmk, and thus use the product P̂(mk, viewpointj | Mi) · D̂mk in the computation of P̂(mk | Mi). P̂(mk, viewpointj | Mi) is estimated as

$$\hat{P}(m_k, viewpoint_j \mid M_i) = \frac{\#\ \text{occurrences of the feature } m_k \text{ in } M_i \text{ for } viewpoint_j}{\#\ \text{occurrences of all features } m_l,\ l = 1, \ldots, f_m, \text{ in } M_i \text{ for all viewpoints}} \qquad (3)$$

The model dependent information and the sensing dependent information are separated into P̂(mk, viewpointj | Mi) and D̂mk, respectively. This way, feature detectability can be incorporated into the computation of the discriminatory power when CAD model data is used. Therefore, the likelihood P̂(mk | Mi) and the posterior probability P̂(Mi | mk) are computed as

$$\hat{P}(m_k \mid M_i) = \sum_{j=1}^{v} \hat{P}(m_k, viewpoint_j \mid M_i) \cdot \hat{D}_{m_k} \qquad (4)$$

and

$$\hat{P}(M_i \mid m_k) = \frac{\hat{P}(M_i) \sum_{j=1}^{v} \hat{P}(m_k, viewpoint_j \mid M_i) \cdot \hat{D}_{m_k}}{\sum_{i=1}^{N} \hat{P}(M_i) \sum_{j=1}^{v} \hat{P}(m_k, viewpoint_j \mid M_i) \cdot \hat{D}_{m_k}}. \qquad (5)$$

Note that if a feature mk does not exist in the model object Mi, P̂(mk | Mi) = 0.0. As previously stated, the same formulation (4) and (5) can be applied to viewpoint independent features by ignoring the viewpointj term. Given a particular feature and viewpoint, estimating Dmk amounts to determining how different feature detection algorithms behave under different sensor characteristics (for example, signal-to-noise ratio).
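A minimal sketch of equations (3)-(5), assuming the occurrence counts have already been compiled off-line from the CAD models; the function name and the dictionary-based representation are illustrative choices, not from the paper.

```python
from collections import defaultdict

def likelihoods_and_posteriors(counts, priors, detectability):
    """counts[(model, feature, viewpoint)] -> number of occurrences of the
    feature in that model for that viewpoint (compiled off-line from CAD data).
    priors[model] -> P_hat(M_i); detectability[feature] -> D_hat_mk.
    Returns P_hat(m_k | M_i) per eq. (4) and P_hat(M_i | m_k) per eq. (5)."""
    # Denominator of eq. (3): all feature occurrences in M_i over all viewpoints.
    total = defaultdict(float)
    for (model, feature, viewpoint), n in counts.items():
        total[model] += n

    # Eq. (3) summed over viewpoints and scaled by D_hat_mk, giving eq. (4).
    likelihood = defaultdict(float)
    for (model, feature, viewpoint), n in counts.items():
        likelihood[(feature, model)] += (n / total[model]) * detectability[feature]

    # Eq. (5): Bayes' rule with the priors P_hat(M_i).
    models = priors.keys()
    features = {f for (f, m) in likelihood}
    posterior = {}
    for f in features:
        evidence = sum(priors[m] * likelihood.get((f, m), 0.0) for m in models)
        for m in models:
            posterior[(m, f)] = (priors[m] * likelihood.get((f, m), 0.0) / evidence
                                 if evidence > 0 else 0.0)
    return likelihood, posterior
```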

For the case of edge detection, in which the feature is an edge, the Sobel operator is known to perform better in noisy situations than the Roberts cross. Therefore Dmk for edge features would be higher for the Sobel operator than for the Roberts cross. In fact, it is possible to analytically determine the probability of detecting an edge using either algorithm for a given signal-to-noise ratio. We will use a similar method to determine Dmk for our features. An example showing the computation of P̂(Mi | mk) will be given in the following section for two simple objects using a viewpoint dependent feature.

4 An example

In this section, we take a simple feature (mk) as an example to illustrate how to compute the discriminatory power P̂(Mi | mk). We also present how object hypotheses are generated and verified in an order that can achieve efficient indexing and matching of the correct objects based on this feature.

4.1 An example feature, mk

We use a local surface group as a feature for object hypothesis generation. We will call this an "LSG (local surface group)". The signs of the Gaussian curvature (K) and the mean curvature (H) yield eight possible types of surface primitives [1], as shown in Table 1. These eight primitive surfaces are considered in an LSG.

Table 1: The eight surface types determined by the signs of Gaussian curvature (K) and mean curvature (H).

             K < 0            K = 0     K > 0
   H < 0     saddle ridge     ridge     peak
   H = 0     minimal          planar    (none)
   H > 0     saddle valley    valley    pit

Surface orientation is defined for planar surfaces as the direction of the surface normal and for straight circular cylindrical (ridge, valley) surfaces as the direction of the axis, respectively. Figure 1 shows an example of an LSG. The last entry of each adjacent surface patch in the attribute list is the angle between the surface orientation of the seed surface and that of the adjacent surface. This angle is applicable only when the two surface types are either planar or cylindrical (ridge, valley); for pairs of other surface types, the angle value is set to nil.

Figure 1: An example of an LSG. (a) Model object M1. (b) An example LSG with seed surface C1:
   < surface-type: ridge >
   < simultaneously-visible-adjacent-surfaces:
       [ (P1, planar, area, radius (nil), 0°),
         (P2, planar, area, radius (nil), 90°),
         (P3, planar, area, radius (nil), 45°) ] >

Note that in model object M1 in Figure 1, the surfaces P1, P2, P3, and P4 are the topological neighbors of the surface C1; however, only the surfaces P1, P2, P3, and C1 are simultaneously visible for the given viewpoint. In addition, jump edges between the seed surface patch and any adjacent surface patch are not allowed. Note that the number of LSGs for a viewpoint is at most the number of visible surface patches for this viewpoint.

Indexing involves a tradeoff between the complexity of the indexing feature and the efficiency of indexing using this feature. If the indexing feature is complex (for example, a whole object as an indexing feature), indexing will not be efficient but will result in only a few candidate models. On the other hand, if a simple feature is used for indexing (for example, a single surface patch as the indexing feature), indexing will be easy while many model objects are indexed for a feature detected in the scene. In this example, we will use a subset of the LSG as an indexing feature, which we will call an "Indexing-LSG". The following is an example of an Indexing-LSG:

   { surface-type-of-seed-surface: ridge
     < surface-types-of-simultaneously-visible-adjacent-surfaces: (planar, planar, planar) >
     < sum-of-angles: 135 > }

In an Indexing-LSG, only the surface types of the simultaneously visible adjacent surface patches and the sum of the surface orientation differences between the seed surface patch and the simultaneously visible adjacent surface patches are encoded, without distinguishing the respective adjacent surface patches. This way, indexing is very easy and efficient. Different instances of this Indexing-LSG are the mk's. In the < surface-types-of-simultaneously-visible-adjacent-surfaces: ... > attribute, we order the surface types in a predetermined order for computational efficiency.
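To make the two representations concrete, here is a minimal sketch of an LSG record and of its projection onto the simplified Indexing-LSG. The class and function names are illustrative; the area values other than P1's 0.88 (quoted later in Section 4.3) are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class AdjacentSurface:
    label: str                    # e.g. "P1"
    surface_type: str             # one of the eight K/H primitive types
    area: float
    radius: Optional[float]       # nil for planar patches
    angle_deg: Optional[float]    # orientation difference to the seed surface

@dataclass(frozen=True)
class LSG:
    """Local surface group: a seed patch plus its simultaneously visible
    adjacent patches (no jump edges to the seed)."""
    model: str
    seed_label: str
    seed_type: str
    adjacent: Tuple[AdjacentSurface, ...]

def indexing_lsg(lsg: LSG):
    """Project an LSG onto the simplified Indexing-LSG used for indexing:
    seed surface type, adjacent surface types in a fixed order, and the
    sum of the orientation-difference angles."""
    types = tuple(sorted(a.surface_type for a in lsg.adjacent))
    angle_sum = sum(a.angle_deg for a in lsg.adjacent if a.angle_deg is not None)
    return (lsg.seed_type, types, angle_sum)

# The example LSG of Figure 1: ridge seed C1 with planar neighbours at 0, 90, 45 deg.
example = LSG("M1", "C1", "ridge",
              (AdjacentSurface("P1", "planar", 0.88, None, 0.0),
               AdjacentSurface("P2", "planar", 1.0, None, 90.0),
               AdjacentSurface("P3", "planar", 1.0, None, 45.0)))
print(indexing_lsg(example))   # ('ridge', ('planar', 'planar', 'planar'), 135.0)
```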

4.2 Construction of indexing table

For each viewpoint on the viewing sphere, a range image of the object is rendered along with the surface label of each surface patch. This range image and the surface labels are used by feature extraction routines to estimate the viewpoint dependent features of interest (in our example, LSGs). An example of the process of compiling LSGs is shown in Figures 3 and 4 for two simple objects, M1 and M2, using a coarse viewpoint sampling (eight viewpoints). For simplicity, we assume Dmk = 1 for k = 1, ..., fm. Figure 2 shows the eight viewpoints used to compile the LSGs of the two objects M1 and M2; the viewing directions are the vectors normal to the surfaces of an octahedron. For viewpoints V3 and V4 in Figure 4, the planar surface patch P1 is not adjacent to the seed surface patch C1 according to the definition of the LSG because the edge between them is entirely a jump discontinuity. Note that, for the viewpoint V2 in Figure 3, the planar surface patch P3 is an adjacent surface patch of the seed surface patch C1 because the shared edge between them is not entirely a jump discontinuity. Similar cases are found for the viewpoints V5 and V8 in Figure 3 and for the viewpoints V5 and V6 in Figure 4.

Figure 2: Eight viewpoints by tessellation of the viewing sphere using an octahedron. Faces in the viewing directions V1, V2, V5, and V6 are visible in this figure.

The Feature Indexing Tree is shown in Figure 5 for the two objects M1 and M2 using the Indexing-LSG described in the previous section. Fourteen different Indexing-LSGs (mk, k = 1, ..., 14) are compiled, and the LSGs that give rise to each Indexing-LSG (i.e., each mk) are attached to the corresponding leaf of the Feature Indexing Tree. The LSGs attached to each Indexing-LSG (mk for a fixed k) are then ordered by the discriminatory power P̂(Mi | mk), i = 1, 2, as shown in Table 2. The following shows the computation of the likelihoods P̂(mk | Mi) and the feature probabilities P̂(mk) used to obtain the posterior probabilities P̂(Mi | mk), i = 1, 2 and k = 1, ..., 14, shown in Table 2. With 23 LSG occurrences compiled for M1 and 32 for M2 over the eight viewpoints, the likelihoods are:

   P̂(m1 | M1) = 0        P̂(m1 | M2) = 2/32
   P̂(m2 | M1) = 6/23     P̂(m2 | M2) = 0
   P̂(m3 | M1) = 1/23     P̂(m3 | M2) = 0
   P̂(m4 | M1) = 2/23     P̂(m4 | M2) = 0
   P̂(m5 | M1) = 2/23     P̂(m5 | M2) = 0
   P̂(m6 | M1) = 0        P̂(m6 | M2) = 8/32
   P̂(m7 | M1) = 4/23     P̂(m7 | M2) = 0
   P̂(m8 | M1) = 0        P̂(m8 | M2) = 2/32
   P̂(m9 | M1) = 0        P̂(m9 | M2) = 7/32
   P̂(m10 | M1) = 0       P̂(m10 | M2) = 5/32
   P̂(m11 | M1) = 3/23    P̂(m11 | M2) = 2/32
   P̂(m12 | M1) = 1/23    P̂(m12 | M2) = 0
   P̂(m13 | M1) = 2/23    P̂(m13 | M2) = 6/32
   P̂(m14 | M1) = 2/23    P̂(m14 | M2) = 0
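The off-line compilation just described (render each viewpoint, extract the LSGs, attach each LSG to the leaf of its Indexing-LSG, and order the leaf's candidates by discriminatory power) can be sketched as follows. The tuple format for a compiled LSG and the function name are assumptions made for this illustration.

```python
from collections import defaultdict

def build_feature_indexing_tree(compiled_lsgs, posterior):
    """compiled_lsgs: iterable of (model, viewpoint, indexing_key, lsg) tuples
    produced off-line by rendering each model from the sampled viewpoints and
    extracting its LSGs (cf. Figures 3 and 4).  indexing_key is the Indexing-LSG
    (seed type, adjacent types, angle sum); lsg is any hashable summary of the
    full LSG, e.g. ("M1", "C1", ("P1", "P2", "P3")), with the model name first.
    posterior[(model, indexing_key)] = P_hat(M_i | m_k).
    Returns {indexing_key: [candidate LSGs ordered by discriminatory power]}."""
    tree = defaultdict(list)
    for model, viewpoint, key, lsg in compiled_lsgs:
        if lsg not in tree[key]:          # keep each distinct LSG once per leaf
            tree[key].append(lsg)
    # Order the candidates attached to each leaf by P_hat(M_i | m_k), so that
    # hypothesis generation can try the most discriminative bindings first.
    for key, candidates in tree.items():
        candidates.sort(key=lambda g: posterior.get((g[0], key), 0.0), reverse=True)
    return dict(tree)
```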

Figure 3: Compilation of LSGs for model object M1 over the eight viewpoints (seed surface, simultaneously visible adjacent surfaces, and sum of angles); other attributes of the LSG are not shown here.

Figure 4: Compilation of LSGs for model object M2 over the eight viewpoints; other attributes of the LSG are not shown here.

Figure 5: Feature Indexing Tree for the model objects M1 and M2. P and C denote planar and ridge surface patches, respectively. Each leaf corresponds to one Indexing-LSG m1, ..., m14, reached through the seed surface type, the number of simultaneously visible adjacent surface patches, their surface types, and the sum of angles.

With equal priors P̂(M1) = P̂(M2) = 1/2, the feature probabilities are:

   P̂(m1) = 1/2 (0 + 2/32) = 0.0313       P̂(m2) = 1/2 (6/23 + 0) = 0.1304
   P̂(m3) = 1/2 (1/23 + 0) = 0.0217       P̂(m4) = 1/2 (2/23 + 0) = 0.0435
   P̂(m5) = 1/2 (2/23 + 0) = 0.0435       P̂(m6) = 1/2 (0 + 8/32) = 0.1250
   P̂(m7) = 1/2 (4/23 + 0) = 0.0870       P̂(m8) = 1/2 (0 + 2/32) = 0.0313
   P̂(m9) = 1/2 (0 + 7/32) = 0.1094       P̂(m10) = 1/2 (0 + 5/32) = 0.0781
   P̂(m11) = 1/2 (3/23 + 2/32) = 0.0965   P̂(m12) = 1/2 (1/23 + 0) = 0.0217
   P̂(m13) = 1/2 (2/23 + 6/32) = 0.1372   P̂(m14) = 1/2 (2/23 + 0) = 0.0435

Table 2: For each Indexing-LSG, the candidate LSGs for object hypotheses are ordered by discriminatory power. The seed surface type, the adjacent surface type(s), and the sum of the angles are separated by "-" in the representation of an Indexing-LSG.

   mk    Indexing-LSG   P̂(M1|mk)  P̂(M2|mk)  candidate LSGs
   m1    P-P-90         0.0       1.0       [M2,P1,(P4)], [M2,P1,(P2)]
   m2    P-C-0          1.0       0.0       [M1,P1,(C1)], [M1,P4,(C1)]
   m3    P-C-45         1.0       0.0       [M1,P3,(C1)]
   m4    P-C-90         1.0       0.0       [M1,P2,(C1)]
   m5    P-PP-135       1.0       0.0       [M1,P2,(P1,P3)]
   m6    P-PP-180       0.0       1.0       [M2,P3,(P1,P2)], [M2,P3,(P1,P4)], [M2,P3,(P2,P6)], [M2,P6,(P2,P3)], [M2,P3,(P4,P6)], [M2,P6,(P3,P4)], [M2,P6,(P4,P5)], [M2,P6,(P2,P5)]
   m7    P-PC-90        1.0       0.0       [M1,P1,(P2,C1)], [M1,P3,(P2,C1)]
   m8    P-PC-180       0.0       1.0       [M2,P5,(P4,C1)], [M2,P5,(P2,C1)]
   m9    P-PPC-180      0.0       1.0       [M2,P4,(P1,P3,C1)], [M2,P2,(P1,P5,C1)]
   m10   P-PPC-270      0.0       1.0       [M2,P1,(P2,P3,C1)], [M2,P1,(P3,P4,C1)]
   m11   C-P-0          0.676     0.324     [M1,C1,(P4)], [M1,C1,(P1)], [M2,C1,(P2)], [M2,C1,(P4)]
   m12   C-PP-45        1.0       0.0       [M1,C1,(P1,P3)]
   m13   C-PP-90        0.317     0.683     [M2,C1,(P1,P2)], [M2,C1,(P1,P4)], [M2,C1,(P4,P5)], [M2,C1,(P2,P5)], [M1,C1,(P2,P4)]
   m14   C-PPP-135      1.0       0.0       [M1,C1,(P1,P2,P3)]

Before we describe the matching strategy, let us make several comments about the information displayed in Table 2. There are two cases for the Indexing-LSGs. First, there are Indexing-LSGs whose indexing power is 1.0, that is, max_{i=1,2} P̂(Mi | mk) = 1.0 for mk with k = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14. Second, the indexing power is not 1.0 for m11 and m13. If the indexing power of a feature is 1.0, the feature is unique to one of the model objects. In other words, if a feature detected in the scene corresponds to a feature mk in the Feature Indexing Tree such that P̂(Mj | mk) = max_i P̂(Mi | mk) = 1.0, it is certain that the model Mj is in the scene. However, even when the indexing power is 1.0, more than one LSG may be listed for an mk, depending on how simplified an Indexing-LSG is used for the indexing of model objects. Only one LSG is attached to the Indexing-LSGs m3, m4, m5, m12, and m14, while more than one LSG is listed for the Indexing-LSGs m1, m2, m6, m7, m8, m9, and m10. Recall that we adopted a simplified version of an LSG as the Indexing-LSG in Section 4.1 for the purpose of easy and efficient feature indexing. On the other hand, if the indexing power of a feature is not 1.0, several objects containing the feature may be in the scene; m11 and m13 are such cases in Table 2. In our example, for m11, the LSGs that belong to model object M1 are listed first and those that belong to model object M2 are listed next because P̂(M1 | m11) = 0.676 > P̂(M2 | m11) = 0.324. This reflects the belief that model object M1 is more plausible to appear in the scene given m11. Similarly, the LSGs for model object M2 are put first for feature m13. The appearance of more than one LSG per model, due to the use of a less constrained definition of the Indexing-LSG, can be resolved by employing a series of filters using viewpoint dependent features. An example illustrating this is given in the following section.
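The two non-unique rows of Table 2 can be checked directly from the compiled occurrence counts; the short sketch below assumes, as in the example, Dmk = 1 and equal priors for M1 and M2.

```python
# Reproduce the Table 2 posteriors for m11 (C-P-0) and m13 (C-PP-90) from the
# occurrence counts over the eight viewpoints, with D_mk = 1 and equal priors.
def posterior(n1, n2, total1=23, total2=32):
    l1, l2 = n1 / total1, n2 / total2        # eq. (4) with D_mk = 1
    evidence = 0.5 * l1 + 0.5 * l2           # denominator of eq. (5)
    return 0.5 * l1 / evidence, 0.5 * l2 / evidence

print(posterior(3, 2))   # m11: (0.676..., 0.323...)
print(posterior(2, 6))   # m13: (0.316..., 0.683...)
```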

4.3 Hypothesis generation

Suppose a scene contains model object M1 in a particular viewpoint, as shown in Figure 6. The two LSGs detected from the scene are shown in the same figure. Given these two LSGs, the following procedure generates the object hypotheses.

Figure 6: A scene containing model object M1 and the two LSGs detected from the scene. LSG#1 has seed surface S1 (planar) with simultaneously visible adjacent surface S2; LSG#2 has seed surface S2 (ridge) with simultaneously visible adjacent surface S1.

1. Find the Indexing-LSGs in the Feature Indexing Tree (Figure 5) that match LSG#1 and LSG#2 detected in the scene, and get the candidate LSGs for object hypotheses (Table 2). In practice, due to uncertainty in the sensed data, we find Indexing-LSGs in the Feature Indexing Tree within an angle tolerance ε, where ε depends on the magnitude of the uncertainty. For example, for LSG#1 detected in the scene in Figure 6 (with a measured angle of 4°), we pull out all candidate LSGs for object hypotheses listed under the Indexing-LSGs P-C-[4-ε, 4+ε], and similarly those listed under C-P-[4-ε, 4+ε] for LSG#2. Say ε = 10°. Then the Indexing-LSGs in the Feature Indexing Tree that match LSG#1 and LSG#2 are P-C-0 (m2) and C-P-0 (m11), respectively. Thus the candidate LSGs of object hypotheses for LSG#1 are [M1,P1,(C1)] and [M1,P4,(C1)], and those for LSG#2 are [M1,C1,(P4)], [M1,C1,(P1)], [M2,C1,(P2)], and [M2,C1,(P4)].

2. Apply a series of filters to the candidate LSGs for object hypotheses using other, as yet unused, feature attributes (either viewpoint dependent or viewpoint independent) before ordering them according to the discriminatory power P̂(Mi | mk). For example, area(P1) = 0.88 and area(P4) = 1.77 in the model object M1. Thus, [M1,P1,(C1)] for LSG#1 is filtered out because area(S1) is 1.5 and the visible area of the surface P1 of model object M1 cannot be greater than 0.88. Other kinds of filters, such as a radius filter, are used as well. Radius(C1) is 0.75 in model object M1 and radius(C1) = 0.5 in model object M2. The candidate LSGs listed for m11 are filtered using the radius and area attributes, and only [M1,C1,(P4)] passes the filtering. The remaining LSGs that have passed the filtering are ordered according to the discriminatory power P̂(Mi | mk), and object hypotheses are generated and verified in this order. In our example, two LSGs remain after filtering: [M1,P4,(C1)] from m2 and [M1,C1,(P4)] from m11. Model object M1 is hypothesized and verified first for [M1,P4,(C1)], by matching S1 to P4 and S2 to C1, and then for [M1,C1,(P4)], because P̂(M1 | m2) > P̂(M1 | m11). Since the scene object is identified by the hypothesis for the first LSG [M1,P4,(C1)], the hypothesis for the second LSG [M1,C1,(P4)] will not be generated, unless the first hypothesis fails verification.
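The two-step procedure above can be summarized in a short sketch. The callables `keep` (the area/radius filters of step 2) and `verify` (hypothesis verification, Section 4.4) are stand-ins for domain-specific routines; the function name and data layout are assumptions for this illustration.

```python
def generate_and_verify(scene_features, tree, posterior, keep, verify, eps=10.0):
    """scene_features: list of (indexing_key, scene_data) pairs detected in the
    scene, where indexing_key = (seed type, adjacent types, angle sum).
    tree: Feature Indexing Tree as built above, mapping indexing keys to
    candidate model LSGs (tuples whose first element is the model name).
    posterior[(model, key)] gives P_hat(M_i | m_k).
    Returns the first model whose hypothesis is verified, or None."""
    hypotheses = []
    for (seed_t, adj_t, angle), scene_data in scene_features:
        # Step 1: pull out candidates whose angle sum lies within +/- eps.
        for (k_seed, k_adj, k_angle), candidates in tree.items():
            if k_seed != seed_t or k_adj != adj_t or abs(k_angle - angle) > eps:
                continue
            # Step 2: filter with as-yet-unused attributes (area, radius, ...).
            for cand in candidates:
                if keep(scene_data, cand):
                    score = posterior.get((cand[0], (k_seed, k_adj, k_angle)), 0.0)
                    hypotheses.append((score, cand, scene_data))
    # Generate and verify hypotheses in decreasing order of discriminatory
    # power; a later hypothesis is tried only if the earlier ones fail.
    for score, cand, scene_data in sorted(hypotheses, key=lambda h: h[0], reverse=True):
        if verify(cand, scene_data):
            return cand[0]
    return None
```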

4.4 Hypothesis verification

A problem we face is how to set the recognition criterion. Chen and Kak [4] used a recognition criterion that requires at least 33% of a candidate model's features to be found. They employed only surface features for the verification of object hypotheses; in other words, 33% of the surfaces have to be found in a scene for a hypothesis to be considered valid. However, the recognition criterion should include not only the percentage of a candidate model's features that are found but also the degree of uniqueness (the discriminatory power) of the features that generated the object hypothesis. Suppose that only one object (Mi) has a pit surface patch. Then, under the closed world assumption, it is certain that model object Mi is present in the scene given that a pit surface patch is detected. Even if other surfaces of the object are not found, the object hypothesis should be considered correct. Therefore, we have to take into account the discriminatory power of the feature, P̂(Mi | mk), as well as the percentage of the candidate model's features found, in deciding whether a hypothesis is valid. That is, an object hypothesis with P̂(Mi | mk) ≥ t (a value close to 1.0) should be considered valid even if other features of the candidate model are not found in the scene; if P̂(Mi | mk) < t, at least a certain percentage α of the candidate model's features has to be found. We are investigating how to set these threshold values in order to produce a good recognition criterion.

In our LSG example, structural verification (feature level verification) can easily be applied if a model object is represented by an attributed relational graph. Structural verification verifies the spatial relations among the features that should be present in the scene by matching additional features to increase confidence in the object hypothesis. This approach is convenient because the features have already been detected as a part of the matching process. Once an object hypothesis is generated, we have already found a subgraph of the attributed relational graph of the hypothesized object. We can increase the evidence for the hypothesis by finding the largest matching subgraph in the attributed relational graph of the hypothesized object.
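A minimal sketch of the combined criterion described above; the threshold values t and α are placeholders, since the paper leaves their choice open.

```python
def hypothesis_valid(posterior_mk, features_found, features_total,
                     t=0.95, alpha=0.33):
    """Recognition criterion sketch: accept the hypothesis outright if the
    feature that generated it is (nearly) unique to the model, i.e.
    P_hat(M_i | m_k) >= t; otherwise require at least a fraction `alpha`
    of the candidate model's features to be found (cf. Chen and Kak's 33%).
    Both thresholds are illustrative placeholders."""
    if posterior_mk >= t:
        return True
    return features_found / features_total >= alpha

print(hypothesis_valid(1.0, 1, 10))    # unique feature: accepted immediately
print(hypothesis_valid(0.676, 2, 10))  # non-unique feature, too few features found
```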

4.5 Recognition of objects in the presence of occlusion

The methods presented so far can be used directly for the recognition of single isolated objects. However, our interest also lies in recognizing objects under occluded conditions. When the range images to be interpreted are scenes of occluded objects, a feature computed from the scene (in our example, an LSG) may not have a corresponding indexing feature (in our example, an Indexing-LSG) in the Feature Indexing Tree. In the LSG example, this may happen in the case of complete occlusion of one of the surface patches. In order to overcome this problem, we consider the following two methods. One immediate method is to employ local features for object hypothesis generation and simply ignore the features from the scene that do not have corresponding indexing features in the Feature Indexing Tree. In the LSG example, by considering only scene LSGs that can be indexed to Indexing-LSGs in the Feature Indexing Tree, any object that is not completely occluded by another object can be correctly hypothesized so long as at least one LSG is completely visible; however, indexing performance may degrade. Another method is to look for all indexing features that contain, as a subset, the indexing feature computed from the scene. In our example, suppose that we have detected a scene LSG whose Indexing-LSG is C-P-60. Then we look for all Indexing-LSGs containing C-P-60; as a result, we get C-PPP-135.
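The second strategy can be sketched as a containment test on Indexing-LSG keys: the observed adjacent-surface types must be a subset of a stored key's types, and the stored angle sum must be at least the observed sum, since occlusion can only hide adjacent patches. The function name and key layout follow the earlier sketches and are assumptions.

```python
from collections import Counter

def superset_matches(tree_keys, observed_key):
    """Find every Indexing-LSG key in the Feature Indexing Tree that contains
    the partially observed Indexing-LSG as a subset."""
    obs_seed, obs_adj, obs_angle = observed_key
    obs_counts = Counter(obs_adj)
    matches = []
    for seed, adj, angle in tree_keys:
        adj_counts = Counter(adj)
        contains = all(adj_counts[t] >= n for t, n in obs_counts.items())
        if seed == obs_seed and contains and angle >= obs_angle:
            matches.append((seed, adj, angle))
    return matches

# The paper's example: an occluded scene LSG indexes to C-P-60; looking for all
# Indexing-LSGs that contain it returns C-PPP-135 (and correctly not C-P-0).
keys = [("ridge", ("planar",), 0.0),
        ("ridge", ("planar", "planar", "planar"), 135.0)]
print(superset_matches(keys, ("ridge", ("planar",), 60.0)))
```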

5 Conclusion

We have presented a decision-theoretic approach using a Bayesian framework for efficient indexing of model objects that handles the complexity posed by a large model database. The proposed approach is being implemented, along with the investigation of several theoretical issues. First, what distribution of the posterior probabilities P(Mi | mk) is desirable (optimal) for the rapid indexing of the correct objects? Second, what should be considered in determining the features to use in order to achieve the desirable distribution of the posterior probabilities? Finally, how can we incorporate the entire cost (time) of recognizing an object into the proposed decision-theoretic framework?

References

[1] P. J. Besl and R. C. Jain, "Segmentation Through Variable-Order Surface Fitting," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, no. 2, pp. 167-192, March 1988.

[2] R. C. Bolles and P. Horaud, "3DPO: A Three-Dimensional Part Orientation System," International Journal of Robotics and Automation, vol. 5, no. 3, pp. 3-26, 1986.

[3] J. B. Burns and L. J. Kitchen, "Rapid Object Recognition From a Large Model Base Using Prediction Hierarchies," Proceedings of the DARPA Image Understanding Workshop, pp. 711-719, 1988.

[4] C. H. Chen and A. C. Kak, "A Robot Vision System for Recognizing 3-D Objects in Low-Order Polynomial Time," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, no. 6, pp. 1535-1563, November/December 1989.

[5] P. J. Flynn, "Saliencies and Symmetries: Toward 3D Object Recognition from Large Model Databases," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 322-327, 1992.

[6] P. J. Flynn and A. K. Jain, "BONSAI: 3-D Object Recognition Using Constrained Search," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, no. 10, pp. 1066-1075, October 1991.

[7] K. Ikeuchi, "Precompiling a geometrical model into an interpretation tree for object recognition in bin-picking tasks," Proceedings of the DARPA Image Understanding Workshop, pp. 321-339, 1987.

[8] M. Swain, "Object Recognition from a Large Database Using a Decision Tree," Proceedings of the DARPA Image Understanding Workshop, pp. 690-696, 1988.
