Local phase quantization descriptor for improving shape retrieval/classification


Pattern Recognition Letters 33 (2012) 2254–2260


Loris Nanni (a,*), Sheryl Brahnam (b), Alessandra Lumini (c)

(a) Department of Information Engineering, Via Gradenigo 6/A, 35131 Padova, Italy
(b) Computer Information Systems, Missouri State University, 901 S. National, Springfield, MO 65804, USA
(c) DEIS, Università di Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Article info

Article history: Received 9 February 2012; Available online 23 July 2012; Communicated by S. Sarkar

Keywords: Shape classification; Local phase quantization; Inner distance shape context; Shape context; Height functions; Texture descriptors

Abstract

Shape classification is a field of study with applications ranging from object classification to leaf recognition. In this paper we present an approach based on a matrix descriptor, the local phase quantization, for improving the performance of such widely used shape descriptors as the inner distance shape context (ID), shape context (SC), and height functions (HF). The basic idea of our novel approach is to transform the shape descriptor obtained by ID/SC/HF into a matrix so that matrix descriptors can be extracted. These matrix descriptors are then compared with the Jeffrey distance and combined with the standard ID/SC/HF shape similarity. Since it has recently been shown that ID/SC/HF shape similarities can be coupled with learning context-sensitive shape similarity using graph transduction (LGT) for improving the results, we have also coupled our approach with LGT. Our proposed approach is tested on a wide variety of shape databases including MPEG7 CE-Shape-1, Kimia silhouettes, the Tari dataset, a leaf dataset, a tools dataset, a myths figures dataset, and our new human dancer dataset. The experimental results demonstrate that the proposed approach yields significant improvements over baseline shape matching algorithms. All Matlab code used in the proposed paper is available at bias.csr.unibo.it/nanni/Shape.rar.

© 2012 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (L. Nanni), [email protected] (S. Brahnam).
doi: 10.1016/j.patrec.2012.07.007

1. Introduction

The proliferation of cameras and image capturing devices has resulted in the generation of many large databases of digital images and the simultaneous need to develop machine systems for searching and retrieving relevant images from these databases. Human beings can quickly and very reliably categorize objects even in cluttered environments; however, attempting to manually label images in these databases is cost prohibitive and impractical. Machine classification is desirable, but developing machine retrieval and matching systems is complex: images of objects in any given class can vary widely depending on lighting, scale, rotation, and other transformations. Some critical issues that must be addressed in building robust matching systems include strategies for selecting appropriate attributes, image representation, query specifications, match metrics, and indexing methods. Moreover, different properties can be used to retrieve objects of the same class: texture, brightness, color, and shape.

Shape matching, in particular, has been a long-standing area of investigation. Numerous shape matching strategies, for example, Fourier analysis, moments analysis, and distance measures, have been proposed (for a survey of methods, see Veltkamp and Hagedoorn, 2001). Since shape is commonly defined mathematically as an equivalence class under a group of transformations, most of these methods have focused on measuring shape similarity. In (Belongie et al., 2002), for instance, a 2D histogram representation, called shape context (SC), is presented. In (Latecki and Lakamper, 2000) parts of objects are represented using simplified polygons. One of the best performing methods based on similarity measures is the inner distance shape context method (ID) proposed by Ling and Jacobs (2007), where the part structures of shapes are defined using the inner-distance. In (Wang et al., 2012) a novel shape descriptor called the height function (HF) is proposed. In HF, the contour of each object is represented by a fixed number of sample points. For each sample point, HF is defined based on the distances of the other sample points to its tangent line.

A problem with approaches based on similarity measures is that differences between shapes vary in significance, and the relevance of these differences is not considered. This is the case, for instance, in (Sebastian et al., 2004a; Siddiqi et al., 1999; Shokoufandeh et al., 2005; Gorelick et al., 2006). Shapes belonging to the same class can exhibit large differences because of nonrigid transformations and distortions. What these methods fail to take into account are the intrinsic properties of shapes (Bai et al., 2010). Statistical definitions of shape, such as those in (Bookstein, 1991; Kendall, 1984), address shape differences but assume correspondences are known. Recently, methods have been developed that consider the
relevance of shape differences by capturing the essence of a shape class, as in (Bai et al., 2010) (and briefly discussed in Section 2), where shape similarity is learned within the context of a set of shapes using graph transduction. The best state-of-the-art performances have been obtained by methods that use the similarity learning approach based on shape similarities (see, for instance, Ling et al., 2010; Temlyakov et al., 2010; Bai et al., 2012; Yang et al., 2012; Yang, 2009). In Bai et al. (2012) a novel way of fusing similarity/distance measures through a semi-supervised learning framework, called co-transduction, is proposed. The authors of Bai et al. (2012) obtained outstanding results on the MPEG-7 data set: 97.72% (bull's-eye measure) using two shape similarities, and 99.06% combining three shape similarities. In (Yang et al., 2012) it is shown that a graph diffusion process on a tensor product graph (TPG) is equivalent to a novel iterative algorithm on the original graph, which is guaranteed to converge. After its convergence, new edge weights are obtained that can be interpreted as new, learnt affinities. In this way the learned metric achieves 100% accuracy on MPEG7. Shape retrieval methods have greatly improved over the last few years. Ten years ago the best result on the MPEG7 database was under 80%. Today there are several pairwise matching algorithms with performance around 90%, and several context-based algorithms have recorded performances between 97% and 99.9% (Bai et al., 2012; Yang et al., 2012). To the best of our knowledge, very few pairwise matching algorithms obtain performance higher than 90% (the highest result we found is reported in (Gopalan et al., 2010) at 93.67%). These results, however, are still lower than those obtained by human beings. In this paper we improve the performance of ID/SC/HF by adding information that can be extracted from the descriptor obtained by ID/SC/HF for representing a given shape.
In our approach, the shape descriptor obtained by ID/SC/HF is first transformed into a matrix, then the local phase quantization descriptor (LPQ) is used to extract information from that matrix. The Jeffrey distance is then used to compare different LPQ descriptors. We combine this information with the standard shape matching approaches for refining their results. By using descriptors based on local neighborhoods, as in LPQ, we are able to investigate the correlation among the elements of the matrix (e.g., the orientation bins and the log distance bins in the ID/SC descriptor) that belong to a given neighborhood. We are able to improve performance further by coupling our approach with learning context-sensitive shape similarity using the graph transduction technique mentioned above and developed by Bai et al. (2010) and Yang et al. (2012); we label this method LGT in this work. In (Bai et al., 2010; Yang et al., 2012) it is shown that LGT can be coupled with ID for improving performance. In our experiments we confirm that LGT + ID/SC/HF outperforms ID/SC/HF, and that our approach, coupled with LGT + ID/SC/HF, outperforms LGT + ID/SC/HF. Finally, we study the fusion of the different shape matchers to obtain a very high-performance system. As shown in (Bai et al., 2012), even though SC and ID achieve comparable retrieval rates, their fusion outperforms the stand-alone approaches due to their different shape representations. Our approach is validated using six well-known datasets (MPEG7 CE-Shape-1, Tari, Kimia, Tools, Myths, and a Leaf Database) and a new human dancer dataset. All experiments reported in this paper can be replicated on these datasets using the MATLAB code of our proposed system at bias.csr.unibo.it/nanni/Shape.rar. The remainder of this paper is organized as follows. In Section 2, we outline our approach and provide mathematical descriptions as well as pseudocode for implementing LGT, ID, SC, HF, and LPQ.
In Section 3, we describe the databases used in our experiments as well as the experimental results. We conclude in Section 4 by summarizing the significance of our work and by highlighting future directions.


2. Description of our approach

The most important result of this work is to show that it is possible to extract a set of matrix descriptors from the matrix built by ID/SC/HF that improves the performance of ID/SC/HF. Another result is that our method can be coupled with LGT to obtain better performance than when LGT is coupled with the standard shape matchers. Below is a simple pseudocode description of our proposed algorithm:

(1) EXTRACT CONTOUR: First, all the images of a given dataset are cropped and resized to the same dimension. Then, to handle mirrored and rotated shapes, we extract the contour not only from the original image (OR) but also from the flipped left/right1 image (LR). Each image (original and flipped) is then rotated between 22.5° and 337.5° (step 22.5°) and another contour is extracted. In this way 32 different contours are extracted.

(2) CALCULATE MATRIX: For each of the contours extracted in STEP 1 "EXTRACT CONTOUR", we calculate a matrix starting from the shape descriptors of ID/SC/HF.

(3) DESCRIBE EACH MATRIX WITH LPQ: Each matrix extracted in step 2 is described by LPQ. The matrix is divided into K1 × K2 non-overlapping sub-windows, and the LPQ histogram is calculated separately on each of the K1 × K2 sub-windows. We have used the same settings in all tests: K1 = K2 = 4 for ID and SC, and K1 = K2 = 2 for HF (due to the lower dimension of the matrix built by HF). Note: we tested several approaches (e.g., Gabor filters, local binary patterns, histogram of gradients) but report only the LPQ method, which gave the best performance.

(4) MATCH: Given two shapes, i and j, we compare them (a different descriptor is extracted from each of the images created in STEP 1) using the standard ID/SC/HF shape matching approaches. The lowest distance is used as the match value between the two shapes.
The LPQ descriptors extracted from the contours are compared using the Jeffrey distance, and the lowest distance is used as the match value between the two shapes. Notice that when we compare two shapes, i and j, to reduce the computation time we consider only the original contour for the first shape, while for the second shape all the contours extracted from the images created in STEP 1 "EXTRACT CONTOUR" are considered.

(5) NORMALIZE & FUSE: We combine the distances obtained using LPQ and the standard shape matchers with the weighted sum rule. Before the fusion, the distances obtained by each method are normalized to mean 0 and standard deviation 1. We use LPQ with two different radii (see Section 2.4); let us define these two LPQ descriptors as LPQ5 and LPQ3 and a given shape matcher as SM; the weighted sum rule is then given by 6 × SM + LPQ5 + LPQ3.

(6) UPDATE DISTANCES (optional): Update the scores using LGT (Bai et al., 2010; Yang et al., 2012).

In Fig. 1 we show an example original image, its flipped image, and the flipped image rotated by 180°.
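Steps 4 and 5 above (comparing LPQ histograms with the Jeffrey distance, z-normalizing each distance set, and fusing with the weighted sum rule 6 × SM + LPQ5 + LPQ3) can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB code; in particular, the exact Jeffrey-divergence variant (symmetrized KL divergence against the average histogram) is our assumption.

```python
import numpy as np

def jeffrey_distance(h1, h2, eps=1e-12):
    """Jeffrey divergence between two histograms: symmetrized KL divergence
    of each (normalized) histogram against their average."""
    h1 = h1 / (h1.sum() + eps)
    h2 = h2 / (h2.sum() + eps)
    m = (h1 + h2) / 2.0
    return float(np.sum(h1 * np.log((h1 + eps) / (m + eps))
                        + h2 * np.log((h2 + eps) / (m + eps))))

def znorm(d):
    """Step 5 normalization: mean 0, standard deviation 1."""
    d = np.asarray(d, dtype=float)
    return (d - d.mean()) / (d.std() + 1e-12)

def fuse(sm, lpq5, lpq3):
    """Weighted sum rule of step 5: 6 * SM + LPQ5 + LPQ3, after z-normalization."""
    return 6.0 * znorm(sm) + znorm(lpq5) + znorm(lpq3)
```

Each argument of `fuse` is a set of distances from one query shape to the gallery; lower fused values indicate better matches.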

1 Function fliplr.m of Matlab.


Fig. 1. Different images and the corresponding ID-based descriptor.

2.1. Inner distance shape context2 (ID)

The inner-distance is defined as the length of the shortest path between landmark points within a shape silhouette (Ling and Jacobs, 2007). Unlike the Euclidean distance, which does not consider whether a line segment crosses the shape boundary, the inner-distance is sensitive to the structure of parts within a shape while simultaneously insensitive to their articulation (see Ling and Jacobs, 2007 for mathematical proofs of these properties). The inner-distance captures the structure of parts by considering interior shape boundaries. Closely related to skeleton techniques, especially the shock graph (Siddiqi et al., 1999; Kimia et al., 1995; Sebastian et al., 2004b), the inner-distance is computed between two landmark points by first approximating the shortest path within the shape boundary, or skeleton, and then measuring its length. Unlike other skeleton techniques, once these lengths are measured, the inner-distance discards the structure. In this way, the inner-distance is more robust to boundary disturbances and more flexible for building shape descriptors. Given a connected and closed shape O in ℝ² and two boundary points x, y ∈ O, the inner-distance d(x, y; O) between x and y is defined as the length of the shortest path connecting x to y within O. If more than one such path exists, one is selected arbitrarily. Below is a simple pseudocode description for computing the inner-distance:

(1) BUILD A GRAPH: Take a pair of sample points, p1 and p2. (a) Treat each point as a node in a graph. (b) If the line segment between p1 and p2 falls within the boundary of the object, add the edge to the graph with a weight equal to the Euclidean distance.

2 http://www.ist.temple.edu/hbling/code_data.htm.

(2) APPLY A SHORTEST PATH ALGORITHM TO THE GRAPH (such as those in Cormen et al. (2001)).

In ID, as well as in SC described below, a histogram is extracted for each point of the contour (where the x axis denotes the orientation bins and the y axis the log distance bins). We turn this matrix into a vector, and the 2D histogram from which the LPQ descriptor is extracted is given by the concatenation over all the points of the contour (a different row for each point); for more details, see the available MATLAB code at bias.csr.unibo.it/nanni/Shape.rar.

2.2. Shape context (SC)

First introduced by Belongie et al. (2002), SC describes the relative spatial distance and orientation, or distribution, of landmark points around feature points. SC can be defined as follows. Given n sample points x1, x2, …, xn on a given shape, SC at point xi is a histogram hi of the relative coordinates of the remaining n − 1 points:

h_i(k) = \#\{ x_j : j \neq i,\; x_j - x_i \in \mathrm{bin}(k) \},    (1)
where the bins divide the log-polar space uniformly, and the distance between two SC histograms is defined using the chi-square statistic. For shape comparison, Belongie et al. use thin-plate splines (TPS) (see Belongie et al., 2002). In this paper, we use dynamic programming (DP) to replace TPS for the matching process, thereby achieving better performance (Bai et al., 2012).

Fig. 2. Sample where LPQ works well and HF works poorly.

2.3. Height functions (HF)

First introduced by Wang et al. (2012), the HF descriptor is insensitive to noise and occlusion and is invariant to such geometric transformations as translation, rotation, and scaling. HF is computed by representing each contour of each object by a fixed number of sample points. For each sample point, a height function is defined based on the distances of the other sample points to its tangent line. Let X = {x_i} (i = 1, 2, …, N) denote the sequence of equidistant sample points along the outer contour of a given shape, with the sample points following the contour in an ordered counterclockwise direction. For each sample point x_i, a reference axis is determined using the tangent line l_i, which inherits its orientation from the contour orientation. The distance between the jth (j = 1, 2, …, N) sample point x_j and the tangent line l_i is then defined as the height value h_{i,j}. The height value h_{i,j} is positive, negative, or zero depending on whether x_j lies to the left of, to the right of, or on the axis l_i. The sign thus provides a precise representation of the relative location of x_j with respect to the axis l_i, since both the distance and the side on which x_j lies are represented. Height values are calculated for every sample point. The shape descriptor of point x_i with respect to shape X is the ordered sequence of height values:

H_i = (h_i^1, h_i^2, \ldots, h_i^N)^T = (h_{i,i}, h_{i,i+1}, \ldots, h_{i,N}, h_{i,1}, \ldots, h_{i,i-1})^T,    (2)
where h_{i,j} (j = 1, …, N) denotes the height value of the jth sample point x_j with respect to the reference axis l_i of point x_i. Because this descriptor contains the relative location of every sample point to the reference axis, it may be overly sensitive to local boundary deformations. One solution to this problem is to use a smoothing function (see Wang et al., 2012 for details). In our method, LPQ descriptors are extracted from the height representation matrix (for more details, see the available MATLAB code at bias.csr.unibo.it/nanni/Shape.rar).

2.4. Local phase quantization3 (LPQ)

The local phase quantization (LPQ) operator, first proposed by Ojansivu and Heikkila (2008), is an image descriptor based on the blur-invariance property of the Fourier phase spectrum. In this work, following the good results obtained in (Nanni et al., 2012), we use LPQ as a matrix descriptor for each shape descriptor matrix. The LPQ method uses the local phase information extracted with the short-term Fourier transform (STFT) computed over a rectangular M × M neighborhood N_x at each pixel position x of an image f(x):

F(u, x) = \sum_{y \in N_x} f(x - y)\, e^{-j 2\pi u^T y} = w_u^T f_x,    (3)
where w_u is the basis vector of the 2-D DFT at frequency u, and f_x is a vector containing all image samples from N_x. Only four complex coefficients are considered in LPQ, corresponding to the 2-D frequencies u_1 = [a, 0]^T, u_2 = [0, a]^T, u_3 = [a, a]^T, and u_4 = [a, −a]^T, where a is the first frequency at which the phase of the DFT is shown to be invariant to centrally symmetric blur (Ojansivu and Heikkila, 2008).

3 Code available at http://www.cse.oulu.fi/CMV/Downloads/LPQMatlab.

The sign of the real and the imaginary part of each Fourier coefficient is then extracted, resulting in a binary coefficient. The eight binary codes obtained give the phase information and, represented as integer values between 0 and 255, produce an image Q. The final LPQ descriptor is obtained as the histogram of these integer values. In this paper, we use LPQ with radii 5 and 3 (a different distance is calculated for each radius, and the two are then summed). For local frequency estimation, we use the Gaussian derivative quadrature filter pair. Several distance measures for comparing the LPQ descriptors were explored, as implemented in http://www.mathworks.com/matlabcentral/fileexchange/15935-computing-pairwise-distances-and-metrics; the best result was obtained with the Jeffrey distance. The rationale of our idea is that by using descriptors based on local neighborhoods, as in LPQ, we are able to investigate the correlation among the elements of the matrix that belong to a given neighborhood (Ojansivu and Heikkila, 2008). For example, when we use the ID/SC descriptor we build the matrix (from which LPQ is extracted) from the orientation bins and the log distance bins of the different points of the contour. Since the LPQ descriptor is extracted from a given region of the matrix, it extracts a set of features describing that region and is therefore affected by the correlation among the elements of that region. Standard shape matchers, instead, are based on a distance that does not consider the correlation among different elements of the compared descriptors. In Fig. 2 we show samples where LPQ (extracted from the HF descriptor) permits good matching while the standard HF shape matcher works poorly.

2.5. Learning context-sensitive shape similarity by graph transduction (LGT)

Introduced in Bai et al. (2010), LGT learns a new distance function from a set of shapes.
Given a query shape q and a similarity function s0, a new similarity function s is learned through graph transduction. Graph transduction has the advantage of focusing on the manifold formed by the existing shapes; moreover, learning is unsupervised and can be built on top of any shape matching algorithm. Given a set of objects X = {x1, …, xn} and a similarity function sim: X × X → ℝ⁺ that assigns positive values to pairs of objects, and assuming that x1 ∈ X is the query shape, the remaining elements of X can be ranked by their similarity to the query. A new similarity function can be iteratively defined to reduce the number of errors in the ranking. This is accomplished by recursively solving an equation that is a simplified version of PageRank (Page et al., 1998), which forms the basis of Google search. The parameters of LGT used in our experiments are α = 0.27 and neighborhood size K = 10 (notice that we have used these parameters for all the datasets); for details on the parameters of LGT see (Bai et al., 2010). The affinities calculated with LGT are then used as input for the diffusion process on a Tensor Product Graph (TPG) proposed in Yang et al. (2012). The output of that approach is the set of similarities used for retrieval.
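To illustrate the flavor of this PageRank-like propagation, here is a deliberately simplified label-propagation sketch over the K-nearest-neighbor graph, using the parameter values quoted above (α = 0.27, K = 10). This is our own minimal variant for illustration only, not the exact LGT update of Bai et al. (2010) or the TPG diffusion of Yang et al. (2012); all names are ours.

```python
import numpy as np

def transduce(sim, query, alpha=0.27, k=10, iters=50):
    """Re-rank shapes with respect to a query by propagating similarity over
    the k-nearest-neighbour graph with a PageRank-like update."""
    n = sim.shape[0]
    # Row-stochastic transition matrix restricted to each node's k neighbours.
    P = np.zeros_like(sim, dtype=float)
    for i in range(n):
        nbrs = np.argsort(-sim[i])[:k]
        P[i, nbrs] = sim[i, nbrs]
    P /= P.sum(axis=1, keepdims=True)
    # f holds the learned similarity to the query; mass is re-injected at the
    # query node at every step, as in personalized PageRank.
    f = np.zeros(n)
    f[query] = 1.0
    for _ in range(iters):
        f = alpha * P.dot(f) + (1 - alpha) * (np.arange(n) == query)
    return f
```

After convergence, shapes in the same manifold region as the query receive higher values of `f` than equally "close" shapes from other regions, which is the context-sensitivity that LGT exploits.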


3. Experimental results

To evaluate our approach we use six well-known datasets: MPEG7 CE-Shape-1 (Latecki et al., 2000), Kimia silhouettes (Sebastian et al., 2004b), the Tari dataset (Aslan et al., 2008), the Tools dataset (Bronstein et al., 2008), the Myths figures dataset (Bronstein et al., 2008), and the Smithsonian Isolated Leaf Database (Anon). We also run experiments on our own human dancer dataset.

The MPEG7 CE-Shape-1 (Latecki et al., 2000) is a widely used database that contains 1400 silhouette images from 70 classes with 20 shapes each. The recognition rate is measured by the so-called Bullseye test: every image in the database is matched against all other images and the top 40 most similar candidates are counted. The retrieval rate is the ratio of correct hits in the top 40 to the maximum possible number of hits (20 × 1400).

The Tari dataset (Aslan et al., 2008) contains 1000 silhouette images from 50 classes, with 20 images per class. For this dataset too, the Bullseye test is used as the performance indicator.

The Kimia group (Sebastian et al., 2004b; Sharvit et al., 1998) has produced two databases of silhouettes. We use the second (Sebastian et al., 2004b), which contains 99 images from 9 classes. Results are summarized with the Bullseye test (here considering the 10 most similar candidates for each query image).

The Smithsonian Isolated Leaf Database (Anon) comes from the Smithsonian project. This dataset contains 343 leaves from 93 species. In the LEAF experiments, the retrieval performance is evaluated using the error rate among the top 5 classes.

The Tools dataset4 (Bronstein et al., 2008) contains two-dimensional articulated shapes (silhouettes) for non-rigid shape similarity experiments. The dataset contains 35 shapes: 5 × 2 scissors, 5 × 3 pliers, 5 pincers, and 5 knives. Results are summarized with the Bullseye test (here considering the 5 most similar candidates for each query image).
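The Bullseye retrieval test used for most of these datasets can be sketched as follows; the distance-matrix interface is our assumption, but the scoring rule (same-class hits among the top-N matches, divided by the maximum possible number of hits, e.g. 20 × 1400 for MPEG7) follows the description above.

```python
import numpy as np

def bullseye(dist, labels, top=40):
    """Bullseye rate: for every query, count how many of the `top` nearest
    shapes (the query itself included) share its class, then divide by the
    maximum possible number of hits."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    hits = 0
    for q in range(len(labels)):
        nbrs = np.argsort(dist[q])[:top]          # indices of the top-N matches
        hits += int(np.sum(labels[nbrs] == labels[q]))
    max_hits = np.bincount(labels)[labels].sum()  # e.g. 20 * 1400 for MPEG7
    return hits / max_hits
```

A perfect matcher scores 1.0; note that, as in the MPEG7 protocol, the query counts as one of its own hits.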
The Myths dataset4 (Bronstein et al., 2008) contains two-dimensional articulated shapes (silhouettes) for partial similarity experiments. The dataset contains 15 shapes (5 humans, 5 horses, and 5 centaurs); each shape differs by an articulation and additional parts. The retrieval performance is evaluated using the Bullseye test considering the 5 most similar candidates for each query image.

The Human Dancer Dataset was created from video sequences of a dance as part of a work performed at Missouri State University. Repeated phrases in larger movements were videotaped from different angles and extracted with the goal of matching the phrases. The extraction method was as follows: 200 videos of 150 frames were extracted from the phrases without overlap; for each of the 200 videos, the most similar of the remaining videos was retrieved (a shape descriptor was extracted from each frame, with the distance between two videos computed as the sum of the distances between their 150 frames). The correctness of matches was determined by two expert judges (two choreographers, one of whom choreographed the dance). The phrases ranged in difficulty from easy (see Fig. 3, top) to difficult (see Fig. 3, bottom). The images are segmented (foreground extraction) using the ViBe approach available at http://www2.ulg.ac.be/telecom/research/vibe/download.html. The human dancer dataset is available at bias.csr.unibo.it/nanni/dancerdata.rar.

In the benchmark datasets we kept the same parameters for all the approaches: ID/SC (nd = 8; nh = 12; n = 100); HF (k = 5; M = 19); LGT (α = 0.27; K = 10). We do this because our goal is not to optimize performance for each dataset but rather

4 http://tosca.cs.technion.ac.il/book/resources_data.html.

Fig. 3. Example masks from query video sequences in the Human Dancer Dataset. The bottom phrases, some with figures prone (face down) and others, supine (face up) and rolling, proved difficult to match.
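The video-to-video matching used for the dancer dataset (distance between two videos as the sum of per-frame shape distances, then nearest-neighbor retrieval) can be sketched as follows. Pairing frames by index and all helper names are our assumptions; any of the frame-level shape distances described in Section 2 could be plugged in as `frame_dist`.

```python
import numpy as np

def video_distance(frames_a, frames_b, frame_dist):
    """Distance between two videos: sum of distances between corresponding
    frames (150 frames per video in the dancer dataset)."""
    return sum(frame_dist(fa, fb) for fa, fb in zip(frames_a, frames_b))

def retrieve_most_similar(videos, query_idx, frame_dist):
    """Return the index of the video most similar to videos[query_idx]."""
    dists = [video_distance(videos[query_idx], v, frame_dist)
             if i != query_idx else float("inf")
             for i, v in enumerate(videos)]
    return int(np.argmin(dists))
```

A match is then judged correct if the retrieved video comes from the same repeated phrase as the query.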

Table 1. Performance (accuracy) comparison among the tested methods.

Method            MPEG7   KIMIA   TARI    LEAF    DANCE   TOOL    MYTH
Contour
  ID              85.01   95.15   95.12   85.26   75      84.57   77.33
  SC              86.24   96.67   93.69   85.26   77.5    58.29   76
  HF              89.25   97.07   94.16   83.97   77.5    84.57   90.67
  ID + SC         86.72   96.36   95.71   87.17   80      72.57   78.67
  ID + SC + HF    89.98   97.58   95.54   87.82   80      80.57   88
ContTXT
  ID              85.82   95.56   95.49   89.1    85      85.71   77.33
  SC              86.83   96.06   94.25   87.82   85      62.29   77.33
  HF              89.83   97.58   94.36   85.9    82.5    82.86   93.33
  ID + SC         87.27   96.06   96.06   89.74   82.5    75.43   76
  ID + SC + HF    90.24   97.58   95.64   88.46   85      81.71   88
Contour + LG
  ID              93.97   100     99.76   –       80      87.43   74.67
  SC              94.56   100     98.75   –       79.5    53.14   78.67
  HF              95.31   100     99.61   –       80      90.66   94.67
  ID + SC         95.36   100     99.74   –       82.5    77.71   78.67
  ID + SC + HF    96.05   100     99.85   –       82.5    86.29   93.33
ContTXT + LG
  ID              93.97   100     99.93   –       85      90.86   74.67
  SC              94.85   100     99.34   –       85      61.71   78.68
  HF              95.87   100     99.61   –       85      89.71   93.33
  ID + SC         95.42   100     99.90   –       86.5    81.14   80
  ID + SC + HF    96.16   100     99.81   –       86.5    85.14   90.67

to show that our method improves the performance of shape matchers. The tests reported in Table 1 compare the following methods:

- Contour(S): the base shape similarity approaches, S ∈ {ID, SC, HF};
- Contour(S1 + S2): fusion by sum rule between S1 and S2;
- ContTXT(S): our proposed system, based on the combination of S and texture descriptors;
- ContTXT(S1 + S2)5: fusion by sum rule between ContTXT(S1) and ContTXT(S2);

5 When we combine ID and SC we use the sum rule. When we combine ID, SC and HF we use the weighted sum rule, where the weights of ID and SC are both 1 and the weight of HF is 4.
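The statistical comparison used later in this section to validate these methods across datasets is a Wilcoxon signed-rank test over paired accuracies (Demsar, 2006). A dependency-free sketch of the W statistic is given below; this is our own helper (returning the two-sided statistic W = min(W+, W−), dropping zero differences and averaging tied ranks), not code from the paper.

```python
def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank statistic for paired samples a, b:
    drop zero differences, rank absolute differences (average ranks for
    ties), and return W = min(sum of positive ranks, sum of negative ranks)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    ranked = sorted((abs(d), d > 0) for d in diffs)
    n = len(ranked)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and ranked[j][0] == ranked[i][0]:
            j += 1                      # group of tied absolute differences
        avg = (i + 1 + j) / 2.0         # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w_plus = sum(ranks[k] for k in range(n) if ranked[k][1])
    w_minus = sum(ranks[k] for k in range(n) if not ranked[k][1])
    return min(w_plus, w_minus)
```

Small values of W indicate that one method consistently outperforms the other; the null hypothesis is rejected when W falls below the critical value for the chosen significance level.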

Table 2. Performance comparison with the state-of-the-art on the KIMIA dataset.

Approach                                Year   1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
SC (Belongie et al., 2002)              2002   97   91   88   85   84   77   75   66   56   37
Generative model (Tu and Yuille, 2004)  2004   99   97   99   98   96   96   94   83   75   48
ID (Ling and Jacobs, 2007)              2007   99   99   99   98   98   97   97   98   94   79
Symbolic (Daliri and Torre, 2008)       2008   99   99   99   98   99   98   98   95   96   94
HF (Wang et al., 2012)                  2011   99   99   99   99   98   99   99   96   95   88
ID_here                                 –      99   97   97   96   96   96   96   96   94   73
SC_here                                 –      98   97   97   96   98   95   97   97   96   86
HF_here                                 –      99   99   99   98   98   97   98   95   95   83
ContTXT (ID + SC)                       –      99   97   96   96   97   97   95   95   94   85
ContTXT (ID + SC + HF)                  –      99   99   99   98   98   99   98   98   92   86
ContTXT + LG (ID + SC)                  –      99   99   99   99   99   99   99   99   99   99
ContTXT + LG (ID + SC + HF)             –      99   99   99   99   99   99   99   99   99   99
Table 3. Performance comparison with the state-of-the-art (without considering context-based algorithms) on the MPEG7 and Tari datasets.

MPEG7
Approach                                Year   Accuracy (%)
SC (Belongie et al., 2002)              2002   76.51
Generative model (Tu and Yuille, 2004)  2004   80.03
ID (Ling and Jacobs, 2007)              2007   85.40
Symbolic (Daliri and Torre, 2008)       2008   85.92
HF (Wang et al., 2012)                  2011   90.35
Here                                    2012   90.24

TARI
Approach                                Year   Accuracy (%)
SC (Belongie et al., 2002)              2002   94.17
ID (Ling and Jacobs, 2007)              2007   95.33
ASC (Ling et al., 2010)                 2010   95.44
Here                                    2012   96.06
- Contour + LG(S): the method Contour(S) coupled with LGT;
- Contour + LG(S1 + S2): the method Contour(S1 + S2) coupled with LGT;
- ContTXT + LG(S): the method ContTXT(S) coupled with LGT;
- ContTXT + LG(S1 + S2): the method ContTXT(S1 + S2) coupled with LGT.

The results reported in Table 1 experimentally confirm that our idea improves standard shape matchers. The advantage of using our approach is also demonstrated by the Wilcoxon signed-rank test (Demsar, 2006): the null hypothesis (that there is no difference between the performance of the standard shape matchers and our improved versions) is rejected at a significance level of 0.10.

It is interesting to note that our proposed approach is notably accurate on MPEG-7 (see http://www.dabi.temple.edu/~shape/MPEG7/results.html, where the accuracies of several approaches are reported), using no ad-hoc parameter tuning (the same parameters are used on all the tested datasets).

In Table 2 we compare our approaches with the state-of-the-art on the KIMIA dataset. Table 2 reports the performance as follows: one shape contour is selected from the dataset as the template and matched against the remaining shape contours. The top 10 best matches are checked, and only matches in the same shape class as the template are counted as correct. This process is repeated taking each of the 99 shape contours in the dataset as the template. In Table 2 we report both the performance of ID/SC/HF as given in the original papers and the results that we

have obtained (methods named {ID/SC/HF}_here) using on KIMIA the same parameters used for MPEG7. Finally, in Table 3 we report the performance of some state-of-the-art approaches on the MPEG7 and Tari datasets.

4. Conclusions

In this paper we ran experiments using matrix descriptors to extract features from shape descriptors, with the aim of improving the performance of the standard shape matchers (ID/SC/HF). Our idea was to take the shape descriptors obtained from standard shape matchers such as ID/SC/HF, represent them as a matrix, extract matrix descriptors, and compare these with the Jeffrey distance. The resulting values are then combined with the standard shape similarities. Our proposed methods were tested using six benchmark databases (MPEG7 CE-Shape-1, Kimia, Tools, Tari, Myths figures, and the Leaf Database) as well as our own human dancer dataset. The presented experimental results demonstrate that the proposed approach provides significant improvements over the baseline shape matching algorithms. In the future we plan to study different matrix descriptors for this problem as well as better methods for combining the different distances.

Acknowledgments

This work was funded in part by Missouri State University Futures Grant: AI-ARTISTIC PROC 6-2011. The authors would like to thank Ruth Barnes and Darryl Clark for their contributions to the dancer database.

References

Anon. An electronic field guide: plant exploration and discovery in the 21st century. http://www1.cs.columbia.edu/cvgc/efg/index.php.
Aslan, C., Erdem, A., Erdem, E., Tari, S., 2008. Disconnected skeleton: shape at its absolute scale. IEEE Trans. Pattern Anal. Mach. Intell. 30 (12), 2188–2203.
Bai, X. et al., 2010. Learning context-sensitive shape similarity by graph transduction. IEEE Trans. Pattern Anal. Mach. Intell. 32 (5), 861–874.
Bai, X., Wang, B., Yao, C., Liu, W., Tu, Z., 2012. Co-transduction for shape retrieval. IEEE Trans. Image Process. 21 (5), 2747–2757.
Belongie, S., Malik, J., Puzicha, J., 2002. Shape matching and object recognition using shape contexts. IEEE Trans. Pattern Anal. Mach. Intell. 24 (4), 509–522.
Bookstein, F.L., 1991. Morphometric tools for landmark data: geometry and biology. Cambridge University Press.
Bronstein, A.M., Bronstein, M.M., Bruckstein, A.M., Kimmel, R., 2008. Analysis of two-dimensional non-rigid shapes. Int. J. Comput. Vision 78 (1), 67–88.
Cormen, T.H. et al., 2001. Introduction to algorithms, second ed. The MIT Press, Cambridge.
Daliri, M.R., Torre, V., 2008. Robust symbolic representation for shape recognition and retrieval. Pattern Recogn. 41 (5), 1782–1798.


Demsar, J., 2006. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30.
Gopalan, R., Turaga, P.K., Chellappa, R., 2010. Articulation-invariant representation of non-planar shapes. In: Proc. ECCV 2010, pp. 286–299.
Gorelick, L. et al., 2006. Shape representation and classification using the Poisson equation. IEEE Trans. Pattern Anal. Mach. Intell. 28 (12), 1991–2005.
Kendall, D., 1984. Shape manifolds, Procrustean metrics and complex projective spaces. Bull. London Math. Soc. 16, 81–121.
Kimia, B.B., Tannenbaum, A.R., Zucker, S.W., 1995. Shapes, shocks, and deformations, I: The components of shape and the reaction-diffusion space. Int. J. Comput. Vision 15 (3), 189–224.
Latecki, L.J., Lakamper, R., 2000. Shape similarity measure based on correspondence of visual parts. IEEE Trans. Pattern Anal. Mach. Intell. 22 (10), 1185–1190.
Latecki, L.J., Lakamper, R., Eckhardt, U., 2000. Shape descriptors for non-rigid shapes with a single closed contour. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 424–429.
Ling, H., Jacobs, D.W., 2007. Shape classification using the inner-distance. IEEE Trans. Pattern Anal. Mach. Intell. 29 (2), 286–299.
Ling, H., Yang, X., Latecki, L., 2010. Balancing deformability and discriminability for shape matching. In: Proc. ECCV 2010, pp. 411–424.
Nanni, L., Brahnam, S., Lumini, A., 2012. Matrix representation in pattern classification. Expert Syst. Appl. 39 (3), 3031–3036.
Ojansivu, V., Heikkila, J., 2008. Blur insensitive texture classification using local phase quantization. In: ICISP 2008.
Page, L., et al., 1998. The PageRank citation ranking: Bringing order to the web. Stanford Digital Libraries Working Paper.

Sebastian, T.B., Klein, P.N., Kimia, B.B., 2004a. Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Anal. Mach. Intell. 25 (5), 116–125.
Sebastian, T.B., Klein, P.N., Kimia, B.B., 2004b. Recognition of shapes by editing their shock graphs. IEEE Trans. Pattern Anal. Mach. Intell. 26 (5), 550–571.
Sharvit, D. et al., 1998. Symmetry-based indexing of image databases. J. Vis. Commun. Image Represent. 9 (4), 366–380.
Shokoufandeh, A. et al., 2005. Indexing hierarchical structures using graph spectra. IEEE Trans. Pattern Anal. Mach. Intell. 27 (7), 1125–1140.
Siddiqi, K. et al., 1999. Shock graphs and shape matching. Int. J. Comput. Vision 35 (1), 13–32.
Temlyakov, A., Munsell, B., Waggoner, J., Wang, S., 2010. Two perceptually motivated strategies for shape classification. In: Proc. CVPR 2010, pp. 2289–2296.
Tu, Z., Yuille, A.L., 2004. Shape matching and recognition using generative models and informative features. In: ECCV: European Conference on Computer Vision, pp. 195–209.
Veltkamp, R.C., Hagedoorn, M., 2001. State of the art in shape matching. In: Principles of Visual Information Retrieval, pp. 89–119.
Wang, J., Bai, X., You, X., Liu, W., Latecki, L.J., 2012. Shape matching and classification using height functions. Pattern Recogn. Lett. 33 (2), 134–143.
Yang, X., Köknar-Tezel, S., Latecki, L.J., 2009. Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval. In: CVPR 2009.
Yang, X., Prasad, L., Latecki, L.J., 2012. Affinity learning with diffusion on tensor product graph. IEEE Trans. Pattern Anal. Mach. Intell. 28 (2).
