Incremental pairwise discriminant analysis based visual tracking


Neurocomputing 74 (2010) 428–438


Incremental pairwise discriminant analysis based visual tracking

Jing Wen (a), Xinbo Gao (a), Xuelong Li (b,*), Dacheng Tao (c), Jie Li (a)

(a) School of Electronic Engineering, Xidian University, No. 2, South Taibai Road, Xi'an 710071, Shaanxi, P. R. China
(b) Center for OPTical IMagery Analysis and Learning (OPTIMAL), State Key Laboratory of Transient Optics and Photonics, Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi'an 710119, Shaanxi, P. R. China
(c) School of Computer Engineering, Nanyang Technological University, Singapore


Abstract

Article history: received 5 February 2010; received in revised form 28 April 2010; accepted 26 July 2010; available online 27 August 2010. Communicated by Qingshan Liu.

The distinction between the object appearance and the background provides useful cues for visual tracking, and discriminant analysis is widely applied to exploit it. However, due to the diversity of the background, adequate negative samples from the background are usually unavailable, which often leads discriminant methods to tracking failure. A natural solution is to construct an object–background pair, constrained by the spatial structure, which not only reduces the number of negative samples but also makes full use of the background information surrounding the object. However, this idea is threatened by the variation of both the object appearance and the spatially constrained background observation, especially when the background shifts as the object moves. Therefore, an incremental pairwise discriminant subspace is constructed in this paper to delineate the variation of this distinction. In order to maintain the ability of the subspace to describe the data correctly, we enforce two novel constraints for the optimal adaptation: (1) a pairwise data discriminant constraint and (2) subspace smoothness. The experimental results demonstrate that the proposed approach can alleviate adaptation drift and achieve better visual tracking results for a large variety of nonstationary scenes. © 2010 Elsevier B.V. All rights reserved.

Keywords: Pairwise discriminant analysis Log-Euclidean Riemannian Incremental learning Visual tracking

1. Introduction

Visual tracking is a fundamental and challenging task in pattern recognition and computer vision, with wide applications in video surveillance [52], robotics, human–machine interaction [42,45,46], and object recognition [31,50]. Influenced by viewpoint, illumination variation, shape deformation, etc., changes of the object may ruin the prespecified visual measurement (or observation) model and lead to tracking failure. Most existing tracking methods can be classified into two types of approaches. One is to exploit invariant features [3] of the object. However, it is very difficult to find invariants, although learning methods [1,2,6,12,16] can be employed. Moreover, these methods usually need an off-line training process. The other type of approach adapts the visual model to the changes, e.g., by updating the appearance models online [5,7,10,14,15] or by selecting the best visual features [8,17–19,24] during tracking. Compared to the invariants-based methods, the adaptation-based methods are more flexible, since the measurement models are adaptive or the features used for tracking can be adaptively selected [20,21,28].

* Corresponding author. E-mail address: [email protected] (X. Li).

0925-2312/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.neucom.2010.07.014

However, adaptation drift, i.e., the appearance model adapting to other image regions rather than the object of interest and leading to tracking failure, is commonly seen in most existing adaptation-based methods. Many methods have been proposed to alleviate the drift, e.g., by enforcing similarity to the initial model [8,10]. In most existing adaptive tracking methods, the model at the current time instant is updated by the new data that are closest to the model at the previous time step, with the hidden assumption that the model optimal up to time t−1 is also adequate for time t. Unfortunately, this assumption may not hold when the new data are far away from the model. The nature of the adaptive tracking problem lies in a chicken-and-egg dilemma [21]: the right data at time t are found by the right model at time t, while the right model can only be adapted by using the right data at time t. Thus, a supervised mechanism may be required to introduce negative information, here the background, to constrain the model toward the correct object. If no constraints are enforced, any new data can lead to a valid and stable adaptation, since the adapted model tends to best fit the new data. Therefore, we introduce a discriminative scheme, which needs effective and typical negative samples from the background, as well as good data-driven constraints from the image observations at the current time instant; these constraints should be reasonable and should allow a wide range of adaptation.


In this paper, the general adaptation problem is specialized as a pairwise discriminant subspace adaptation problem in nonstationary appearance tracking. By discriminating the positive and negative data, the optimal object state is estimated as the one with the largest separation. Moreover, the discriminant subspace can not only represent the object appearance variation but also exclude background observations far from the object. This subspace is obtained under the assumption that there exists a kind of object–background data pair, with the characteristic that, in the discriminant subspace, only the object–background data pair attains a large measurement; otherwise, the data pair is of another type, e.g., a background–background pair. Here we also refer to the object–background pair as the positive–negative data pair. In order to obtain the discriminant subspace, the data pairs used to construct it should satisfy a constrained relationship between the positive and negative samples. In the tracking problem, this relationship can be spatial, for example, the relative location. Compared with the positive class, the negative class usually exhibits great diversity, so it is hard to obtain sufficient samples. For visual tracking, however, the negative class is distributed only in the non-object region, or more exactly, the region surrounding and near the object of interest. Therefore, the object can be tracked if an effective discriminant scheme is built, so that the object of interest is found among the sample pairs satisfying the assumption above. Moreover, with the assumption that the variation during a short interval is linear, subspace smoothness is adopted to constrain the discriminant subspace update, lest the subspace bend to arbitrary disturbances.
In the next section, we briefly review some related tracking algorithms in terms of different observation models, and then investigate the dilemma in traditional adaptation schemes. Our approach to dealing with this dilemma is elaborated in Section 3. In Section 4, we present the tracking flow based on the incremental pairwise discriminant analysis. The experimental results and discussions are presented in Section 5. The concluding remarks are given in Section 6.

2. Related work and motivation

Object tracking can be formulated as a dynamic system, which mainly depends on two aspects: the dynamic association and the measurement matching. Since the state transition is usually formulated as a first-order Markov process, the dynamic association predicts the state parameters of the object at time t based on the previous time t−1. In the measurement matching, the similarity between the observed evidence and the visual model is computed to estimate the optimal state. In tracking, the term observation model is usually used instead of measurement matching. Generally, the performance of tracking depends much more on the observation model than on the dynamic association. Without any prior about the moving object, appearance-based methods are more stable and general than the others. However, whether the optimal observation can be obtained usually depends on the ability of the appearance model to describe the object. Here, we examine visual observation models with respect to whether the appearance model is updated during tracking. For the non-updated type, i.e., the fixed model, the tracking procedure is mainly directed by the similarity between features of the image observations z, which can be edges, color histograms, feature points, etc. However, the image features of the object appearance are also influenced by many factors, e.g., illumination and occlusion, as well as deformation, which

would lead the fixed model to two kinds of limitations: (1) weak generalization ability and (2) possibly complex computation for training the appearance model. Moreover, an off-line training process is usually required. Owing to the factors above, a simple appearance model cannot cover all variations of the object appearance. Although the first limitation can be mitigated by enumerating as many appearances as possible, the construction of such an appearance model incurs a large computational cost, especially when a nonlinear manifold is concerned. Thus, updating schemes for the appearance model are exploited during tracking. In general, there is a common assumption that the manifold during a short time interval is linear [10,23]. The nonlinear manifold is approximated by piecewise linear subspaces [9] or mapped to a low-dimensional manifold [25] using a nonlinear mapping; alternatively, a learned general subspace can be updated to a specific one during tracking [22], or multilinear methods [38–41,51,52] are employed for modeling the object. Among these methods, model drift is one of the common and fundamental challenges.

On the basis of the assumption of linearity over a short time interval, we assume the object appearances (or visual features) z ∈ R^m lie in a linear subspace spanned by the r linearly independent columns of a linear transform A ∈ R^{m×r}, i.e., z is a linear combination of the columns of A, z = Ab. The projection of z onto the subspace R^r is given by the least-squares solution of z = Ab, i.e.,

b = (A^T A)^{-1} A^T z    (1)

where (A^T A)^{-1} A^T is the pseudo-inverse of A. The reconstruction of the projection in R^r is given by

z~ = A A^T z = P z    (2)

where P = A A^T ∈ R^{m×m} is called the projection matrix. The subspace L delineated by a random vector process {z} is given by the following optimization problem:

P = arg min_P ||z − P z||²    (3)

This optimization problem is equivalent to applying principal component analysis to the data. In the tracking scenario, the problem [5,8,11,13] becomes

x_t* = arg min_{x_t} ||z(x_t) − P_{t−1} z(x_t)||²,
P_t = arg min_{P_t} ||z(x_t*) − P_t z(x_t*)||²    (4)

where x_t is the motion parameter to be tracked and x_t* is the optimal motion parameter estimated by P_{t−1}. With this setting, we face a dilemma: if {x_t} cannot be determined, then neither can P, and vice versa. Namely, given any tracking result, good or bad, we can always find an optimal subspace that best explains this particular result. Even if discriminant information is applied, the dilemma still exists. Moreover, typical negative samples are hard to select. The reason is that there are no constraints on the relationship between the positive and negative samples for the discriminant, nor on P for the subspace update.
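As a concrete illustration of Eqs. (1)–(3), the subspace and its projection matrix can be obtained by PCA; the following NumPy sketch uses synthetic data and illustrative dimensions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 20, 3, 200  # ambient dimension, subspace rank, sample count (illustrative)

# Synthetic observations lying near an r-dimensional subspace.
A_true = np.linalg.qr(rng.standard_normal((m, r)))[0]
Z = A_true @ rng.standard_normal((r, n)) + 0.01 * rng.standard_normal((m, n))

# Solving Eq. (3) by PCA: the r leading eigenvectors of the scatter matrix
# span the subspace minimizing the reconstruction error ||z - Pz||^2.
C = Z @ Z.T / n
eigvals, eigvecs = np.linalg.eigh(C)
A = eigvecs[:, -r:]          # top-r eigenvectors (eigh sorts ascending)
P = A @ A.T                  # projection matrix of Eq. (2)

# Projection coefficients (Eq. (1)); A has orthonormal columns, so
# (A^T A)^{-1} reduces to the identity.
b = A.T @ Z

# Relative reconstruction error of Eq. (3); small for data near the subspace.
err = np.linalg.norm(Z - P @ Z) / np.linalg.norm(Z)
print(round(err, 3))
```

Note that P is symmetric and idempotent (P² = P), the defining properties of an orthogonal projection.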

3. Incremental pairwise discriminant analysis

According to the analysis in Section 2, it is clear that we should make full use of the background observation surrounding the object as discriminative information, and set reasonable constraints on the adaptive appearance model. It would be a


reasonable appearance model if the following two characteristics are met: (1) it can discriminate the object from the background; (2) it is consecutive between successive models when the incremental scheme is involved. Therefore, we impose the following two constraints on the visual appearance model.

Pairwise discriminant constraint: Given a pair of data points, it is much easier to determine whether or not they belong to the same class. Enlightened by this idea, the observed evidences are provided in a pairwise manner. If a data pair projected onto the discriminant subspace is separated far from each other, then it is possible that the true object exists within the data pair. The farther the separation of the data pair, the more probably the positive datum is the true object. Note that the data pair should maintain some structural relationship.

Subspace smoothness constraint: The smoothness constraint is very important for the discriminant subspace, since the subspace at time t is updated on the basis of the subspace at time t−1, with the assumption that the difference between consecutive subspaces is small.

As shown in Fig. 1, the regions in the red rectangles denote the object of interest, and the regions between the red and blue dashed rectangles are the background observations, which are the only negative information to be considered at the current time/frame. By projecting the data pairs into the discriminant subspace, we can determine which pair is most probably the object–background pair. The green line denotes the discriminant subspace updated online: the set within the green line is the positive class, while the set outside it is the negative class. With the help of the pairwise discriminant constraint, the discriminant subspace can be delineated and updated by the newly arrived data pairs.
Since the current subspace takes advantage of the previous subspace, conjunctive subspaces should differ little. Smoothness is thus a powerful basis for constraining the discriminant subspace when it is updated. Note that the data pairs are obtained by a specific spatial structure relationship, as in Fig. 1.

3.1. Formulation of the appearance model

According to the discussion above, an optimal subspace should have three features: first, the positive data should have a large projection on the subspace, that is, the larger the projection ||A_t^T C_t^+ A_t||², the better the ability of the subspace to express the positive class; second, the negative data should have a small projection ||A_t^T C_t^- A_t||², i.e., the negative data are far away from their projection on the subspace; third, the current subspace should be close to the previous one. The optimal subspace at the current time t is then formulated as

min_{A_t} J_0(A_t) = min_{A_t} { ||A_t^T C_t^- A_t||² − α ||A_t^T C_t^+ A_t||² + β ||P_t − P_{t−1}||_F² }    (5)

where C_t^+ = z_t^+ z_t^{+T} and C_t^- = z_t^- z_t^{-T} are the positive and negative covariance matrices at time t, respectively, β > 0 is a weighting factor, α > 0 is a tuning parameter, and ||·||_F is the Frobenius norm. The aforementioned properties are reflected in the terms of Eq. (5): the optimal subspace ensures that z_t^+ has a large projection and z_t^- a small one, and that the projection matrices P_t and P_{t−1} in successive frames are close. Note that both the positive and negative data should have their respective means removed. For computational convenience, Eq. (5) can be approximately rewritten as

min_{A_t} J_1(A_t) = min_{A_t} { trace(A_t^T C_t^- A_t) − α trace(A_t^T C_t^+ A_t) + β ||P_t − P_{t−1}||_F² }    (6)

The solution to the problem in Eq. (6) is given by P_t = U U^T, where U consists of the r eigenvectors corresponding to the r smallest eigenvalues of the matrix

C^ = C_t^- − α C_t^+ + β (I − P_{t−1})    (7)
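The eigendecomposition that solves Eqs. (5)–(7) can be sketched as follows; α, β, r, and the covariance matrices below are illustrative stand-ins, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 10, 3                     # feature dimension and subspace rank (illustrative)
alpha, beta = 1.0, 0.5           # tuning / weighting factors of Eq. (5) (illustrative)

# Illustrative positive (object) and negative (background) covariances.
X_pos = rng.standard_normal((d, 50))
X_neg = rng.standard_normal((d, 80)) * 2.0
C_pos = X_pos @ X_pos.T / 50
C_neg = X_neg @ X_neg.T / 80
P_prev = np.zeros((d, d))        # previous projection matrix P_{t-1}

# Eq. (7): C^ = C^- - alpha * C^+ + beta * (I - P_{t-1}).
C_hat = C_neg - alpha * C_pos + beta * (np.eye(d) - P_prev)

# Minimizing Eq. (6): take the r eigenvectors with the smallest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(C_hat)   # eigenvalues in ascending order
A_t = eigvecs[:, :r]
P_t = A_t @ A_t.T                          # updated projection matrix

print(A_t.shape)
```

Since `eigh` returns orthonormal eigenvectors, A_t automatically satisfies the orthogonality requirement that makes the solution unique.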

When A_t is required to be spanned by r orthogonal vectors, then A_t = U, since A_t might not be unique without this assumption.

3.2. Incremental learning for the pairwise discriminant subspace

Many discriminant analysis methods have been proposed [29,30,33–37,44,47], as well as matrix decomposition methods [43,48,49] developed to improve discriminant analysis. However, most of these discriminant methods are computationally and memory intensive, since they keep all the data to determine the discriminant subspace. At the same time, the negative samples would greatly outnumber the positive samples, due to the diversity of the negative class. Therefore, an incremental updating scheme is adopted to approximate the true distributions of both the object and background observations, and to keep the discrimination between the positive and negative classes. Note that the background (negative) data have the same number as the object data, since the pairwise constraint requires a one-to-one form, i.e., the object–background pair. Both the object and background in this paper are represented by the covariance matrices described in the following section. As a result, one image observation generates both positive and negative samples, as shown in Fig. 2. In Section 3.2.2, an incremental scheme for the pairwise discriminant subspace is introduced.

3.2.1. Covariance variable based feature descriptor

The observed evidence (or image feature) here is represented by the covariance matrix descriptor proposed by Tuzel et al. [27]. Denote by I a W × H one-dimensional intensity or three-dimensional color image, and by F the W × H × d dimensional feature image extracted from I:

F(x, y) = c(I, x, y)    (8)

Fig. 1. Pairwise discriminant constraints.


Fig. 2. Covariance descriptor for the object and background.


where c is a function for extracting image features. For a given rectangular region R ⊂ I, denote by {f_i}_{i=1,…,L} the d-dimensional feature points obtained by c within R. Consequently, the image region R can be represented as a d × d covariance matrix

C_R = (1/(L−1)) Σ_{i=1}^{L} (f_i − m)(f_i − m)^T

where m is the mean of {f_i}_{i=1,…,L}. For our tracking problem, there are two covariance matrices, for the object and the background observation of an image region, as shown in Fig. 2. We define c(I, x, y) as

c(I, x, y) = [ x, y, |I_x|, |I_y|, sqrt(I_x² + I_y²), |I_xx|, |I_yy|, arctan(|I_x| / |I_y|) ]    (9)

where x and y are the pixel location, I_x, I_xx, … are intensity derivatives, and the last term is the edge orientation. The covariance description is used in this paper because the shape of the image region does not matter when the covariance matrix is computed. The background region is selected larger than the object region, with the same center; in this paper, the region size for the background is 2.5 times that of the object. Thus, an observed evidence consists of two parts, z_t = {z_t^+, z_t^-}, and the image feature for a sample is {C_object, C_background}, obtained by computing the covariance matrices of the object region and of the background region with the object part removed.

3.2.2. Incremental pairwise discriminant analysis

From research on Riemannian metrics, it is easily concluded that both the object and background covariance matrices are symmetric positive definite (SPD) matrices [26] lying on a connected Riemannian manifold. Enlightened by the work of Arsigny et al. [26] on the log-Euclidean Riemannian metric for statistics on SPD matrices, in this section an incremental learning scheme for pairwise discriminant analysis is developed in detail. In our tracking framework, the data are represented by two covariance matrices, C_object and C_background from Fig. 2, supposed to be the object and background observations, respectively. By the log-Euclidean mapping, the two covariance tensors C_object and C_background are transformed into

Lg_object = log(C_object),  Lg_background = log(C_background)    (10)

which are called the log-Euclidean covariance tensors. Due to the vector space structure of log C under the log-Euclidean Riemannian metric, log C is unfolded into a d²-dimensional vector z, formulated as

z_object = UT(log C_object),  z_background = UT(log C_background)    (11)

where UT(·) is an operator unfolding a matrix into a column vector. The classic R-SVD algorithm [4] efficiently computes the singular value decomposition of a dynamic matrix with newly added columns or rows, based on the existing SVD. However, the R-SVD algorithm [4] is based on a zero-mean assumption, leading to failure in tracking subspace variabilities. Based on [4], Lim et al. [14] improved the R-SVD algorithm to compute the eigenbasis of a scatter matrix with the mean updated. Based on the improved R-SVD [14], we apply the update method to the object and background subspaces, so as to keep C^+ and C^- instead of maintaining all the data so far. Table 1 gives the pseudo code for updating the pairwise discriminant subspace. In Table 1, α, β > 0 are weighting factors, and 0 < λ ≤ 1 is the forgetting factor used to alleviate the influence of the old data on the subspace update. Notice that the old covariance matrices C_old^± can also be substituted by the eigenvectors from an SVD decomposition of the old data, ignoring the low-energy components of C_old^±, for memory saving.
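The descriptor pipeline of Eqs. (8)–(11) can be sketched as follows. The image and region are synthetic, and the matrix logarithm is computed via eigendecomposition (valid because C is SPD); this is an illustration, not the authors' implementation:

```python
import numpy as np

def feature_image(I):
    """Eq. (9): per-pixel feature vectors from a grayscale image I."""
    Iy, Ix = np.gradient(I.astype(float))
    Iyy, _ = np.gradient(Iy)
    _, Ixx = np.gradient(Ix)
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    mag = np.sqrt(Ix**2 + Iy**2)
    ori = np.arctan2(np.abs(Ix), np.abs(Iy))   # arctan(|Ix|/|Iy|), safe at 0
    return np.stack([xs, ys, np.abs(Ix), np.abs(Iy), mag,
                     np.abs(Ixx), np.abs(Iyy), ori], axis=-1)

def region_covariance(F, y0, y1, x0, x1):
    """d x d covariance of the feature points inside rectangle R (Eq. (8))."""
    f = F[y0:y1, x0:x1].reshape(-1, F.shape[-1])
    return np.cov(f, rowvar=False)

# Synthetic image; in tracking, C_object / C_background come from the two
# regions of Fig. 2.
rng = np.random.default_rng(2)
I = rng.random((40, 40))
F = feature_image(I)
C = region_covariance(F, 5, 35, 5, 35)
C += 1e-6 * np.eye(C.shape[0])     # keep C strictly positive definite

# Eqs. (10)-(11): matrix logarithm of the SPD matrix C, then unfolding
# log(C) into a d^2-dimensional vector z.
w, V = np.linalg.eigh(C)
log_C = (V * np.log(w)) @ V.T
z = log_C.reshape(-1)
print(C.shape, z.shape)
```

Here d = 8 because all eight features of Eq. (9) are used; the paper's d of 7 (gray) or 23 (color) depends on its exact feature choice per channel.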

Table 1
Incremental learning for pairwise discriminant subspace.

Input:
  new data: z_object^(i), z_background^(i) ∈ R^{d²}, i = 1, …, l
  new data number: l
  old covariance matrices: C_old^±
  old discriminant matrix: U_old
  old data means: I¯_old^±
  old data number: n
Output:
  new covariance matrices: C_new^±
  new discriminant matrix: U_new
  new data means: I¯_new^±
  updated data number: n

1. Compute the new data means:
   I¯_new^± = (λn / (λn + l)) I¯_old^± + (l / (λn + l)) I¯'_new^±,
   where I¯'_new^+ = (1/l) Σ_{i=1}^{l} z_object^(i) and I¯'_new^- = (1/l) Σ_{i=1}^{l} z_background^(i).
2. Compute the new covariance matrices:
   C_new^± = λ C_old^± + C'_new^± + (λnl / (λn + l)) (I¯'_new^± − I¯_old^±)(I¯'_new^± − I¯_old^±)^T,
   where C'_new^+ = Σ_{i=1}^{l} (z_object^(i) − I¯'_new^+)(z_object^(i) − I¯'_new^+)^T and
         C'_new^- = Σ_{i=1}^{l} (z_background^(i) − I¯'_new^-)(z_background^(i) − I¯'_new^-)^T.
3. Compute the new discriminant matrix:
   [U, D] = SVD(C_new^- − α C_new^+ + β (I − P_old)),
   where P_old = U_old U_old^T and I is the identity matrix; U_new consists of the r eigenvectors in U corresponding to the r smallest eigenvalues in D (cf. Eq. (7)).
4. Update the data number: n = λn + l.
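One side (positive or negative) of the Table 1 update, steps 1–2 with the forgetting factor, can be sketched as follows; the variable names mirror the table and this is an illustration, not a released implementation:

```python
import numpy as np

def incremental_update(C_old, mean_old, n, Z_new, lam=0.95):
    """Table 1, steps 1-2, for one class (positive or negative).

    C_old:    d x d old covariance (scatter) matrix
    mean_old: old data mean (length d)
    n:        effective old data count
    Z_new:    l x d block of new data vectors
    lam:      forgetting factor, 0 < lam <= 1
    """
    l = Z_new.shape[0]
    mean_batch = Z_new.mean(axis=0)

    # Step 1: updated mean, with the old data down-weighted by lam.
    mean_new = (lam * n * mean_old + l * mean_batch) / (lam * n + l)

    # Step 2: updated covariance = decayed old scatter + new scatter
    # + a mean-shift correction term.
    D = Z_new - mean_batch
    C_batch = D.T @ D
    shift = (mean_batch - mean_old)[:, None]
    C_new = lam * C_old + C_batch + (lam * n * l) / (lam * n + l) * (shift @ shift.T)

    return C_new, mean_new, lam * n + l

# Toy usage: fold in two consecutive batches of 10 five-dimensional vectors.
rng = np.random.default_rng(3)
C, mu, n = np.zeros((5, 5)), np.zeros(5), 0
for _ in range(2):
    C, mu, n = incremental_update(C, mu, n, rng.standard_normal((10, 5)))
print(C.shape, round(n, 2))
```

The same routine would be called once for the object data and once for the background data, after which step 3 forms the new discriminant matrix from C_new^- − α C_new^+ + β(I − P_old).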

Complexity: The space consumption of the incremental pairwise discriminant analysis algorithm is O(d⁴) if the old data are stored in C_old^±, and O(d²) if the old data are stored as {eigenvector, eigenvalue} pairs retaining only most of the energy, say 90%. The computational cost is O(d⁴) for updating the covariance matrices and O(d⁶) for the SVD decomposition. In this paper, the feature number d is 7 for gray images and 23 for color images.

4. Visual tracking based on incremental pairwise discriminant subspace

The tracking procedure is in the framework of Bayesian state inference, assuming the motion between consecutive frames to be an affine motion. Let x_t denote the state variable describing the affine motion parameters of an object at time t. Given a set of observed evidence z_t = {z_1, …, z_t}, the posterior probability is formulated by the Bayesian theorem as

p(x_t | z_t) ∝ p(z_t | x_t) ∫ p(x_t | x_{t−1}) p(x_{t−1} | z_{t−1}) dx_{t−1}    (12)

where p(z_t | x_t) denotes the observation model and p(x_t | x_{t−1}) represents the dynamic model. In the tracking framework, we apply affine image warping to model the object motion between two consecutive frames; the six parameters of the affine transform are used to model p(x_t | x_{t−1}) of a tracked object. Let x_t = (x_t, y_t, r_t, s_t, a_t, k_t), where the six parameters denote the x and y translations, the rotation angle, the scale, the aspect ratio, and the skew direction at time t, respectively. Since the motion of the object from one frame to the next can be modeled by a first-order Markov model, the state parameter at time t depends only on time t−1, and a Gaussian distribution is used to describe the state transition:

p(x_t | x_{t−1}) = N(x_t; x_{t−1}, Σ)    (13)


where Σ is a diagonal covariance matrix whose elements are the variances of the affine parameters, i.e., σ_x², σ_y², σ_r², σ_s², σ_a², σ_k². The observation model p(z_t | x_t) reflects the probability that a sample is generated from the subspace. In this paper, the evidence consists of a positive and a negative sample, z_t = {z_t^+, z_t^-}, as shown in Fig. 2, so the similarity of a sample to the discriminant subspace is also constituted by two parts:

p(z_t | x_t) ∝ exp{ −[ ||(z_t^+ − I¯^+) − U_old U_old^T (z_t^+ − I¯^+)||² + ||U_old^T (z_t^- − I¯^-)||² ] }    (14)

The first term in the exponent of Eq. (14) denotes the reconstruction error of the positive sample on the subspace, while the second term represents the projection of the negative sample on the subspace. According to the analysis in Section 3, the subspace keeps the positive sample close to it, with a small reconstruction error, and the corresponding negative sample far from it, with a small projection. The entire procedure of the proposed algorithm is summarized as follows:

Initialization: At t = 0, the object x is specified by user input. The covariance descriptions for the object and background are computed, and the prior for the affine parameters is provided.

Iteration: For t > 0, perform the following three steps until the video is over:
  Step 1: Generate the samples with the affine parameters, warp the frame by the affine parameters, and compute the covariance description of each sample image patch.
  Step 2: Compute the weights of the samples by Eq. (14), keep the state parameter with the maximum weight as the optimal estimate, and draw the tracking result.
  Step 3: Store the feature data of the optimal estimate in a buffer. When the number of data in the buffer exceeds l, apply the incremental learning in Section 3 to update the pairwise discriminant subspace.

In this paper, the sample number is set to 100 and l is 5.

5. Experimental results and discussion

Table 2
The test video sequences.

Sequence     Colorful   Object type   No. of frames   No. of d
Dudek        No         Face          500             7
Indoor       No         Face          340             7
Toy          No         Toy           1100            7
Basketball   Yes        Human body    280             23
Corridor     Yes        Pedestrian    400             23
Skiing       Yes        Human body    300             23
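Steps 1 and 2 of the iteration in Section 4 — sampling candidate states around the previous estimate per Eq. (13), then weighting them per Eq. (14) — can be sketched as follows. Everything here is a toy illustration (synthetic subspace, zero means, made-up noise variances), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_states(x_prev, sigma, n):
    """Eq. (13): draw candidate affine states around the previous one."""
    return x_prev + sigma * rng.standard_normal((n, x_prev.size))

def log_weight(z_pos, z_neg, U, mean_pos, mean_neg):
    """Log of Eq. (14): a small reconstruction error of the positive sample
    and a small projection of the negative sample give a large weight."""
    d_pos = z_pos - mean_pos
    d_neg = z_neg - mean_neg
    recon_err = d_pos - U @ (U.T @ d_pos)   # positive sample off the subspace
    neg_proj = U.T @ d_neg                  # negative sample on the subspace
    return -(recon_err @ recon_err + neg_proj @ neg_proj)

# Toy setup: a 2-D discriminant subspace in R^6, zero means for simplicity.
U = np.linalg.qr(rng.standard_normal((6, 2)))[0]
mu = np.zeros(6)

# Step 1: sample candidate states (x, y, rotation, scale, aspect, skew);
# the standard deviations are illustrative, not the paper's settings.
sigma = np.array([4.0, 4.0, 0.02, 0.02, 0.005, 0.005])
states = sample_states(np.zeros(6), sigma, 100)

# Stand-in feature extraction: candidate 0 yields an in-subspace positive
# sample paired with an orthogonal negative sample; the rest are swapped.
v = rng.standard_normal(6)
z_in, z_perp = U @ np.array([1.0, -1.0]), v - U @ (U.T @ v)
pairs = [(z_in, z_perp)] + [(z_perp, z_in)] * (len(states) - 1)

# Step 2: weight every candidate and keep the maximum (the MAP estimate).
weights = np.array([log_weight(zp, zn, U, mu, mu) for zp, zn in pairs])
best = int(np.argmax(weights))
print(best)
```

As expected, the candidate whose positive sample lies in the subspace and whose negative sample is orthogonal to it receives the largest weight.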

In order to evaluate the performance of the proposed tracking algorithm, we collected six videos, as listed in Table 2, with human faces, pedestrians, and a toy as the tracking objects. The first three videos are captured indoors and undergo large pose variation and drastic illumination changes; the last three are recorded with moving people in a shopping center, on a basketball court, and water-skiing on a lake. Among the six video sequences, the first three are gray and the others are color.

5.1. Tracking a human face

In this section, we design two experiments, Exp1 and Exp2, to evaluate the tracking performance mainly under pose variation and occlusion, respectively.

Exp1: In this experiment, the test video sequence is "dudek", which is widely used in tracking research and exhibits occlusion, fast motion, pose change, and appearance variation. The appearance change in this sequence is drastic and challenging for many tracking methods. Our experiment shows that the discriminant subspace sticks to the difference between the true object and the background. The pairwise condition constrains the discriminant subspace to be close to the object subspace and far away from the background subspace, as shown in Fig. 3.

Exp2: The video sequence "indoor" is taken indoors with two people walking around the camera. The face of interest undergoes considerable appearance variation because of drastic pose changes and occlusion. Our experimental results show good tracking performance even when the face recovers from a large pose change. Generally, a tracker would drift the appearance model to the most likely sample and lose the track when drastic appearance variation happens. In our method, the tracker can find the object by pairwise discriminant analysis with background information, as shown in Fig. 4.

5.2. Tracking a toy object

The video sequence "sylv" is also challenging for the tracking task because of the error accumulated during the long-time

Fig. 3. The tracking results in video sequence ‘‘dudek’’ with the frame number #46, #120, #133, #158, #165, # 193, #207, and #215, in order.


Fig. 4. The tracking results in video sequence ‘‘indoor’’ with the frame number #12, #43, #54, #62, #72, #82, #88, #106, #111, #170, #180, #220, #229, #280, #285, and #297, in order.

process. Though many trackers achieve good performance in the early frames of "sylv", they usually cannot bear the long duration of the tracking, because the appearance model is gradually adapted to observed evidence that differs greatly from the object. As shown in Fig. 5, our approach keeps good tracking performance thanks to the discriminant analysis with background information.

5.3. Tracking a human body

In this section, the test video sequences are color, so the memory and computational consumption are much larger than for gray videos, since the image features are computed in each channel of the color images. In these experiments, the objects of interest are human bodies, which undergo more appearance deformation than faces, since the human body has many more degrees of freedom. The human body is therefore more challenging in natural video with much non-rigid motion.

Exp1: In the video sequence shown in Fig. 6, the object of interest is a person playing basketball, whose appearance undergoes pose and scale variation as well as occlusion by other players. Note that the background around the object has a color similar to the object around frames #106, #118, and #126. However, our tracker can still track the

object effectively, because it keeps the consistency of the discriminant subspace during the subspace update.

Exp2: As shown in Fig. 7, the object of interest is a pedestrian walking away from the camera. Although the object appearance itself changes little, the observed image evidence is still influenced by illumination and by occlusion from other people. The results around frames #100 and #216 show the robustness of our tracking method.

Exp3: In the "skiing" sequence, the object of interest is a water-skiing person who performs many actions, with appearance deformation resulting from the motion variation. As shown in Fig. 8, the proposed tracking method achieves good performance, although the object appearance undergoes deformation.

5.4. Comparison results

In this section, we use the sequences "indoor" and "sylv" to compare the tracking performance of the trackers in [14,32] and our approach. In [14], improved incremental subspace learning (ISL) is proposed to compute the eigenvectors of the updated scatter matrix with the mean updated. The ISL tracker achieves good performance in short-time tracking. However, this is achieved under the condition that the optimally estimated object appearance data in the early period are not polluted by drastic


Fig. 5. The tracking results in video sequence ''sylv'' with the frame number #67, #98, #136, #156, #169, #259, #328, #423, #593, #654, #839, #849, #855, #976, #998, and #1016, in order.

Fig. 6. The tracking results in video sequence ‘‘basketball’’ with the frame number #18, #76, #81, #104, #118, #126, #148, and #182, in order.


Fig. 7. The tracking results in video sequence ''corridor'' with the frame number #11, #68, #100, #142, #177, #192, #218, and #270, in order.

Fig. 8. The tracking results in video sequence ‘‘skiing’’ with the frame number #6, #46, #71, #91, #99, #127, #158, and #169 in order.

deformation or by disturbances from the surroundings, such as illumination changes and occlusion; otherwise error accumulates during the subspace update and the track is lost. In ISL the subspace used for tracking depends only on the maximum likelihood of the sample given the subspace, so once the tracker begins adapting the subspace to an observed image region that does not cover the true object, the track is lost. In [32], the subspaces describing the object of interest are built from five types of covariance matrices transformed into log-Euclidean Riemannian matrices, and the five covariance matrices are updated by the same mechanism as in [14]. The incremental log-Euclidean Riemannian subspace learning (IRSL) tracker is designed so that its five covariance matrices cover most of the possible object appearances. However, because it ignores background knowledge, the tracker in [32] cannot obtain stable results in videos where the background texture and intensity resemble the object's. Moreover, updating five subspaces and evaluating all of them against each observation makes the tracker in [32] computationally expensive.
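The log-Euclidean mapping that IRSL relies on can be sketched in a few lines: a symmetric positive-definite covariance matrix is taken to its matrix logarithm through an eigendecomposition, after which ordinary Euclidean subspace learning applies. This is an illustrative sketch, not the implementation of [32]; the regularization constant `eps` is our addition.

```python
import numpy as np

def log_euclidean(cov, eps=1e-6):
    """Map a symmetric positive-definite covariance matrix to its matrix
    logarithm, the log-Euclidean representation used by Riemannian
    subspace trackers such as IRSL (illustrative sketch only)."""
    # symmetrize and regularize to guarantee positive definiteness
    c = (cov + cov.T) / 2 + eps * np.eye(cov.shape[0])
    w, v = np.linalg.eigh(c)          # eigendecomposition of an SPD matrix
    return (v * np.log(w)) @ v.T      # log(C) = V diag(log w) V^T

# covariance of some random 5-D feature samples
rng = np.random.default_rng(0)
x = rng.standard_normal((100, 5))
L = log_euclidean(np.cov(x, rowvar=False))
# L is a symmetric matrix, so it can be vectorized and handled with
# ordinary (Euclidean) subspace learning, which is the point of the mapping
```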

Exp1: As shown in Fig. 9, the top, middle, and bottom rows are the tracking results of [14], of [32], and of our approach (named IPDA), respectively. As analyzed above, because the early-period subspace-update requirement of ISL is not satisfied, the ISL tracker loses the object (top row of Fig. 9). The IRSL tracker produces unstable results, although it roughly covers the object region (middle row of Fig. 9). The proposed method obtains stable tracking results even though the object undergoes drastic motion and pose variation.

Exp2: In the comparison on the ''sylv'' sequence in Fig. 10, the ISL tracker performs well early in the sequence, but starts to lose the object around frame #620. IRSL covers only part of the object region throughout the sequence. The proposed method maintains excellent tracking performance even over this long sequence.

Exp3: As shown in Fig. 11, the two plots compare the location error of ISL [14], IRSL [32], and IPDA on the ''indoor'' and ''toy'' sequences, respectively. The left plot in Fig. 11 shows the ''indoor'' sequence, in which the object undergoes drastic pose variation with fast motion and


Fig. 9. The comparison results of [14,32] and the proposed method on the video sequence ''indoor''. The top, middle, and bottom rows show the tracking results of ISL [14], IRSL [32], and the proposed method, respectively.

Fig. 10. The comparison results of [14,32] and the proposed method on the video sequence ''sylv''. The top, middle, and bottom rows show the tracking results of ISL [14], IRSL [32], and the proposed method, respectively.

(Two plots of location error versus frame number, with curves for ISL, IRSL, and IPDA; the left plot spans frames 0–150, the right plot frames 0–500.)
Fig. 11. The comparison results of the proposed IPDA, ISL [14], and IRSL [32] on the video sequences ''indoor'' and ''toy''. The left and right plots show the comparison results on the ''indoor'' and ''toy'' sequences, respectively.
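The location error plotted in Fig. 11 is, in the usual convention for tracking evaluation, the per-frame Euclidean distance between the predicted and ground-truth object centers. A minimal sketch, assuming that convention (the paper's exact evaluation protocol is not reproduced here):

```python
import numpy as np

def center_location_error(pred, gt):
    """Per-frame Euclidean distance between predicted and ground-truth
    object centers; an assumed convention, illustrative only."""
    pred = np.asarray(pred, dtype=float)   # N x 2 array of (x, y) centers
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm(pred - gt, axis=1)

pred = [[10, 10], [13, 14], [20, 20]]      # hypothetical tracker output
gt = [[10, 10], [10, 10], [16, 17]]        # hypothetical ground truth
err = center_location_error(pred, gt)
# err = [0.0, 5.0, 5.0]; a drifting tracker shows a steadily growing curve
```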


scale change, which makes both ISL and IRSL drift from the true location, as the blue and black curves in Fig. 11 show. Similarly, ISL and IRSL also lose the object on the ''toy'' sequence, because the object has the same characteristics as in ''indoor'', together with complicated illumination. The red curves in Fig. 11 show that IPDA not only locates the object accurately but also keeps tracking it over a rather long duration.

5.5. Discussion

All the experiments above validate the proposed approach. When a tracker adapts to false object features, the updated model tends to move far away from the true model, especially if a forgetting factor is used, and the tracker drifts onto non-object evidence. Moreover, once drift starts, most methods offer no remedy to pull the tracker back, so the drift is unstable and catastrophic. In contrast, exploiting the discriminant information between the object and the background provides feedback on the uncertainty of the object estimate; moreover, the subspace-consistency constraint prevents the updated subspace from bending toward an occasional polluted appearance, since the object subspace over a short interval is assumed to be approximately linear.
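The feedback role of object-background discriminant information described above can be illustrated with a plain two-class Fisher criterion: when the estimated object region drifts into the background, the criterion between the two sample sets collapses, which can flag a polluted appearance estimate. This is only an illustrative sketch; the paper's pairwise discriminant constraint is more elaborate than a single Fisher ratio.

```python
import numpy as np

def fisher_discriminability(obj, bg):
    """Two-class Fisher criterion between object and background samples:
    a large value means the two classes are still well separated, a small
    value can flag drift (illustrative sketch, not the paper's constraint)."""
    mo, mb = obj.mean(axis=0), bg.mean(axis=0)
    # pooled within-class scatter and between-class separation d
    sw = np.cov(obj, rowvar=False) + np.cov(bg, rowvar=False)
    d = mo - mb
    return float(d @ np.linalg.solve(sw + 1e-6 * np.eye(len(d)), d))

rng = np.random.default_rng(1)
obj = rng.standard_normal((50, 4)) + 3.0       # well-separated object samples
bg = rng.standard_normal((50, 4))              # background samples
sep = fisher_discriminability(obj, bg)
mixed = fisher_discriminability(bg + 0.1, bg)  # nearly identical classes
# sep is much larger than mixed: a drop in the criterion signals drift
```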

6. Conclusion

In this paper, an incremental pairwise discriminant analysis based object tracker is proposed to deal with drift of the object model. The proposed method obtains good tracking results by considering the following factors: (1) preserving the ability to delineate the object and push the background away from it; (2) pairwise adaptation of the object-background pair to the discriminant subspace; and (3) subspace consistency across successive frames. The proposed method prevents tracking drift. Our future work will focus on fusing the pairwise discriminant and subspace-consistency constraints when the two constraints conflict.

Acknowledgment

This research is supported by the National Basic Research Program of China (973 Program) (Grant No. 2011CB707000), the National Natural Science Foundation of China (Grant Nos. 60771068, 60702061, 60832005, and 61072093), the Open-End Fund of the National Laboratory of Pattern Recognition of CAS, the National Laboratory of Automatic Target Recognition of Shenzhen University, and the Program for Chang-Jiang Scholars and Innovative Research Team in University of China.

References

[1] M.J. Black, A.D. Jepson, Eigentracking: robust matching and tracking of articulated objects using view-based representation, in: Proceedings of the Fourth European Conference on Computer Vision, Cambridge, UK, vol. 1, April 15–18, 1996, pp. 329–342. [2] G.D. Hager, P.N. Belhumeur, Real-time tracking of image regions with changes in geometry and illumination, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, vol. 1, June 18–20, 1996, pp. 403–410. [3] S. Birchfield, Elliptical head tracking using intensity gradients and color histograms, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, USA, vol. 1, June 23–25, 1998, pp. 232–237.


[4] A. Levy, M. Lindenbaum, Sequential Karhunen–Loeve basis extraction and its application to images, IEEE Trans. Image Process. 9 (8) (2000) 1371–1374. [5] A.D. Jepson, D.J. Fleet, T.R. El-Maraghi, Robust online appearance models for visual tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Kauai, HI, vol. 1, December 9–14, 2001, pp. 415–422. [6] K. Toyama, A. Blake, Probabilistic tracking in a metric space, in: Proceedings of the IEEE International Conference on Computer Vision, Vancouver, BC, Canada, vol. 2, July 7–14, 2001, pp. 50–57. [7] J. Vermaak, P. Perez, M. Gangnet, A. Blake, Towards improved observation models for visual tracking: selective adaptation, in: Proceedings of the Seventh European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, May 2002, pp. 645–660. [8] R.T. Collins, Y. Liu, On-line selection of discriminative tracking features, in: Proceedings of the IEEE International Conference on Computer Vision, Nice, France, vol. 1, October 13–16, 2003, pp. 346–352. [9] K.C. Lee, J. Ho, M.H. Yang, D. Kriegman, Video-based face recognition using probabilistic appearance manifolds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Madison, WI, vol. 1, June 18–20, 2003, pp. 313–320. [10] J. Ho, K.C. Lee, M.H. Yang, D.J. Kriegman, Visual tracking using learned linear subspace, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, June 27–July 2, 2004, pp. 782–789. [11] D. Ross, J. Lim, M.H. Yang, Adaptive probabilistic visual tracking with incremental subspace update, in: Proceedings of the European Conference on Computer Vision, Prague, Czech Republic, vol. 1, May 2004, pp. 215–227. [12] S. Avidan, Support vector tracking, IEEE Trans. Pattern Anal. Mach. Intell. 26 (8) (2004) 1064–1072. [13] J. Ho, K.C. Lee, M.H. Yang, D.J. 
Kriegman, Visual tracking using learned linear subspace, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Grand Hyatt, Washington, vol. 1, June 27–July 2, 2004, pp. 782–789. [14] J. Lim, D. Ross, R.-S. Lin, M.-H. Yang, Incremental learning for visual tracking, in: Proceedings of the 17th Advances in Neural Information Processing Systems, Vancouver, BC, Canada, December 13–18, 2004, pp. 801–808. [15] S.K. Zhou, R. Chellappa, B. Moghaddam, Visual tracking and recognition using appearance-adaptive models in particle filters, IEEE Trans. Image Process. 13 (11) (2004) 1491–1506. [16] A. Elgammal, Learning to track: conceptual manifold map for closed-form tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 1, June 20–26, 2005, pp. 724–730. [17] J. Wang, X. Chen, W. Gao, Online selecting discriminative tracking features using particle filter, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–26, 2005, pp. 1037–1042. [18] S. Avidan, Ensemble tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–25, 2005, pp. 494–501. [19] A.P. Leung S. Gong, Online feature selection using mutual information for real-time multi-view object tracking, in: Proceedings of the IEEE International Workshop Analysis and Modeling of Faces and Gestures, Beijing, China, October 16, 2005, pp. 184–197. [20] F. Tang, H. Tao, Object tracking with dynamic feature graph, in: Proceedings of the IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, Beijing, China, October 15–16, 2005, pp. 25–32. [21] M. Yang, Y. Wu, Tracking non-stationary appearances and dynamic feature selection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 2, June 20–26, 2005, pp. 1059–1066. [22] K.C. Lee, D.J. 
Kriegman, Online learning of probabilistic appearance manifolds for video-based recognition and tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, vol. 1, June 20–25, 2005, pp. 852–859. [23] X. He, D. Cai, S. Yan, H. Zhang, Neighborhood preserving embedding, in: Proceedings of the IEEE Conference on Computer Vision, Beijing, China, vol. 2, 2005, pp. 1208–1213. [24] H. Grabner, M. Grabner, H. Bischof, Real-time tracking via on-line boosting, in: Proceedings of the Conference on British Machine Vision, Edinburgh, vol. 1, September 4–7, 2006, pp. 47–56. [25] H. Lim, V.I. Morariu, O.I. Camps, M. Sznaier, Dynamic appearance modeling for human tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, vol. 1, June 17–22, 2006, pp. 751–757. [26] V. Arsigny, P. Fillard, X. Pennec, N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl. 29 (1) (2007) 328–347. [27] O. Tuzel, F. Porikli, P. Meer, Human detection via classification on Riemannian manifolds, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, June 17–22, 2007, pp. 1–8. [28] Z. Yin, R. Collins, On-the-fly object modeling while tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, vol. 1, June 17–22, 2007, pp. 1–8. [29] D. Cai, X. He, J. Han, Efficient kernel discriminant analysis via spectral regression, in: Proceedings of the International Conference on Data Mining, Omaha, Nebraska, USA, October 28–31, 2007, pp. 427–432. [30] D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: Proceedings of the International Conference on Computer Vision, Rio de Janeiro, Brazil, October 14–20, 2007, pp. 1–7.


[31] D. Xu, S. Lin, S. Yan, X. Tang, Rank-one projections with adaptive margin for face recognition, IEEE Trans. Syst. Man Cybern. Part B 37 (5) (2007) 1226–1236. [32] X. Li, W. Hu, Z. Zhang et al., Visual tracking via incremental log-Euclidean Riemannian subspace learning, in: Proceedings of the Conference on Computer Vision and Pattern Recognition, Anchorage, AK, vol. 1, June 23– 28, 2008, pp. 1–8. [33] D. Cai, X. He, J. Han, SRDA: an efficient algorithm for large-scale discriminant analysis, IEEE Trans. Knowledge Data Eng. 20 (1) (2008) 1–12. [34] Y. Yuan, Y. Pang, Discriminant adaptive edge weights for graph embedding, in: Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, Las Vegas, NV, USA, 2008, pp. 1993–1996. [35] Y. Yuan, Y. Pang, Boosting simple projections for multi-class dimensionality reduction, in: Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, Singapore, 2008, pp. 2231–2235. [36] X. He, D. Cai, J. Han, Learning a maximum margin subspace for image retrieval, IEEE Trans. Knowledge Data Eng. 20 (2) (2008) 189–201. [37] X. Li, S. Lin, S. Yan, D. Xu, Discriminant locally linear embedding with highorder tensor data, IEEE Trans. Syst. Man Cybern. Part B 38 (2) (2008) 342–352. [38] D. Xu, S. Yan, S. Lin, T.S. Huang, Convergent 2-D subspace learning with null space analysis, IEEE Trans. Circuits Syst. Video Technol. 18 (12) (2008) 1753–1759. [39] D. Xu, S. Yan, L. Zhang, H. Zhang, T.S. Huang, Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis, IEEE Trans. Circuits Syst. Video Technol. 18 (1) (2008) 36–47. [40] D. Xu, S. Yan, S. Lin, T.S. Huang, S.F. Chang, Enhancing bilinear subspace learning by element rearrangement, IEEE Trans. Pattern Anal. Mach. Intell. 31 (10) (2009) 1913–1920. [41] D. Xu, S. Yan, Semi-supervised bilinear subspace learning, IEEE Trans. Image Process. 18 (7) (2009) 1671–1676. [42] Y. Yuan, Y. Pang, J. Pan, X. 
Li, Scene segmentation based on IPCA for visual surveillance, Neurocomputing 72 (10–12) (2009) 2450–2454. [43] Y. Yuan, X. Li, Y. Pang, X. Lu, D. Tao, Binary sparse nonnegative matrix factorization, IEEE Trans. Circuits Syst. Video Technol. 19 (5) (2009) 772–777. [44] Y. Lu, Q. Tian, Discriminant subspace analysis: an adaptive approach for image classification, IEEE Trans. Multimedia 11 (7) (2009) 1289–1300. [45] H. Zhou, Y. Yuan, C. Shi, Object tracking using SIFT features and mean shift, Comput. Vision Image Understanding 113 (2) (2009) 345–352. [46] H. Zhou, Y. Yuan, Y. Zhang, C. Shi, Non-rigid object tracking in complex scenes, Pattern Recognition Lett. 30 (2) (2009) 98–102. [47] T. Zhang, B. Fang, Y. Tang, Z. Shang, B. Xu, Generalized discriminant analysis: a matrix exponential approach, IEEE Trans. Syst. Man Cybern. Part B: Cybern. 40 (1) (2010) 186–197. [48] X. Li, Y. Pang, Deterministic column-based matrix decomposition, IEEE Trans. Knowledge Data Eng. 22 (1) (2010) 145–149. [49] X. He, Laplacian regularized d-optimal design for active learning and its application to image retrieval, IEEE Trans. Image Process. 19 (1) (2010) 254–263. [50] Y. Yuan, Y. Pang, X. Li, Footwear for gender recognition, IEEE Trans. Circuits Syst. Video Technol. 20 (1) (2010) 131–135. [51] J. Wen, X. Gao, Y. Yuan, D. Tao, Incremental tensor biased discriminant analysis: a new color based visual tracking, NeuroComputing 73 (4–6) (2010) 827–839. [52] X. Li, Y. Pang, Y. Yuan, L1-norm-based 2DPCA, IEEE Trans. Syst. Man Cybern. Part B 40 (4) (2010) 1170–1175.

Jing Wen received the B.Sc. degree in Electronic Information Science and Technology from Shanxi University, Taiyuan, China, in 2003, and the M.Eng. degree in Signal and Information Processing from Xidian University, Xi’an, China, in 2006. Since August 2006, she has been pursuing her Ph.D. degree in Pattern Recognition and Intelligent System at Xidian University. Her research interests include pattern recognition and computer vision.

Xinbo Gao received his Bachelor degree in Electronic Engineering, Master degree and Ph.D. degree in Signal and Information Processing from Xidian University, Xi’an, China, in 1994, 1996, and 1999, respectively. From 1997 to 1998, he was a Research Fellow in Dr. Hiroyuki Iida’s Group, Department of Computer Science at Shizuoka University, Hamamatsu, Japan. From 2000 to 2001, he was a Postdoctoral Fellow in Dr. Xiaoou Tang’s Group, Department of Information Engineering at the Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR, China. Since 2003, Dr. Xinbo Gao has been a full Professor in School of Electronic Engineering at Xidian University, Xi’an, China, and the

Director of Video & Image Processing System Laboratory (VIPSL). Since 2005, he has been the Director of Office of Cooperation and Exchange, Xidian University. Since 2008, he concurrently served as the Dean of School of International Education, Xidian University. His research interests include visual information processing and analysis, pattern recognition, machine learning and computational Intelligence. In 2004, Dr. Gao was selected as a member of the program for New Century Excellent Talents in University of China by the Ministry of Education (MOE). He was authorized the title Pacemaker of Ten Excellent Young Teacher of Shaanxi Province in 2005. In 2006, he was awarded the Young Teacher Award of High School by the Fok Ying Tung Education Foundation. From 2006, he was selected as an Expert enjoying the Government Special Subsidy. In 2007, as one of the principal members, he and his colleagues founded an Innovative Research Team in University, MOE, China. In 2008, he was awarded one of 10 distinguished teachers of Xidian University. This year, he was just selected as a candidate of the One Hundred plus One Thousand plus Ten Thousand Talents Project of the New Century. So far, Dr. Gao is a Fellow of IET/IEE, and Vice Chairman of IET Xi’an Network; Senior Member of IEEE; Member of IEEE Xi’an Section Executive Committee, and the Membership Development Committee Chair, Vice President of Computational Intelligence Chapter, IEEE Xi’an Section and Member of Technical Committee of Cognitive Computing, IEEE SMC Society; Senior Member of China Computer Federation (CCF) and Academic Committee Member of YOCSEF, Xi’an, and Senior Member of the Chinese Institute of Electronics (CIE), an Executive member of China Society of Image and Graphics (CSIG) Council, and Members of Editorial board for EURASIP Signal Processing Journal, Neurocomputing, International Journal of Multimedia Intelligence and Security, and International Journal of Image and Graphics.

Xuelong Li is a Researcher (i.e., full professor) with the State Key Laboratory of Transient Optics and Photonics and the director of the Center for OPTical IMagery Analysis and Learning (OPTIMAL), Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, Shaanxi, PR China.

Dacheng Tao received the B.Eng. degree from the University of Science and Technology of China (USTC), the M.Phil. degree from the Chinese University of Hong Kong (CUHK), and the Ph.D. degree from the University of London. Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering at Nanyang Technological University, a Visiting Professor at Xidian University, a Guest Professor at Wuhan University, and a Visiting Research Fellow at Birkbeck, University of London. His research is mainly on applying statistics and mathematics to data analysis problems in data mining, computer vision, machine learning, multimedia, and visual surveillance. He has published nearly 100 scientific papers in venues including IEEE TPAMI, TKDE, TIP, TMM, TCSVT, TIFS, TSMCB, TSMC, TITB, CVPR, ECCV, ICDM, ACM TKDD, Multimedia, and KDD, with one best paper runner-up award. Previously he gained several Meritorious Awards from the International Interdisciplinary Contest in Modeling, which is the highest-level mathematical modeling contest in the world, organized by COMAP. He is an associate editor of IEEE Transactions on Knowledge and Data Engineering, Neurocomputing (Elsevier), and the official journal of the International Association for Statistical Computing, Computational Statistics and Data Analysis (Elsevier). He has authored/edited six books and eight special issues, including CVIU, PR, PRL, SP, and Neurocomputing. He has (co)chaired special sessions, invited sessions, workshops, and conferences. He has served for more than 50 major international conferences including CVPR, ICCV, ECCV, ICDM, KDD, and Multimedia, and more than 15 top international journals including TPAMI, TKDE, TOIS, TIP, TCSVT, TMM, TIFS, TSMC-B, Computer Vision and Image Understanding (CVIU), and Information Science. He is a member of the IEEE, the IEEE Computer Society, the IEEE Signal Processing Society, the IEEE SMC Society, and the IEEE SMC Technical Committee on Cognitive Computing.

Jie Li received the B.Sc., M.Sc. and Ph.D. degrees in Circuit and System from Xidian University, China, in 1995, 1998 and 2005, respectively. In 1998, she joined the School of Electronic Engineering at Xidian University. Currently, she is a Professor of Xidian University. Her research interests include computational intelligence, machine learning, and image processing. In these areas, she has published over 30 technical articles in refereed journals and proceedings including IEEE TCSVT, IJFS, etc.
