Real-Time Interactively Distributed Multi-Object Tracking Using a Magnetic-Inertia Potential Model




Wei Qu, Dan Schonfeld
ECE Department, University of Illinois at Chicago, Chicago, IL 60612, USA
{wqu, ds}@ece.uic.edu

Magdi Mohamed
Motorola Labs, Schaumburg, IL 60196, USA
[email protected]

Abstract

This paper breaks with the common practice of using a joint state space representation and performing joint data association in multi-object tracking. Instead, we present an interactively distributed framework with linear complexity for real-time applications. When objects do not interact with each other, our approach behaves like multiple independent trackers. When objects are in close proximity or present occlusions, we propose a magnetic-inertia potential model to handle the "error merge" and "labelling" problems in a particle filtering framework. Specifically, we propose to model the interactive likelihood densities by a "gravitation" and "magnetic" repulsion scheme and to relax the common first-order Markov chain assumption by using an "inertia" Markov chain. Our model represents the cumulative effect of the virtual physical forces that objects undergo while interacting with others. It implicitly handles the "error merge" and "labelling" problems and thus solves the difficult object occlusion and data association problems with an innovative scheme. Our preliminary work demonstrates that the proposed approach is far superior to existing methods not only in robustness but also in speed.

1. Introduction

Multiple object tracking (MOT) of distinctive objects is relatively easy, since the objects can be tracked by multiple independent trackers. However, MOT of objects with similar appearance, such as faces, fails when the objects are in close proximity or present occlusions. In this circumstance, how to model the interaction among objects and how to solve the data association problem [2], namely establishing the correspondence between objects and observations, are the most critical issues for MOT. As a single-object tracker, the conventional particle filter [1], [3] can be used for MOT. This is accomplished by estimating the non-Gaussian posterior density of the objects


[8], where multi-modality implies the presence of multiple objects, each mode representing an object. However, this method is problematic, since using particles to convey multiple modes leads to loss of tracked objects during occlusion. Without an effective scheme to model the interaction and solve the data association, neither multiple independent trackers nor the conventional particle filter can perform well for MOT. Such trackers usually suffer from the well-known "error merge" problem, where a tracker loses its associated object and falsely coalesces with others, as well as the "labelling" problem, where trackers falsely switch labels after occlusion. A widely accepted approach to MOT that addresses many of the difficulties inherent in this complex task exploits a joint state space representation, which concatenates all of the objects' states to form a large meta state. It solves the joint data association problem by characterizing all possible associations between objects and observations [2], [6], [4], [5]. Although this centralized approach can, in principle, handle the "error merge" problem and track multiple objects that undergo partial and complete occlusion, it incurs a tremendous computational cost due to the high dimensionality of the joint state representation. The particle filter implementation requires exponentially increasing complexity in the number of objects and thus cannot be used in real-time applications. Researchers have also proposed using multiple filters to track multiple objects, one filter per object, where each filter has its own particles evolving in a single-object state space. However, due to the complexity of the interaction among objects, existing methods either use joint data association or rely on a joint state space representation in the event of occlusion. Yu et al. [9] allow collaboration among filters by modelling the objects' joint prior with a Markov random network to solve the "error merge" problem. Although this approach provides a promising direction for resolving the high complexity of the joint state space, it does not deal with the data association or the labelling problem. Moreover, its performance suffers when the objects' motion is independent. In this paper, we derive an Interactively Distributed MOT (IDMOT) particle filter. We propose a "magnetic-inertia potential" model to solve the "error merge" and "labelling" problems. Its computational complexity increases linearly with the number of objects, which yields a much faster implementation than existing MOT algorithms.


Figure 1. Our Dynamic Graphic Model for Multiple Interactive Objects.

Figure 2. Inertia Markov chain.

2. Interactively Distributed Multi-Object Tracking

We denote the state of an individual object by x_t^i, where i = 1, ..., M is the object index and t is the time index; the image observation of x_t^i by z_t^i; the set of all states up to time t by x_{0:t}^i, where x_0^i is the initialization prior; and the set of all observations up to time t by z_{1:t}^i.

To better understand the underlying principle of "error merge", we give an analogy with physics. Newton's universal law of gravitation states that "any two objects attract each other by a gravitational force that is proportional to the product of their masses and inversely proportional to the square of the distance between them." In our scenario, an object's observation can be imagined as the associated tracker's "mass": a large observation (in the sense of more of the object's visual image resource) represents a large "mass". When objects are far apart, the "gravitational force" between their trackers is very weak and can be ignored; in this case each tracker can successfully estimate its object's state. However, when objects are adjacent or occluded, this attractive force becomes very strong. This is the so-called "interaction". Although the forces between two trackers are equal and opposite according to Newton's third law, when their "masses" differ they have different accelerations by Newton's second law. As a result, after several frames the tracker with small "mass" (large acceleration) is attracted toward and merges with the object with the larger observation (small acceleration), and "error merge" happens. This analysis is consistent with our simulation results and inspired us to develop a scheme that resists the excessive attraction incurred by the "gravitational force". We present our magnetic repulsion model later.

2.1. Graphic Modeling of Multi-Object Tracking

In Fig. 1, we illustrate our dynamic graphic model with two consecutive frames for multiple interacting objects. It has two layers. The hidden layer consists of circle nodes that represent the states of the objects, x^i. The observable layer represents the observations z^i associated with the hidden states. The directed link from object x^i to its image observation z^i represents the observation likelihood p(z^i | x^i). The undirected link between observation nodes represents the interaction. Once the spatial relations of the objects are roughly determined in each frame, the structure of the observable layer is fixed. The directed link between consecutive states of the same object represents the state transition, which is a 1st-order Markov chain. We also inherit the common assumption that the observations in different frames are conditionally independent. Although the above model can solve the "error merge" problem, we find that this formulation still cannot handle the "labeling" problem. The failure comes from the naive 1st-order Markov chain assumption, which disregards all motion information from past states except the most recent one. This motivates us to develop the more sophisticated graphic model illustrated in Fig. 2. When an object's observation interacts with others at time t, we relate the current state x_t to the previous states x_{t-1} and x_{t-2}. Instead of using a 2nd-order Markov chain, which is too complicated for analysis, we introduce a specific node v_t representing the motion vector from the previous state x_{t-2} to x_{t-1}. The directed links between the states and the motion vector characterize the "inertia information" which we discuss later. We call this model an "Inertia Markov chain".
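To make the per-object structure concrete, the following is a minimal Python sketch of the bookkeeping this distributed model implies; the class and field names and the ellipse parameterization are illustrative assumptions, not part of the original.

```python
import numpy as np

class Track:
    """Per-object bookkeeping for the distributed IDMOT model (illustrative sketch).

    Each object i keeps its own particle set (no joint state space) plus its
    previous two estimates, so the inertia node v_t = x_{t-1} - x_{t-2} of the
    Inertia Markov chain can be formed whenever an interaction is detected.
    """
    def __init__(self, n_particles, init_state):
        # hypothetical ellipse state, e.g. (cx, cy, semi_major, semi_minor)
        init_state = np.asarray(init_state, dtype=float)
        self.particles = np.tile(init_state, (n_particles, 1))   # x_t^{i,n}
        self.weights = np.full(n_particles, 1.0 / n_particles)   # w_t^{i,n}
        self.est_prev = init_state.copy()                        # estimate at t-1
        self.est_prev2 = init_state.copy()                       # estimate at t-2

    def inertia_vector(self):
        # reference motion vector v_t from x_{t-2} to x_{t-1} (image-plane center motion)
        return self.est_prev[:2] - self.est_prev2[:2]
```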

2.2. Interactively Distributed Formulation and Particle Filter Implementation

Now the core problem is how to model the density propagation for the interacting objects. We estimate the posterior of x_t^i based on all involved observations, p(x_t^i | z_{1:t}^i, z_{1:t}^j), instead of p(x_t^i | z_{1:t}^i) for object i, where z_{1:t}^j represents all observations that interact with z_{1:t}^i.


Figure 3. Magnetic repulsion weighting in half of one repulsion iteration cycle.

Figure 4. Inertia weighting.

Notice that since there is no direct link between x^i and z^j at any time, when there is no interaction between z_{1:t}^i and z_{1:t}^j we have p(x_t^i | z_{1:t}^i, z_{1:t}^j) = p(x_t^i | z_{1:t}^i); that is, x_t^i is conditionally independent of z_{1:t}^j. In this case, our formulation is consistent with the traditional modeling of the tracking problem. Furthermore, because of the conditional independence assumption between consecutive observations, z_{1:t-1}^i is conditionally independent of z_t^j. Together with the fact that there is no direct link between x^i and z^j at any time, this gives p(x_{0:t}^i | z_{1:t-1}^i, z_{1:t}^j) = p(x_{0:t}^i | z_{1:t-1}^i, z_{1:t-1}^j), which is used in the derivation from equation (2) to (3) below. In parallel with the derivation of the Sequential Importance Sampling algorithm in [1], we formulate the interactively distributed density propagation as follows:

$$p(x^i_{0:t} \mid z^i_{1:t}, z^j_{1:t}) = \frac{p(z^i_t \mid x^i_{0:t}, z^i_{1:t-1}, z^j_{1:t})\, p(x^i_{0:t}, z^i_{1:t-1}, z^j_{1:t})}{p(z^i_{1:t}, z^j_{1:t})} \qquad (1)$$

$$= \frac{p(z^i_t \mid x^i_t, z^j_t)\, p(x^i_{0:t} \mid z^i_{1:t-1}, z^j_{1:t})\, p(z^i_{1:t-1}, z^j_{1:t})}{p(z^i_t \mid z^i_{1:t-1}, z^j_{1:t})\, p(z^i_{1:t-1}, z^j_{1:t})} \qquad (2)$$

$$= \frac{p(z^i_t \mid x^i_t, z^j_t)\, p(x^i_{0:t} \mid z^i_{1:t-1}, z^j_{1:t-1})}{p(z^i_t \mid z^i_{1:t-1}, z^j_{1:t})} \qquad (3)$$

$$\propto p(z^i_t \mid x^i_t, z^j_t)\, p(x^i_{0:t} \mid z^i_{1:t-1}, z^j_{1:t-1}). \qquad (4)$$

From (1) to (2), we use two assumptions: first, z_t^i is conditionally independent of the observations in other frames; second, z_t^i is independent of the previous states x_{0:t-1}^i. The term p(z_t^i | z_{1:t-1}^i, z_{1:t}^j) in (3) is a normalization constant. In (4), p(z_t^i | x_t^i, z_t^j) is the interactive likelihood density and p(x_{0:t}^i | z_{1:t-1}^i, z_{1:t-1}^j) is the interactive prior density. We model these two densities individually in the following. For the interactive likelihood, we have

$$p(z^i_t \mid x^i_t, z^j_t) = p(z^i_t \mid x^i_t)\, \frac{p(z^j_t \mid x^i_t, z^i_t)}{p(z^j_t \mid x^i_t)} \propto p(z^i_t \mid x^i_t)\, \varphi_1(x^i_t, z^i_t; z^j_t). \qquad (5)$$

The local likelihood p(z_t^i | x_t^i) characterizes the "gravitation" between interacting observations, which produces the "error merge" problem as discussed. Since we model ϕ1(·) by a magnetic repulsion model, we call it the "magnetic function". When there is no interaction, ϕ1(·) = 1 and we recover the same likelihood model as the conventional particle filter. For the interactive prior density of x_{0:t}^i, we have

$$p(x^i_{0:t} \mid z^i_{1:t-1}, z^j_{1:t-1}) = p(x^i_t \mid x^i_{0:t-1}, z^i_{1:t-1}, z^j_{1:t-1})\, p(x^i_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1}) \qquad (6)$$

$$= p(x^i_t \mid x^i_{t-1}, x^i_{t-2})\, p(x^i_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1}) \qquad (7)$$

$$= \frac{p(x^i_{t-2} \mid x^i_t, x^i_{t-1})\, p(x^i_t \mid x^i_{t-1})}{p(x^i_{t-2} \mid x^i_{t-1})}\, p(x^i_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1}) \propto p(x^i_t \mid x^i_{t-1})\, \varphi_2(\cdot)\, p(x^i_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1}). \qquad (8)$$

From (6) to (7) we use the assumption that x_t^i depends only on x_{t-1}^i and x_{t-2}^i, as illustrated in Fig. 2. In (8), p(x_t^i | x_{t-1}^i) is the "one-step" state transition density, as in the usual 1st-order Markov chain, and p(x_{0:t-1}^i | z_{1:t-1}^i, z_{1:t-1}^j) is the posterior at time t-1. The function ϕ2(·) characterizes the interaction among x_t^i, x_{t-1}^i and x_{t-2}^i. Since we model it by our inertia model, we call it the "inertia function". When ϕ2(·) = 1, we recover the same prior model as the conventional particle filter.

Now let us estimate the desired posterior by the Sequential Importance Sampling method [1]. We denote by {x_{0:t}^{i,n}, w_t^{i,n}}_{n=1}^{Ns} a random measure that characterizes the posterior pdf p(x_{0:t}^i | z_{1:t}^i, z_{1:t}^j), where {x_{0:t}^{i,n}, n = 1, ..., Ns} is a set of support particles with associated weights {w_t^{i,n}, n = 1, ..., Ns}. The weights are normalized so that Σ_n w_t^{i,n} = 1. According to importance sampling theory [1], if the particles x_{0:t}^{i,n} are drawn from an importance density q(x_{0:t}^i | z_{1:t}^i, z_{1:t}^j), then the corresponding weights are

$$w^{i,n}_t \propto \frac{p(x^{i,n}_{0:t} \mid z^i_{1:t}, z^j_{1:t})}{q(x^{i,n}_{0:t} \mid z^i_{1:t}, z^j_{1:t})}. \qquad (9)$$

By substituting (4), (5) and (8) into (9), the weight update equation can then be shown to be


$$w^{i,n}_t \propto \frac{p(z^i_t \mid x^{i,n}_t, z^j_t)\, p(x^{i,n}_{0:t} \mid z^i_{1:t-1}, z^j_{1:t-1})}{q(x^{i,n}_t \mid x^{i,n}_{0:t-1}, z^i_{1:t}, z^j_{1:t})\, q(x^{i,n}_{0:t-1} \mid z^i_{1:t}, z^j_{1:t})} \qquad (10)$$

$$= \frac{p(z^i_t \mid x^{i,n}_t)\, \varphi_1(\cdot)\, p(x^{i,n}_t \mid x^{i,n}_{t-1})}{q(x^{i,n}_t \mid x^{i,n}_{0:t-1}, z^i_{1:t}, z^j_{1:t})} \times \frac{\varphi_2(\cdot)\, p(x^{i,n}_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1})}{q(x^{i,n}_{0:t-1} \mid z^i_{1:t-1}, z^j_{1:t-1})}$$

$$= w^{i,n}_{t-1}\, \frac{p(z^i_t \mid x^{i,n}_t)\, \varphi_1(\cdot)\, p(x^{i,n}_t \mid x^{i,n}_{t-1})\, \varphi_2(\cdot)}{q(x^{i,n}_t \mid x^{i,n}_{0:t-1}, z^i_{1:t}, z^j_{1:t})}. \qquad (11)$$

From (10) to (11), we exploit the assumption that x_{0:t-1}^i is independent of z_t^i and z_t^j. Furthermore, if q(x_t^i | x_{0:t-1}^i, z_{1:t}^i, z_{1:t}^j) = q(x_t^i | x_{t-1}^i, z_t^i, z_t^j), the importance density depends only on x_{t-1}^i, z_t^i and z_t^j. This is particularly useful in the common case when only a filtered estimate of p(x_t^i | z_{1:t}^i, z_{1:t}^j) is required at each time step. In such scenarios, only x_t^n, x_{t-1}^n and x_{t-2}^n need to be stored. The modified weights are then

$$w^{i,n}_t \propto w^{i,n}_{t-1}\, \frac{p(z^i_t \mid x^{i,n}_t)\, \varphi_1(\cdot)\, p(x^{i,n}_t \mid x^{i,n}_{t-1})\, \varphi_2(\cdot)}{q(x^{i,n}_t \mid x^{i,n}_{t-1}, z^i_t, z^j_t)}. \qquad (12)$$
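As a consistency check (implied by the text, though not written out as an equation in the original): when object i has no interaction, ϕ1(·) = ϕ2(·) = 1 and z_t^j drops out of the importance density, so (12) reduces to the standard sequential importance sampling update of [1],

$$w^{i,n}_t \propto w^{i,n}_{t-1}\, \frac{p(z^i_t \mid x^{i,n}_t)\, p(x^{i,n}_t \mid x^{i,n}_{t-1})}{q(x^{i,n}_t \mid x^{i,n}_{t-1}, z^i_t)}.$$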

How to model the densities and functions in (12) is critical. We present our models for the magnetic function and the inertia function next. For the other densities, we use the same models as in the conventional particle filter.
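Before specifying ϕ1 and ϕ2, a minimal Python sketch of how the per-particle update (12) might look in code, under the choice stated later in Section 3 that the importance density equals the state transition (so the transition and proposal terms cancel); the function and argument names are assumptions.

```python
import numpy as np

def update_weights(prev_weights, local_lik, phi1, phi2):
    """Weight update of Eq. (12) with q = p(x_t | x_{t-1}) (proposal = transition).

    prev_weights : (N,) previous weights w_{t-1}^{i,n}
    local_lik    : (N,) local likelihoods p(z_t^i | x_t^{i,n})
    phi1, phi2   : (N,) magnetic and inertia factors for each particle
                   (arrays of ones when object i has no interaction)
    """
    w = prev_weights * local_lik * phi1 * phi2
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))
```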

2.3. Magnetic Repulsion Model

As discussed, we want to introduce a "repulsion" force to resist the excessive attraction among interacting observations. Magnetic field theory provides a suitable idea. Let us first consider a simple case with only pairwise interaction. We regard the two objects as two magnetic monopoles of the same polarity. Since an object generates an observation while a magnet produces a magnetic field, the observations are analogous to the magnetic fields. These assumptions are consistent with the earlier assumptions in the graphic model, i.e., the objects' states (the magnets here) at a given time are independent, while the interaction lies only between the observations (the magnetic fields here). In this analogy, the local likelihood p(z^i | x^i) in (5) characterizes only the intensity of the corresponding local magnetic field, while the function ϕ1(·) represents the mutual repulsion between the two magnetic fields. When we use it to calculate the weights by (12), we call ϕ1(·) the "repulsion weight" of each particle x_t^{i,n}. It is analogous to the concept of potential difference in magnetic theory, which is related to the distance between two points in repulsive magnetic fields: when the distance is small, the repulsion is strong, and vice versa. As a specific example, we can model this function as

$$\varphi_1(x^{i,n}_t, z^i_t; z^j_t) \propto 1 - \frac{1}{\beta_1}\exp\Big\{-\frac{d^2_{i,n}}{\sigma^2_1}\Big\} \qquad (13)$$

where β1 is a normalization term and σ1 characterizes the maximal allowable interaction distance; for example, we choose it as the sum of the semi-major axes of the ellipses used to represent objects in the experiments. d_{i,n} is the Euclidean distance between the current particle x_t^{i,n} and the temporary estimate of object j, i.e., x̂_{t,k}^j, as illustrated in Fig. 3. The equilibrium between magnetic repulsion and gravitation cannot be reached at once but only after a process of oscillation, so we iterate the repulsion process until it settles. In Fig. 3, we illustrate half of one repulsion iteration cycle, where only the particles of object i, i.e., x_{t,k}^{i,n}, are repelled by object j's temporary estimate x̂_{t,k}^j. After that, we first obtain the new temporary estimate of object i, x̂_{t,k}^i, and then use it to repel the particles of object j, x_{t,k}^{j,n}, back to complete one iteration cycle. In these notations, the subscript k = 1, ..., K indexes the iterations. In Fig. 3, the dashed circles represent particles while the solid circles represent temporary state estimates. The two objects' estimates are first roughly determined by independent trackers. If the distance between their centers d_ij < d_TH, where d_TH is a predefined threshold, we consider them an interacting pair and begin the repulsion iteration; otherwise they are tracked independently. In our simulations, K = 4~6 is enough to guarantee convergence. For a 3-clique {z^i, z^{j1}, z^{j2}}, each magnet repels the other two simultaneously. We therefore revise (13) to

$$\varphi_1(\cdot) \propto \Big(1 - \frac{1}{\beta_{11}}\exp\Big\{-\frac{d^2_{i,j_1,n}}{\sigma^2_{11}}\Big\}\Big)\Big(1 - \frac{1}{\beta_{12}}\exp\Big\{-\frac{d^2_{i,j_2,n}}{\sigma^2_{12}}\Big\}\Big).$$

Similarly, we can derive the formula for an r-clique with r > 3.
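A minimal Python sketch of the pairwise repulsion weight (13) and its clique extension as a product of pairwise terms; it assumes the ellipse state stores its center in the first two components, and the names are illustrative.

```python
import numpy as np

def repulsion_weight(particles_i, est_j, sigma1, beta1=1.0):
    """Pairwise magnetic repulsion phi_1 of Eq. (13).

    particles_i : (N, d) particles x_t^{i,n}; columns 0,1 hold the ellipse center
    est_j       : (d,) temporary estimate of the interacting object j
    sigma1      : allowable maximal interaction distance (e.g. sum of semi-major axes)
    """
    d = np.linalg.norm(particles_i[:, :2] - est_j[:2], axis=1)      # d_{i,n}
    return 1.0 - (1.0 / beta1) * np.exp(-(d ** 2) / sigma1 ** 2)

def repulsion_weight_clique(particles_i, neighbor_estimates, sigmas):
    """r-clique extension: product of pairwise repulsion terms over all neighbors."""
    phi1 = np.ones(len(particles_i))
    for est_j, s in zip(neighbor_estimates, sigmas):
        phi1 *= repulsion_weight(particles_i, est_j, s)
    return phi1
```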

2.4. Inertia Model

Our motivation now is to model the inertia function ϕ2(x_t^i, x_{t-1}^i, x_{t-2}^i), which is related to the two posteriors p(x_{t-2}^i | x_t^i, x_{t-1}^i) and p(x_{t-2}^i | x_{t-1}^i). Since x_{t-1}^i and x_t^i are propagated from x_{t-2}^i, what information can we infer backwards without any prior knowledge? The answer is inertia. The law of inertia in physics states that "an object at rest tends to stay at rest and an object in motion tends to stay in motion with the same speed and in the same direction unless acted upon by an unbalanced force." Fig. 4 illustrates the analysis of one object's motion in three consecutive frames, where x_{t-1} and x_{t-2} are the object's states in the previous two frames and v_t is the reference motion vector from x_{t-2} to x_{t-1}. By shifting v_t along its direction, we obtain the "inertia state" x'_t and its "inertia motion vector" v'_t for the current frame. Without any external force, the new state x_t should coincide with x'_t according to the law of inertia. Even if there are external forces, as long as the frame rate is high enough, we can always assume that x_t is not far from x'_t. In Fig. 4, x_t^{n1} and x_t^{n2} are particles of state x_t, and v_t^{n1} and v_t^{n2} are the associated motion vectors. We define the inertia weights as

$$\varphi_2(\cdot) \propto \frac{1}{\beta_2}\exp\Big\{-\frac{\theta^2_{t,n}}{\sigma^2_{21}}\Big\}\exp\Big\{-\frac{(\|v^n_t\| - \|v'_t\|)^2}{\sigma^2_{22}}\Big\} \qquad (14)$$

where β2 is a normalization term, θ_{t,n} = ∠(v_t^n, v'_t) is the angle between the particle's motion vector and the reference inertia vector, ‖v_t^n‖ and ‖v'_t‖ are the Euclidean norms of the vectors, and σ21 and σ22 characterize the allowable variance of the motion vector's direction and speed.
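A minimal Python sketch of the inertia weight (14), assuming 2-D image-plane motion vectors; taking each particle's motion vector v_t^n as the displacement from the previous estimate to the particle is one reasonable reading of Fig. 4, not something spelled out in the text, and the parameter names are illustrative.

```python
import numpy as np

def inertia_weight(particles_i, est_prev, est_prev2, sigma_dir, sigma_speed, beta2=1.0):
    """Inertia weight phi_2 of Eq. (14).

    est_prev, est_prev2 : previous two state estimates of object i (t-1 and t-2)
    The reference inertia vector v'_t is the motion from x_{t-2} to x_{t-1};
    each particle's motion vector v_t^n is taken from the t-1 estimate to the
    particle (an assumption about Fig. 4).
    """
    v_ref = est_prev[:2] - est_prev2[:2]                     # v'_t
    v_n = particles_i[:, :2] - est_prev[:2]                  # v_t^n for each particle
    # angle between v_t^n and v'_t
    dot = v_n @ v_ref
    norms = np.linalg.norm(v_n, axis=1) * np.linalg.norm(v_ref) + 1e-12
    theta = np.arccos(np.clip(dot / norms, -1.0, 1.0))
    speed_diff = np.linalg.norm(v_n, axis=1) - np.linalg.norm(v_ref)
    return (1.0 / beta2) * np.exp(-(theta ** 2) / sigma_dir ** 2) \
                         * np.exp(-(speed_diff ** 2) / sigma_speed ** 2)
```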

3. Experimental Results and Analysis

Pseudo-code of our IDMOT implementation for one object at one time step is given in the following listing. We illustrate only pairwise repulsion; the repulsion for r-cliques can be added similarly.

  • Set k = 1; draw x_t^{i,n} ~ q(x_t^i | x_{t-1}^{i,n}, z_t^i, z_t^j)
  • w_t^{i,n} = Weight(x_t^{i,n})                 // Eq. (12) without ϕ1, ϕ2
  • w_t^{i,n} = Normalize(w_t^{i,n})
  • Temporary estimate x̂_{t,k}^i = Σ_{n=1}^{Ns} w_t^{i,n} x_t^{i,n}
  • FOR j = 1 : i                                 // search pairwise links
      - IF d_ij(x̂_{t,k}^i, x̂_{t,k}^j) < d_TH       // interaction
          - Save Link(i,j) = 1                    // to search r-cliques
          - FOR k = 1 : K                         // inertia repulsion iteration
              - Compute ϕ1, ϕ2
              - Reweight w_t^{i,n} = w_t^{i,n} ϕ1 ϕ2
              - Normalize(w_t^{i,n})
              - Estimate x̂_{t,k+1}^i = Σ_{n=1}^{Ns} w_t^{i,n} x_t^{i,n}
              - // Repel back
              - Compute ϕ1, ϕ2
              - Reweight w_t^{j,n} = w_t^{j,n} ϕ1 ϕ2
              - Normalize(w_t^{j,n})
              - Estimate x̂_{t,k+1}^j = Σ_{n=1}^{Ns} w_t^{j,n} x_t^{j,n}
          - END FOR
      - END IF
  • END FOR

We also adopt the resampling scheme of [1] to alleviate the degeneracy problem after all objects' estimates have been obtained at each frame. Several videos are used to test our algorithm; they are captured at 320 × 240 resolution and 25 fps. In all experiments, we use a parametric ellipse to model each object. The local likelihood p(z_t^i | x_t^i) is computed by combining a color histogram model and a PCA-based appearance model [7], [9]. The state transition is modeled as a Gaussian random walk, and the importance density is chosen as the state transition density.
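A minimal Python sketch of one IDMOT step for a single object against one neighbor, tying the listing above to the earlier sketches; it assumes the hypothetical Track class and the repulsion_weight/inertia_weight helpers defined previously, a Gaussian random-walk proposal as stated in the text, and omits resampling and the symmetric repel-back update of object j for brevity.

```python
import numpy as np

def idmot_step(track_i, track_j, sigma1, sigma_dir, sigma_speed,
               local_likelihood, d_th, K=5, walk_std=5.0):
    """One (simplified) IDMOT time step for object i against neighbor j.

    local_likelihood(particles) -> (N,) scores is supplied by the application,
    e.g. color-histogram / PCA appearance scores as in the paper.
    """
    N, d = track_i.particles.shape
    # draw from the importance density q = Gaussian random walk around the t-1 estimate
    track_i.particles = track_i.est_prev + walk_std * np.random.randn(N, d)
    w = local_likelihood(track_i.particles)            # Eq. (12) without phi_1, phi_2
    w /= w.sum()
    est_i = w @ track_i.particles                      # temporary estimate of object i
    est_j = track_j.weights @ track_j.particles        # temporary estimate of object j
    if np.linalg.norm(est_i[:2] - est_j[:2]) < d_th:   # interaction test d_ij < d_TH
        for _ in range(K):                             # inertia repulsion iteration
            w *= repulsion_weight(track_i.particles, est_j, sigma1)
            w *= inertia_weight(track_i.particles, track_i.est_prev, track_i.est_prev2,
                                sigma_dir, sigma_speed)
            w /= w.sum()
            est_i = w @ track_i.particles              # object j would be repelled back symmetrically
    track_i.weights = w
    # caller shifts est_prev2 <- est_prev and est_prev <- est_i before the next frame
    return est_i
```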

Table 1. Normalized computation time of different operations.

  Operation         Local likelihood p(z_t|x_t)   Repulsion weight ϕ1(·)   Inertia weight ϕ2(·)
  Normalized time   1                             1/10                     1/17

We compare the performance with Multiple Independent Particle Filters (MIPF) [3] and the MFMC algorithm [9]. For all methods, we use 50 particles per object. A distinct color and an index attached to each ellipse are used to label the objects. The synthetic sequence Tennisballs was developed by Yu et al. [9]. It contains five identical moving tennis balls over a changing background. In Fig. 5, we present a comparison of the results. MIPF (1st row) suffers severely from the "error merge" problem because it does not model interaction. Since MFMC (2nd row) only models the motion interaction and does not handle the observation interaction or data association, it cannot cope with severe occlusion and suffers from the "labeling" problem. Our approach (3rd row) performs far better. The video Lab4faces contains four people moving in a lab. It challenges many existing algorithms because of the frequent occlusions. In Fig. 6 we compare the results. MIPF (1st row) suffers from the "error merge" problem and loses tracks after occlusion. MFMC (2nd row) keeps tracking all objects after occlusion, but its estimates are inaccurate during occlusion and the objects' labels are wrong afterwards. In contrast, our approach (3rd row) performs very well, solving both the "error merge" and "labeling" problems even under severe occlusion. The experiments were performed in C on a 3.2 GHz Pentium IV PC. Without code optimization, our algorithm achieves robust tracking at 18~22 fps on the test videos. Compared with MIPF, our approach only pays the additional cost of computing the magnetic repulsion weights and the inertia weights. In Table 1, we analyze the normalized computation time of the different operations. The common denominator of all ratios is the time used to weight all particles' local likelihoods once. The data were obtained by running IDMOT on the Tennisballs sequence with five repulsion iterations. It can be seen that most of the computation time is spent on local likelihood weighting using image features.

Figure 5. Comparison of MIPF (1st row), MFMC (2nd row) and our IDMOT (3rd row) on the synthetic sequence Tennisballs (frames 102, 140, 203, 242, 362).

Figure 6. Comparison of MIPF (1st row), MFMC (2nd row) and our IDMOT (3rd row) on the indoor sequence Lab4faces (frames 175, 207, 216, 238, 266).

References

[1] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Processing, 50(2):174–188, Feb. 2002.
[2] Y. Bar-Shalom and A. G. Jaffer. Tracking and Data Association. Academic Press, San Diego, CA, 1998.
[3] M. Isard and A. Blake. Condensation – conditional density propagation for visual tracking. Int. J. Computer Vision, 29(1):5–28, 1998.
[4] M. Isard and J. MacCormick. BraMBLe: a Bayesian multiple-blob tracker. In Proc. of ICCV'01, Vancouver, Canada.
[5] Z. Khan, T. Balch, and F. Dellaert. An MCMC-based particle filter for tracking multiple interacting targets. In Proc. of ECCV'04, Prague, Czech Republic.
[6] J. MacCormick and A. Blake. A probabilistic exclusion principle for tracking multiple objects. Int. J. Computer Vision, 39(1):57–71, 2000.
[7] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Trans. Pattern Anal. Machine Intell., 19(7):696–710, 1997.
[8] J. Vermaak, A. Doucet, and P. Pérez. Maintaining multi-modality through mixture tracking. In Proc. of ICCV'03.
[9] T. Yu and Y. Wu. Collaborative tracking of multiple targets. In Proc. of CVPR'04, Washington DC, USA.

