Video Tracking: A Concise Survey


Emanuele Trucco and Konstantinos Plakas

IEEE Journal of Oceanic Engineering, Vol. 31, No. 2, April 2006

Manuscript received September 1, 2002; accepted September 22, 2004. Associate Editor: J. Leonard. The authors are with Electrical, Electronic and Computer Engineering, School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh EH14 4AS, U.K. (e-mail: [email protected]). Digital Object Identifier 10.1109/JOE.2004.839933

Abstract—This paper addresses video tracking, the problem of following moving targets automatically over a video sequence, and brings three main contributions. First, we give a concise introduction to video tracking in computer vision, including design requirements and a review of recent techniques, with some details of selected algorithms. Second, we give an overview of 28 recent papers on subsea video tracking and related motion analysis problems, arguably capturing the state of the art of subsea video tracking. We summarize key features in a comparative, at-a-glance table, and discuss this work in comparison to the state of the art in computer vision. Third, we identify well-proven computer vision techniques not yet embraced by the subsea research community, suggesting useful research directions for the subsea video processing community.

Index Terms—Image processing, underwater vision, video tracking.

I. INTRODUCTION

Video tracking is the problem of following image elements moving across a video sequence automatically. This is an essential building block for vision systems addressing robotic tasks like visual servoing [1]–[4], pipe and cable inspection [5], [6], metrology [7], and surveying and mapping via image mosaicing [8]–[11]. For example, visual servoing systems estimate their motion with respect to a target by tracking points or features on the target itself; pipe inspection, if based on vision, requires continuous (frame-by-frame) location of the pipe; mosaicing relies on the estimation of the motion of pixels from frame to frame.

Tracking systems must address two basic problems: motion and matching.

Motion problem: predict the location of an image element being tracked in the next frame, that is, identify a limited search region in which the element is expected to be found with high probability.

Matching problem (also known as detection or location): identify the image element in the next frame within the designated search region.

The simplest approach to the motion problem is to define the search area in the next frame as a fixed-size region surrounding the target position in the previous frame. The size is chosen according to the characteristics of the problem, crucially the expected frame-to-frame displacement. Obviously, this knowledge is not often available, reliable, or time-independent, so performance is limited.
A well-known solution from control theory is the Kalman filter (KF) [12], an optimal, recursive estimator of the state of a dynamic system. KFs have been adopted widely in video tracking in air [13]–[16] and subsea [6], [17]. Nonlinear state dynamics have been approached by extended [12, vol. 2] and iterated extended versions (e.g., [18]) of the linear filter. Notice that in video tracking the instantaneous acceleration is nearly invariably assumed constant, although more complex motion models are adopted occasionally (e.g., [79]).

The matching problem requires, in essence, a similarity metric to compare candidate pairs of image elements in the previous and current frame. This is closely related to the correspondence problem of stereo vision [19]–[21], where the same scene element must be detected in two (or more) images acquired simultaneously from different viewpoints. Matching metrics and search techniques are often similar in stereo and tracking, but the latter can take advantage of motion predictions, not available to static stereo. A tracking-specific problem is data association, that is, finding the true position of the moving target in the presence of equally valid candidates for the similarity metric. This occurs in the presence of clutter and interfering targets (i.e., intersecting trajectories). Bar-Shalom and Fortmann's book [22] is a standard reference on data association.

The basic problem with the KF is that all probability distributions are Gaussian, hence unimodal, so that a single candidate is necessarily selected at any time. Blake and Isard [23] introduced particle filtering as a solution to the multiple-target problem within statistical estimation. The resulting algorithm, CONDENSATION, has been adopted widely in computer vision. It is worth sketching its essential elements. The classic KF seeks to maximize, at each instant, the conditional probability of the state given the past history of measurements, p(x_t | z_1, …, z_t), where x_t is the state at time t and z_1, …, z_t the measurements up to and including time t.
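To make the classic KF concrete before turning to CONDENSATION, the following minimal sketch (ours, not from the paper) implements a constant-velocity filter for a 2-D image target; the state is position and velocity, the measurement is the detected target position, and the noise covariances are illustrative assumptions.

```python
import numpy as np

# Constant-velocity Kalman filter for a 2-D image target (illustrative values).
F = np.eye(4); F[0, 2] = F[1, 3] = 1.0   # state transition: x += vx, y += vy
H = np.eye(2, 4)                          # we observe position only
Q = np.eye(4) * 1e-2                      # process noise covariance (assumed)
R = np.eye(2) * 4.0                       # measurement noise covariance (assumed)

def kf_step(x, P, z):
    """One predict/update cycle; x is the state, P its covariance, z the measurement."""
    # predict: deterministic drift plus increased uncertainty
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update: incorporate the new measurement, sharpening the estimate
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```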

Conditional probability densities are propagated from time t−1 to t in three steps. First, a deterministic drift takes place, according to the state evolution equation. Second, the distribution is relaxed to model the increased uncertainty due to the passing of time (stochastic diffusion); this generates the prediction of the KF. Third, the new, instantaneous measurement is incorporated, which attracts the mode and sharpens the peak, reflecting the increased certainty brought by the observation. The problem is that all KF distributions are Gaussian, hence unimodal, but the typical clutter of real sequences encourages multiple, simultaneous peaks. As only one target can be tracked, the real one may be lost in favor of a sufficiently strong, spurious alternative.

The solution of particle filtering is to allow arbitrary, multimodal conditional distributions, which are represented numerically by a finite number of samples. This way several modes, and therefore several targets, can coexist at any time and possibly be tracked simultaneously, as tracking one target does not imply ignoring other candidates.
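For illustration, the following toy sketch (ours; see [23] for the full CONDENSATION algorithm) runs one factored-sampling cycle for a 1-D state: resample according to the weights, apply drift and diffusion, then reweight with the measurement likelihood. The motion and likelihood models are Gaussians assumed only for the example.

```python
import numpy as np

def condensation_step(samples, weights, z, drift=1.0, diffusion=2.0, meas_sigma=3.0):
    """One factored-sampling cycle for a 1-D state (toy models).

    samples/weights represent the (possibly multimodal) posterior;
    z is the new measurement.
    """
    n = len(samples)
    # 1) resample proportionally to the weights
    idx = np.random.choice(n, size=n, p=weights / weights.sum())
    s = samples[idx]
    # 2) deterministic drift plus stochastic diffusion
    s = s + drift + np.random.normal(0.0, diffusion, size=n)
    # 3) reweight by the measurement likelihood (assumed Gaussian)
    w = np.exp(-0.5 * ((s - z) / meas_sigma) ** 2)
    return s, w / w.sum()
```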


CONDENSATION works in the same three stages of the KF. However, the Riccati equation no longer applies, and the stochastic diffusion step is implemented by statistical sampling methods (factored sampling). The complete algorithm is described by Blake and Isard in [23], and CONDENSATION resources, including papers and code, are available at the CVonline website [24].

As a final note on problem definition, we summarize the design requirements for a video tracker.

Robustness to clutter: the tracker should not be distracted by image elements resembling the target being tracked.

Robustness to occlusion: tracking should not be definitively lost because of temporary target occlusion (drop-out), but resumed correctly when the target reappears (drop-in).

False positives/negatives: only valid targets should be classified as such, and any other image element ignored (in practice, the number of false alarms should be as small as possible).

Agility: the tracker should follow targets moving with significant speed and acceleration ("agile" motion).

Stability: lock and accuracy should be maintained indefinitely over time.

The remainder of this paper is organized as follows. Section II presents a concise review of video tracking algorithms in computer vision. Section III presents an overview of recent work on subsea video tracking, and discusses the latter in comparison to the state-of-the-art of computer vision work. Finally, Section IV summarizes the contributions of this paper and gives some concluding considerations.

II. VIDEO TRACKING IN COMPUTER VISION

This section presents some fundamental video tracking techniques developed by the computer vision community. The motion problem is nearly invariably approached as described in the previous section (KF or simple predictions of the search region), whereas algorithms solving the matching problem depend substantially on the complexity of the target. We therefore organize this review by increasing order of target complexity, starting with simple window tracking and culminating with methods learning the shape and dynamics of nonrigid targets. We sketch some reference techniques, chosen among those which have proven their potential in computer vision applications but are currently underexploited by the subsea community (Section III).

A. Window Tracking

The simplest tracking target possible is just a small image window, i.e., a small rectangular region. Notice that we impose no requirement on the intensities in the window. Windows can be tracked from frame to frame by correlation-like correspondence (matching) methods, many of which have been recently surveyed and evaluated by Scharstein and Szeliski in [20]. The assumption is that the intensity pattern changes little from frame to frame. The reference algorithm (from Trucco and Verri [21]) is as follows.


Let I_t and I_{t+1} be the frames at time t and t+1, and consider a pixel p = (x, y) of I_t. Call 2W+1 the width (in pixels) of the correlation window, R(p) the search region in I_{t+1} associated with p, and ψ(u, v) a function quantifying the similarity of two pixel values u, v. Then, for each pixel p of I_t:

1) for each displacement d = (d_x, d_y) such that p + d lies in R(p), compute

   c(d) = Σ_{i=−W..W} Σ_{j=−W..W} ψ( I_t(x+i, y+j), I_{t+1}(x+d_x+i, y+d_y+j) )    (1)

2) the motion of p, giving its position in I_{t+1}, is the disparity vector d̄ that maximizes c(d), that is, d̄ = arg max_d c(d).

Well-known choices for ψ are the cross-correlation, ψ(u, v) = uv, and the sum of squared differences (SSD), ψ(u, v) = −(u − v)². These and many other metrics are reviewed by Scharstein and Szeliski [20]. The search region in the next frame can be determined by a KF, or simply taken as a fixed-size window centered around the previous target position. To compensate for window distortion between frames due to, e.g., changes of viewpoint, one can incorporate deformation models [25], [26].
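A minimal sketch of steps 1) and 2) with the SSD similarity might look as follows (our illustration; window and search sizes are assumptions, and pixels too close to the image border are not handled):

```python
import numpy as np

def ssd_match(frame_prev, frame_next, p, W=7, search=15):
    """Locate the window around p = (row, col) of frame_prev inside frame_next.

    Sketch of the window-matching step (1) with the SSD similarity;
    maximizing -(u - v)^2 is equivalent to minimizing the SSD.
    Assumes p lies at least W + search pixels away from the border.
    Returns the displacement (drow, dcol) minimizing the SSD.
    """
    r, c = p
    template = frame_prev[r-W:r+W+1, c-W:c+W+1].astype(np.float64)
    best, best_d = np.inf, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = frame_next[rr-W:rr+W+1, cc-W:cc+W+1].astype(np.float64)
            ssd = np.sum((template - cand) ** 2)
            if ssd < best:
                best, best_d = ssd, (dr, dc)
    return best_d
```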

Window matching techniques can be applied at any pixel, and are often used in stereo systems to produce dense surface reconstructions [20], [27], [28]. However, in the context of tracking, some patterns of grey levels have better properties than others; e.g., their appearance may change more slowly over time. Also, tracking a limited number of reliable points is preferable to tracking all points, or points picked blindly. These observations lead to the idea of tracking image features, explained next.

B. Feature Tracking

We define features as detectable parts of an image which can be used to support a vision task¹, for instance corners, lines, contours, or specially defined regions. Feature tracking first locates features in two subsequent frames, I_t and I_{t+1}, then matches each feature in I_t with one feature in I_{t+1} (if such a match exists). Notice that the two steps are not necessarily sequential [29]. This process is very similar to window matching; the key difference is that features are now particular image elements, not just any window, and can therefore be chosen so as to maximize desirable properties, e.g., robustness to noise or clutter. This section reviews tracking methods for local features, defined by local image properties, and extended features, defined as larger areas with special properties. We also touch on optic flow methods, another powerful class of techniques for computing displacements in an image sequence.

1) Tracking Local Features: Local features are image locations with special properties, e.g., edges (points of locally maximal contrast) [30]–[33], lines [34]–[36], and corners (points where contrast is high along two directions) [37]–[40]. Smith [41] gives a good review of local feature tracking; see also [42, ch. 15]. Local features offer some invariance to image changes caused by scene or illumination changes, improving detectability over time.

¹Notice that this definition is entirely functional to our discussion; it is not meant to capture the many usages of "feature" in all fields of computer vision.


For instance, an edge usually remains detectable from a range of different viewpoints. Moreover, extracting features substantially reduces the volume of data passed on to further processing: say, several tens of small feature descriptors instead of a full frame.

As local features are defined as functions of image intensities, and not of properties of targets, they occur not only on interesting targets, but across the whole image. This implies that the features belonging to the target must be selected as soon as the target appears, which is usually performed by an exhaustive search over the whole image. After this, the search region is limited as described in Section I. Another consequence of the local nature of these features is mismatches (matching a wrong pair of features). Outlier detection using robust statistics is a well-known technique to limit this problem [11], [29], [43], at the price of an increased computational effort. Hartley and Zisserman [44] give a recent, detailed introduction to the subject.

Of the many examples of local feature tracking (see [21] and [41]), we briefly describe a well-tested corner tracking algorithm [9], [29], based on [40], which we have incorporated successfully in subsea applications [1], [45]. Corners have proven stable features in many subsea environments and can support several vision tasks.

Consider an image sequence I(x, t), with x = (x, y) the coordinates of an image point. We assume, as customary, that intensity values within small regions remain practically unchanged between frames, so that I(x, t) = I(δ(x), t + τ). Here, δ(x) is the motion field, specifying the displacement of each point between frames. For discrete digital sequences, τ = 1. Considering the simple translational model δ(x) = x + d (see [40] for an affine warping model), we determine the motion parameter d by solving

   min_d ε(d) = Σ_{x ∈ W} [ I(x + d, t + 1) − I(x, t) ]²    (2)

over the support window W. The linear system associated to the above least-squares problem is G d = e, where G and e depend on the image gradients [9]. For accuracy, the solution is iterated with a Newton–Raphson scheme until convergence of the displacement estimates. If d^(k) is the displacement estimate at iteration k, the algorithm is

   d^(k+1) = d^(k) + G⁻¹ e(d^(k))

with d^(0) = 0.

This framework leads naturally to a choice of reliable features. A region can be tracked reliably if the linear system solving problem (2) is numerically stable. This requires that G is well conditioned and its entries (which depend on the image intensities) are well above the noise level. In practice, since the intensity variation within the window is bounded by the maximum representable intensity, the matrix is well conditioned when the smaller eigenvalue is sufficiently large to guarantee that the entries are well above the noise level. Therefore, calling λ₁ and λ₂ the eigenvalues of G associated to an image region, we select this region for tracking if min(λ₁, λ₂) > λ, where λ is a threshold which can be assessed experimentally from the histogram of the eigenvalues [21]. The intensity in the image regions so identified forms a corner-like pattern, as there are two directions of significant intensity change, and for this reason the system is a corner tracker.
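For illustration, the eigenvalue test might be implemented as follows (our sketch; the coarse grid and the threshold are assumptions, the latter normally set from the eigenvalue histogram as discussed above):

```python
import numpy as np

def good_features(gray, W=7, lam_thresh=1e4):
    """Select windows whose 2x2 gradient matrix G has min eigenvalue above a threshold.

    Sketch of the eigenvalue test described in the text; the grid step
    and lam_thresh are illustrative assumptions.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    corners = []
    H, Wd = gray.shape
    for r in range(W, H - W, W):          # coarse grid keeps the sketch short
        for c in range(W, Wd - W, W):
            wx = gx[r-W:r+W+1, c-W:c+W+1]
            wy = gy[r-W:r+W+1, c-W:c+W+1]
            G = np.array([[np.sum(wx * wx), np.sum(wx * wy)],
                          [np.sum(wx * wy), np.sum(wy * wy)]])
            lam_min = np.linalg.eigvalsh(G)[0]   # smaller eigenvalue
            if lam_min > lam_thresh:
                corners.append((r, c, lam_min))
    return corners
```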

The metric used in practice is actually a photometrically normalized SSD, that is

   ε(d) = Σ_{x ∈ W} [ (I_t(x) − μ_t)/σ_t − (I_{t+1}(x + d) − μ_{t+1})/σ_{t+1} ]²    (3)

where μ_t, μ_{t+1} are the average grey levels and σ_t, σ_{t+1} the standard deviations in the support windows around x in I_t and around x + d in I_{t+1}. The normalization limits the effects of intensity variations at corresponding points between frames, a not infrequent event with vehicles carrying their own illumination.

We use robust statistics to discard spurious feature matches, or outliers, for which the metric in (2) is substantially larger than for correct matches [29]. The normalized SSD allows us to model the residuals (3) as Gaussian deviates, so that outlier detection can be solved through a simple but effective rejection rule, X84 [43]. X84 rejects values larger than k median absolute deviations (MADs) away from the median, where

   MAD = med_i { | r_i − med_j r_j | }

and the indexes i, j span a given time interval. A value of k = 5.2, corresponding to about 3.5 standard deviations, is adequate in practice. X84 has a breakdown point of 50%, that is, any majority of the data overrules any minority.
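The rule is a one-liner in practice; a direct transcription (our sketch) is:

```python
import numpy as np

def x84_inliers(residuals, k=5.2):
    """X84 rejection rule: keep values within k MADs of the median.

    residuals are the per-feature matching errors, e.g., the
    normalized SSD values of (3). Returns a boolean inlier mask.
    """
    r = np.asarray(residuals, dtype=np.float64)
    med = np.median(r)
    mad = np.median(np.abs(r - med))        # median absolute deviation
    return np.abs(r - med) <= k * mad
```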

Frame-to-frame displacements are, obviously, unknown in advance. The risk exists that any fixed search region may underestimate at least some large displacements in the sequence. At the same time, fixed, large search regions imply an unnecessary computational load. A solution is multiresolution (multiscale) tracking: each image is represented by a multiresolution pyramid [30], and tracking is carried out in a coarse-to-fine manner. This extends considerably the range of disparities that the tracker can capture. Our C++ implementation tracks approximately 50 corners at 10 frames per second on a Pentium-III PC at 700 MHz. Full details are given in [46].

2) Optic Flow Methods: Optic flow methods do not use features, but are nevertheless well-studied, local methods producing dense displacement fields, which have been used underwater [2], [10], [47]. They are based on the celebrated image brightness constancy constraint, which assumes that local intensities do not vary significantly from frame to frame. Writing this constancy as a first-order expansion, one obtains an equation linking the optic flow v = (v_x, v_y) and the derivatives of the intensity function I

   (∇I)ᵀ v + ∂I/∂t = 0    (4)

where ∇I is the spatial image gradient. This is one scalar constraint for two unknowns: many methods have been devised to constrain the flow completely, for instance writing (4) at all pixels of a local neighborhood (assuming locally constant flow) [48], using regularization constraints [49], and introducing parametric motion models [2]. Barron et al. [50] and Mitiche and Bouthemy [51] are two useful surveys; the latter includes an experimental performance evaluation of different algorithms. Notice that only an approximation of the real image velocity field can be computed from an image sequence, as the only data measurable are changes of intensities, which are caused not only by motion, but also by radiometric phenomena (e.g., reflections, illumination changes), as illustrated in [21]. For this reason, Negahdaripour [47] introduces a revised definition of optic flow which models explicitly geometric and radiometric effects.
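As an illustration of the first strategy, here is a minimal least-squares solution of (4) over a local window, in the spirit of Lucas and Kanade [48] (our sketch; a practical tracker would iterate the estimate and use a multiresolution pyramid):

```python
import numpy as np

def lucas_kanade_flow(I0, I1, p, W=7):
    """One-shot flow estimate at point p = (row, col), assuming locally constant flow.

    Solves (4) in the least-squares sense over a (2W+1)^2 window;
    p is assumed to be at least W pixels from the border.
    """
    Iy, Ix = np.gradient(I0.astype(np.float64))       # spatial gradient
    It = I1.astype(np.float64) - I0.astype(np.float64)  # temporal derivative
    r, c = p
    sl = (slice(r - W, r + W + 1), slice(c - W, c + W + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)   # n x 2 system
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v  # (v_x, v_y)
```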

Finally, the intrinsic difference between feature tracking and optic flow methods is that the former are based on feature comparisons within arbitrary search areas and can, in principle, match arbitrarily distant features; the latter are differential, and therefore local, in nature, and require very small frame-to-frame displacements.

3) Tracking Extended Features: Extended features cover a larger part of the image than local features; they can be contours of basic shapes (e.g., ellipses, rectangles), free-form contours, or image regions. The main advantage of extended features is their greater robustness to clutter compared to local features, as they rely on a larger image support (but see [52] for an intriguing example of clutter-resistant tracking with local features). In other words, one expects a much higher risk of false positives with local features than with extended ones. Another advantage is that extended features are related more directly to significant 3-D entities; e.g., tracking image ellipses is a way to track circles in the scene.

Regions are popular extended features. They are defined as connected image parts with distinguishing intensity or color properties (e.g., color histogram, texture statistics, statistical differences from an adaptive or fixed background). Notice that we make a difference between windows and regions: "window" is taken to indicate any subimage (generally rectangular), "region" one with specific properties (generally free-form). Region-based tracking systems have been reported for surveillance (people detection and tracking [53]–[55]), vehicle guidance and servoing [56]–[58], tracking in infrared sequences [59], and photometrically invariant tracking [60].

Contours are by far the best-studied extended features for model-based tracking, partly because many methods exist for locating image contours with sufficient reliability, and partly because contours are intuitively meaningful features, often taken as the boundary of a whole object. For these reasons, Sections II-C–F concentrate on contours. The shape of a contour can change substantially over time: imagine, for instance, the contour of a person walking. Here a tracker must incorporate not only a motion model (as in, say, a window-based KF), but also a shape deformation model constraining the possible deformations. Considering that a discrete contour can be formed by several tens of pixels, the search space covering all possible motions and deformations can grow unwieldy. The challenge is therefore to design models with low dimensionality (number of parameters to estimate), leading to feasible search spaces. Designing such models is relatively straightforward for rigid 3-D shapes (Sections II-C and D), which require only six parameters for 3-D motion, and much more difficult for moving and deforming 3-D objects (Section II-E), which require a higher number of parameters to describe motion and shape. For this reason several authors dealing with deformable targets have turned to visual learning techniques (Section II-F).


C. Planar Rigid Shapes

The shape of the image of a planar, rigid object depends on its position and orientation with respect to the camera, and on the type of imaging projection (e.g., perspective or affine). Simple shapes like rectangles or circles can be captured by closed-form models (e.g., implicit equations) to predict their appearance in the image. Conversely, one can use image features to solve for the shape, location, and orientation parameters of the model in 3-D space. Solving over time generates translational and rotational velocities. Notice that these methods assume knowledge of which image features (e.g., points, lines) belong to targets, and which to the background. This segmentation is in general difficult, and usually tackled with knowledge about target properties (e.g., the target is a dark rectangle on a light background). Algorithms tracking planar rigid shapes are cast as the solution of a linear or nonlinear minimization problem [61], [62] yielding the instantaneous values of shape and positional parameters; an example is given in Section II-D for rigid solid objects. The shape model can be simply a vector of feature positions [63]–[65] (e.g., points in fixed relative positions in space) which are tracked over time.

D. Solid Rigid Objects

Similarly to the previous section, if a model of the solid rigid object being imaged is available, tracking can yield 3-D position and orientation over time, and therefore 3-D velocities. The basic component is an instantaneous estimation of rotation and translation in space. First, contours detected in the image and contours in the model are put in correspondence. Then, the pairs of matching contours are used to estimate the rotation and translation of the model contours generating the best approximation of the observed image [36], by minimizing an integral function of the distances between corresponding contours. This step is repeated for each new frame of a video sequence, using the previous estimate as an initial guess [36], [62], [66], [67]. Notice that some authors avoid feature correspondence by searching directly in the space of translations and orientations [66], [68], [69].

We sketch an example algorithm from this class, based on Wunsch and Hirzinger's work [62]. The input is the estimate R_{t−1}, t_{t−1} of the 3-D rotation and translation from the previous frame, the current frame I_t, and a contour-based model of the 3-D object being tracked.

1) Let R_{t−1}, t_{t−1} be the 3-D rotation and translation estimated from frame I_{t−1}.
2) Extract contours from the current image, I_t.
3) Match image contours with contours of the 3-D model placed at R_{t−1}, t_{t−1}.
4) Evaluate a global error metric in 3-D space.
5) Estimate R_t, t_t, aligning image and model features by minimizing the error metric over R and t.
6) Move to the next frame and go to 1.

The output is the instantaneous estimate of the rotation and translation in space of the 3-D object, R_t, t_t.


Different contour features can be used simultaneously, e.g., lines and ellipses, in which case the minimization in step 5 could be

   min_{R,t} [ Σ_i w_i d_l(i) + Σ_j w'_j d_e(j) ]    (5)

where d_l(i) and d_e(j) are, respectively, some distances between the i-th (j-th) model and image feature, and the w_i, w'_j are weights. It is useful to define such distances in 3-D space, to avoid reprojecting model features onto the image and computing distances in the image plane (as done by Lowe [36]). For instance, for lines, one expects that a model line, if correctly positioned, lies on the plane defined by the corresponding image line and the centre of projection. Therefore, calling n the normal to this plane, and P₁, P₂ the endpoints of the model line, one can define d_l as

   d_l = (nᵀP₁)² + (nᵀP₂)²    (6)

The resulting minimization problem can be linearized and applied iteratively to reduce errors [62].
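For illustration, here is one linearized alignment step for line features, using the plane distance (6) under a small-rotation assumption (our sketch of the idea in [62]; the pairing of plane normals and model-line endpoints is assumed given by step 3 of the algorithm above):

```python
import numpy as np

def refine_pose_lines(pairs, R, t, iters=10):
    """Iteratively align model lines to the planes through the image lines.

    pairs: list of (n, P1, P2), with n the unit normal of the plane through
    the image line and the projection center, and P1, P2 the model-line
    endpoints in model coordinates. Minimizes the sum of squared plane
    distances (6) via small-rotation linearization.
    """
    for _ in range(iters):
        A, b = [], []
        for n, P1, P2 in pairs:
            for P in (P1, P2):
                Pc = R @ P + t
                # residual n.(Pc + w x Pc + dt) ~ 0 in the unknowns (w, dt)
                A.append(np.concatenate([np.cross(Pc, n), n]))
                b.append(-n @ Pc)
        x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        w, dt = x[:3], x[3:]
        th = np.linalg.norm(w)               # rotation angle of the update
        if th > 1e-12:
            k = w / th
            K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
            dR = np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)
        else:
            dR = np.eye(3)
        R, t = dR @ R, dR @ t + dt           # compose the incremental pose
    return R, t
```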

E. Tracking Deformable Contours

Predicting the evolution of a target through a CAD-like model becomes computationally very expensive for deformable objects. As anticipated above, the problem becomes to devise a model predicting target deformation and motion with a limited number of parameters.

Foundational here is work on snakes [70], [71], i.e., image contours formed of discrete particles (pixels) bound together by internal elastic forces and sensitive to forces created by image gradients. Both types of forces create a potential energy that the snake seeks to minimize by changing its shape. At convergence, the snake has moulded itself along an image contour defining an area of interest; in our case, an object being tracked. The intrinsic continuity of the snake recovers broken contours automatically, but the energy functional must be designed carefully to avoid odd shapes. Notice that the shape of a snake is dictated only by the forces acting on it, and does not necessarily belong to any specific family of curves. Snakes and related techniques (e.g., balloons, see [72]), augmented with appropriate motion models, have been used for tracking contours in various domains [56], [73], [74], including underwater [75], [76]; a minimal sketch of one snake iteration is given at the end of this section.

An alternative is to use a parametric geometric model, typically B-splines [77], to represent the contour being tracked. This constrains the space of all possible shapes; Blake and Isard [78] give an excellent introduction. The deformations allowed are captured by the space of the parameters of the contour's parametrization, known as shape space. With an appropriate parameterization, a very complex set of deformations is captured by a feasible number of parameters. The key advantage over snakes is that acceptable shapes are guaranteed by the mathematical model itself, not by a particular energy function. In other words, particular images can force a snake into awkward shapes (e.g., twisted contours, unwanted corners) which a B-spline could not take on by definition. Active contours of this kind (not necessarily splines) have been used extensively in computer vision [18], [23], [79].

A third class of deformable contours are deformable templates [80]–[83]. The distinguishing feature of these methods is the use of a specific prototype shape (e.g., an eye, a fish), specified by a set of landmark points. The motion of the landmarks is restricted by a motion model. The key difference from the previous approaches is that the allowed shapes are not just configurations of general curves, but the expected shapes of specific objects. Tracking a complex shape is therefore reduced to tracking a discrete set of points forming a fixed, deformable template. This is very useful when it is known that a specific object is present in a sequence.

For completeness, we mention two other related methods in computer vision, level-set methods [84] and eigentracking [85]. They have been used successfully, but their impact has been arguably smaller than that of other techniques discussed here. We, therefore, just point the reader to the references above.
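As promised above, here is one greedy iteration of a discrete snake over a precomputed edge-strength map (our sketch; the energy terms and weights are simplified assumptions, and contour points are assumed to stay inside the image):

```python
import numpy as np

def greedy_snake_step(snake, edge_map, alpha=0.5, beta=0.5):
    """One greedy iteration of a discrete snake.

    snake: (n, 2) integer array of (row, col) contour points (closed contour).
    Each point moves to the 8-neighbor minimizing elastic energy minus
    edge strength; alpha, beta are illustrative weighting constants.
    """
    new_snake = snake.copy()
    n = len(snake)
    for i, (r, c) in enumerate(snake):
        prev_pt, next_pt = snake[i - 1], snake[(i + 1) % n]
        best, best_pt = np.inf, (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                p = np.array([r + dr, c + dc])
                elastic = np.sum((p - prev_pt) ** 2) + np.sum((p - next_pt) ** 2)
                e = alpha * elastic - beta * edge_map[p[0], p[1]]
                if e < best:
                    best, best_pt = e, tuple(p)
        new_snake[i] = best_pt
    return new_snake
```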

F. Visual Learning

Visual learning [86] is a powerful computer vision paradigm and the culmination of our brief review. The main idea is to get a system to learn the shape (appearance) and dynamics of complex, deformable objects from example videos or images, as opposed to capturing shape and dynamics in a priori models. This is an attractive proposition, particularly when targets are difficult to capture in a priori models, as is often the case subsea. The collection of example images (usually hundreds or thousands) is processed to elicit a common description, typically via principal component analysis (PCA), statistical learning (e.g., support vector machines [87], [88]), or estimation theory [23]. Representative techniques based on PCA are eigenspaces [89], active shape models [18], [81], [83], and active appearance models [90]. We give the trace of a contour-based tracking algorithm (adapted from [18]) using PCA-based visual learning. The aim is to show how PCA can be used to capture complex shapes, and how the resulting representation can be combined with motion models to track deformable objects. The algorithm first learns a shape model, capturing the space of all possible shapes, then uses the model to track targets.

A) Learning the shape model. The objective is to devise a low-dimensional model capturing the shape deformations of a complex, flexible object. The input is formed by example frames containing the target.

1) Extract (manually or automatically) silhouettes of example targets from the training sequences. Represent each contour with a fixed number of equally spaced points, say N.
2) Align centroids and orientations of all shapes. Let x_i be the column vector of the resulting coordinates of the i-th contour.
3) Compute the mean contour x̄ by averaging all contours.
4) Form the residual matrix D = [x₁ − x̄, …, x_M − x̄], where M is the number of training contours.


5) Find the eigenvalues λ_i and eigenvectors p_i of the matrix DDᵀ.
6) Reduce model dimensionality by considering only the first t eigenvalues (indicatively, t ≪ 2N).

Any target contour x can now be written through its projections onto the eigendirections, that is

   x ≈ x̄ + P b    (7)

where P is the matrix of the first t eigenvectors. Consequently, x is now represented by b, a t-dimensional vector, plus a common part including the mean shape x̄ and the matrix P.

B) Tracking shapes. The idea is to use Kalman filtering to track motion and deformation parameters simultaneously. Both must be included in the state vector. The shape model above must be combined with a motion model to achieve the necessary KF equations. To do this, consider the transformation from the shape coordinates x in (7) to image coordinates r

   r = M x + c    (8)

where c is the image center, and M a motion matrix which summarizes rotation and scaling, with 2 degrees of freedom. Translation is accounted for separately through the motion of c. Combining (7) and (8), we obtain the desired, combined model

   r = M (x̄ + P b) + c    (9)

which can be used to set up the desired KF. The state vector s is formed by b (shape), c and its derivative (position and translational velocity), and the two parameters of M describing rotation and scaling. For each new frame acquired, the image is explored along the normals to the previous-frame contour at each contour point. The highest-contrast edge is located along each normal, defining the new observed contour points. The observation equation is simply z = H s + v, where z is the vector of the observed contour points, H is obtained by differentiating (9), and v is the customary KF noise. The state evolution equations simply predict either linear or static state parameters with additive noise. The complete algorithm is given by Baumberg and Hogg in [18].
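Steps 1)–6) and the projection (7) reduce to a few lines of linear algebra. The following sketch (ours) assumes the training contours have already been extracted, sampled with N points each, and aligned:

```python
import numpy as np

def learn_shape_model(contours, t=8):
    """PCA shape model from aligned training contours (steps 1-6 above).

    contours: (M, 2N) array -- M aligned contours, each the flattened
    coordinates of N points; t is the (assumed) number of modes kept.
    Returns the mean shape x_bar and the eigenvector matrix P of (7).
    """
    X = np.asarray(contours, dtype=np.float64)     # (M, 2N)
    x_bar = X.mean(axis=0)                         # mean contour
    D = (X - x_bar).T                              # residual matrix, (2N, M)
    C = D @ D.T / X.shape[0]                       # covariance estimate
    lam, vecs = np.linalg.eigh(C)                  # eigenvalues ascending
    P = vecs[:, ::-1][:, :t]                       # first t eigenvectors
    return x_bar, P

def project_shape(x, x_bar, P):
    """Shape parameters b of (7): b = P^T (x - x_bar)."""
    return P.T @ (x - x_bar)
```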

III. UNDERWATER VIDEO TRACKING

The design of a video tracker in the underwater environment must consider all the problems and requirements listed in Section I, plus a number of specific issues. First, the physical image formation process [91] is complicated by the presence of water, and possibly by several phase changes as light travels from the water into the camera, maybe through a waterproof, transparent case. This makes image restoration based on physical models of image formation difficult. Second, image quality is usually worse than that of images acquired in air; subsea-specific phenomena like water turbidity, marine snow, and rapid light dispersion with distance can degrade images seriously for image processing purposes. Third, cameras and lights are often carried by the same vehicle, so that illumination varies with the vehicle's motion. Fourth, for scientific missions in particular, one often deals with natural objects difficult to characterize with shape-based algorithms; even the appearance of regular, man-made objects like pipes or relics is often dramatically transformed by marine growth.

We have examined 28 recent papers reporting video tracking and closely related work within the subsea research community. The results are summarized in Table I, which offers a compact, at-a-glance overview. The column Application gives the applicative context, if any, to which the paper refers. The column Method identifies the main techniques used. Imaging model states whether the method incorporates any image formation model. Notice that "CML" stands for "concurrent mapping and localization." Papers appear in no special order.

The table suggests several observations. First, with reference to the subsea-specific problems listed above, only a small minority of authors incorporate physical models of subsea imaging, e.g., of the kind reported by Jaffe in [91]. A few authors use empirical solutions to counteract some of the effects of through-water imaging; the vast majority ignores, in practice, the fact that imaging occurs in water. This is hardly surprising, as tracking per se does not require precise metric measurements, but only the localization of moving image regions through a sequence. More surprising is the fact that little attempt is made to enhance the notoriously noisy subsea imagery before processing (e.g., to counteract illumination patterns at depth, or to filter out marine snow and similar noise). Similarly, well-established algorithms based on appearance and deformable models are still seldom used for subsea video processing. A reason may be that training sequences are not always available, but this seems a weak motivation for the nearly complete absence of such techniques in the recent literature of subsea computer vision.

Second, most authors use only the current and previous frame for motion analysis; only a few adopt algorithms exploiting the history of a motion, e.g., Kalman filtering [6], [17], or algorithms requiring extended temporal windows, e.g., optic flow. A possible reason is that much of the work reviewed considers motion that is slow compared to the acquisition frame rate, leading to limited frame-to-frame displacements.

Third, most authors use feature or window matching, and only very few use dense optic flow algorithms. Correlation-like matching seems very popular, with lines and corners the most used features. Corners are nearly invariably defined as in [37] or [40]. Lines are used particularly (and obviously) for pipeline and cable detection and tracking; the Hough transform [21] is a favorite detector.

Fourth, in mosaicing applications [8]–[11], the assumption of near-planar seafloor is practically ubiquitous, and makes it possible to assume a simple, linear transformation (a homography) between frames.

Fifth, according to the work reviewed, the main fields of application for subsea video tracking and motion analysis are visual servoing and dynamic positioning, cable and pipeline inspection, soldering and similar ROV/autonomous underwater vehicle (AUV) intervention, seafloor mosaicing, concurrent localization and mapping, and monitoring particular animal species in their habitat (including fish, plankton, and starfish).


TABLE I: At-a-glance comparison of the subsea video tracking papers reviewed, with columns Application, Method, and Imaging Model (table body not reproduced here).

Sixth, and importantly, various techniques developed in the computer vision community have not yet been deployed underwater, or at least do not seem to have been reported. Surprising omissions, given their success in computer vision applications, are shape-space methods [18], active shape and active appearance models [90] (but see [83] for an underwater application of active shape models), and particle filtering [23]. Possibly less surprising omissions include eigentracking [85] and level-set methods [92] (although the latter have been successfully applied in a variety of segmentation and tracking problems).

IV. SUMMARY AND CONCLUSION

The first contribution of this paper is a concise introduction to video tracking in computer vision, including design requirements and a review of techniques from simple window tracking to tracking complex, deformable objects by learning models of shape and dynamics (Section II).

The second contribution is an overview of recent work on subsea video tracking and related motion analysis (Section III). We notice that very few authors incorporate explicit models of subsea-specific imaging phenomena. Such models could prove very useful for designing task-specific enhancement algorithms, leading to more robust video tracking and, ultimately, more reliable subsea video processing. Although incorporating image formation models may bring limited returns with short videos in relatively unchanging conditions, we believe that it may prove a key technology for truly autonomous AUVs, called to operate independently over extended periods of time.

The third contribution is to attract the attention of the subsea video processing community to promising research directions in video tracking opened by recent computer vision work (Section III). Several well-proven techniques have not yet been taken up by the subsea research community. We believe that visual learning and model-based tracking can greatly benefit many subsea video-based tasks, including visual servoing with man-made structures (e.g., AUVs docking at subsea "garages"), object recognition in typically noisy environments and with nonrigid objects, and structure inspection (e.g., cables and pipelines).

REFERENCES

[1] J.-F. Lots, D. M. Lane, E. Trucco, and F. Chaumette, “A 2-d visual servoing for underwater vehicle station keeping,” in IEEE Conf. Robot. Automat., Seoul, Korea, May 2001, pp. 2767–2772. [2] F. Spindler and P. Bouthemy, “Real-time estimation of dominant motion in underwater video images for dynamic positioning,” in Proc. IEEE Int. Conf. Robot. Automat., 1998, pp. 1063–1068. [3] A. J. Woods, J. D. Penrose, A. J. Duncan, R. Koch, and D. Clark, “Improving the operability of remotely operated vehicles,” Australian Petroleum Production and Exploration Association (APPEA) J., vol. 1, pp. 849–854, 1998. [4] R. Marks, H. Wang, M. Lee, and S. Rock, “Automatic visual station keeping of an underwater robot,” in Proc. IEEE Int. Conf. Robot. Automat., vol. 2, 1994, pp. 137–142. [5] N. Gracias and J. Santos-Victor, “Underwater video mosaics as visual navigation maps,” Comput. Vis. Image Understanding, vol. 79, no. 1, pp. 66–91, 2000. [6] M. Simo, A. Ortiz, and G. Oliver, “A vision system for an underwater cable tracker,” Mach. Vis. Applications, vol. 13, pp. 129–140, 2002. [7] N. Stagg, “Better roving through advanced telerobotics,” Underwater Mag., Fall 1995. [8] H. Singh, L. Whitcomb, D. Yoerger, and O. Pizarro, “Microbathymetric mapping from underwater vehicles in the deep ocean,” Comput. Vis. Image Understanding, vol. 79, no. 1, pp. 143–161, 2000. [9] E. Trucco, Y. R. Petillot, I. Tena Ruiz, K. Plakas, and D. M. Lane, “Feature tracking in video and sonar subsea sequences with applications,” Comput. Vis. Image Understanding, vol. 79, no. 1, pp. 92–122, 2000.



[10] S. Negahdaripour and X. Xu, “Mosaic-based positioning and improved motion estimation methods for automatic navigation of submersible vehicles,” IEEE J. Ocean. Eng., vol. 27, no. 1, pp. 79–99, Jan. 2002. [11] N. Gracias, S. Zwaan, A. Bernardino, and J. Santos-Victor, “Mosaic based navigation for autonomous underwater vehicles,” IEEE J. Ocean. Eng., vol. 28, no. 4, pp. 609–624, Oct. 2003. [12] P. S. Maybeck, Stochastic Models, Estimation and Control. London, U.K.: Academic, 1979, vol. 1, 2. [13] G. S. Manku, P. Jain, A. Aggarwal, and L. Kumar, “Object tracking using affine structure for point correspondence,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1997, pp. 704–709. [14] L. Matthies, T. Kanade, and R. Szeliski, “Kalman filter based algorithms for estimating depth from image sequences,” Int. J. Comput. Vis., vol. 3, pp. 209–236, 1989. [15] L. S. Shapiro, H. Wang, and J. M. Brady, “A matching and tracking strategy for independently moving objects,” in Proc. British Machine Vision Conf., 1992, pp. 306–315. [16] Y. Yao and R. Chellappa, “Dynamic feature point tracking in an image sequence,” in Proc. Int. Conf. Pattern Recog., vol. 1, 1994, pp. 654–657. [17] M. Zampato, R. Pistellato, D. Maddalena, and I. Bruno, “Visual motion estimation for tumbling satellite capture,” in Proc. British Machine Vision Conf., vol. 2, 1996, pp. 565–574. [18] A. Baumberg and D. Hogg, “Learning flexible models from image sequences,” in Proc. Eur. Conf. Comput. Vis., J.-O. Ekhlund, Ed., 1994, pp. 299–308. [19] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: The MIT Press, 1993. [20] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis., vol. 47, pp. 7–42, 2002. [21] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1998. [22] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. London, U.K.: Academic, 1988. [23] M. Isard and A. Blake, “Condensation—Conditional density propagation for visual tracking,” Int. J. Comput. Vis., vol. 29, no. 1, pp. 5–28, 1998. [24] M. Isard. (1999) Condensation Home Page. [Online]. Available: www.dai.ed.ac.uk/CVONLINE [25] C. Fuh and P. Maragos, “Motion displacement estimation using an affine model for matching,” Opt. Eng., vol. 30, no. 7, pp. 881–887, 1991. [26] R. Manmatha and J. Oliensis, “Extracting affine deformations from image patches—I: Finding scale and rotation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1993, pp. 754–755. [27] F. Isgrò, E. Trucco, and L.-Q. Xu, “Toward teleconferencing by view synthesis and large-baseline stereo,” in Proc. Int. Conf. Image Anal. Process., 2001, pp. 198–203. [28] P. J. Narayanan, P. W. Rander, and T. Kanade, “Constructing virtual worlds using dense stereo,” in Proc. IEEE Int. Conf. Comput. Vis., 1998, pp. 3–10. [29] T. Tommasini, A. Fusiello, V. Roberto, and E. Trucco, “Robust feature tracking in underwater video sequences,” in Proc. OCEANS’98, vol. 1, Nice, France, Sep. 1998, pp. 46–50. [30] L. Bretzner and T. Lindeberg, “Feature tracking with automatic selection of spatial scale,” Comput. Vis. Image Understanding, vol. 71, no. 3, pp. 385–391, 1998. [31] H. Gu, M. Asada, and Y. Shirai, “The optimal partition of moving edge segments,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1993, pp. 367–372. [32] E. Hayman, T. Thorhallsson, and D. Murray, “Zoom-invariant tracking using points and lines in affine views,” in Proc. IEEE Int. Conf.
Comput. Vis., 1999, pp. 269–277. [33] T. Vieville and O. Faugeras, “Robust and fast computation of edge characteristics in image sequences,” Int. J. Comput. Vis., vol. 10, no. 2, pp. 153–179, 1994. [34] R. Deriche and O. Faugeras, “Tracking line segments,” in Proc. Eur. Conf. Comput. Vis., O. Faugeras, Ed., 1990, pp. 259–268. [35] C. Harris, “Tracking with rigid models,” in Active Vis., A. Blake and A. Yuille, Eds., 1992, pp. 59–73. [36] D. Lowe, “Robust model-based motion tracking through the integration of search and estimation,” Int. J. Comput. Vis., vol. 8, pp. 113–122, 1992. [37] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. British Machine Vis. Conf., 1988, pp. 147–151. [38] R. Deriche and G. Giraudon, “A computational approach for corner and vertex detection,” Int. J. Comput. Vis., vol. 10, no. 2, pp. 101–124, 1993. [39] H. Wang and M. Brady, “Real-time corner detection algorithm for motion estimation,” Image Vis. Comput., vol. 13, no. 9, pp. 695–705, 1995.

528

www.DownloadPaper.ir

IEEE JOURNAL OF OCEANIC ENGINEERING, VOL. 31, NO. 2, APRIL 2006

[40] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1994, pp. 593–600. [41] S. Smith. (1999) Literature Review on Feature-based Tracking Approaches. [Online]. Available: http://www.dai.ed.ac.uk/CVonline/motion.htm [42] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2002. [43] F. R. Hampel, P. J. Rousseeuw, E. M. Ronchetti, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions, ser. Wiley Series in probability and mathematical statistics. New York: Wiley, 1986. [44] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2002. [45] E. Trucco, A. Doull, F. Odone, A. Fusiello, and D. M. Lane, “Dynamic video mosaics and augmented reality for subsea inspection and monitoring,” in Oceanol. Int. 2000, Brighton, U.K., Mar. 2000. [46] K. Plakas, “Video sequence analysis for subsea robotics,” Ph.D. dissertation, Heriot-Watt Univ., Edinburgh, U.K., 2001. [47] S. Negahdaripour, “Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 9, pp. 961–979, Sep. 1998. [48] B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in Proc. Int. Joint Conf. Artif. Intell., 1981, pp. 674–679. [49] B. K. P. Horn and B. G. Schunk, “Determining optical flow,” Artif. Intell., vol. 17, pp. 185–203, 1981. [50] J. L. Barron, D. J. Fleet, and S. Beauchemin, “Performance of optical flow techniques,” Int. J. Comput. Vis., vol. 12, no. 1, pp. 43–77, 1994. [51] A. Mitchie and P. Bouthemy, “Computation and analysis of image motion: A synopsis of current problems and methods,” Int. J. Comput. Vis., vol. 19, no. 1, pp. 29–55, 1996. [52] D. Halevy and D. Weinshall, “Motion of disturbances: Detection and tracking of multi-body rigid motion,” Mach. Vis. Applicat., vol. 11, pp. 122–137, 1999. [53] Q. Cai and J. K. Aggarwal, “Tracking human motion in structured environments using a distributed-camera system,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 12, pp. 1241–1247, Dec. 1999. [54] I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: Real-time surveillance of people and their activities,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 809–830, Aug. 2000. [55] W. M. Lu and Y.-P. Tan, “A color histogram based people tracking system,” in Proc. IEEE Int. Symp. Circuits Syst., Sydney, Australia, 2001, pp. 137–140. [56] B. Bascle and R. Deriche, “Region tracking through image sequences,” in Proc. IEEE Int. Conf. Comput. Vis., 1995, pp. 302–307. [57] J. Crisman, “Color region tracking for vehicle guidance,” in Active Vis., A. Blake and A. Yuille, Eds., 1992, pp. 107–120. [58] J. Orwell, P. Remagnino, and G. A. Jones, “Multi-camera color tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1999, pp. 117–123. [59] P. Kornprobst and G. Medioni, “Tracking segmented objects using tensor voting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2000, pp. 118–125. [60] G. Hager and P. Belhumeur, “Real-time tracking of image regions with changes of geometry and illumination,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 1996, pp. 403–410. [61] K. Kanatani, Geometric Computation for Machine Vision. Oxford, U.K.: Oxford Univ. Press, 1993. [62] P. Wunsch and G. Hirzinger, “Real-time visual tracking of 3-d objects with dynamic handling of occlusions,” in Proc. 
IEEE Int. Conf. Robot. Automat., 1997, pp. 2868–2873. [63] B. Espiau, F. Chaumette, and P. Rives, “A new approach to visual servoing in robotics,” IEEE Trans. Robot. Automat., vol. 8, no. 3, pp. 313–325, Jun. 1992. [64] E. Marchand and G. D. Hager, “Dynamic sensor planning in visual servoing,” in Proc. IEEE Int. Conf. Robot. Automat., vol. 1, 1998, pp. 1988–1993. [65] C. E. Smith and N. P. Papanikolopoulos, “Grasping of static and moving objects using a vision-based control approach,” J. Intell. Robot. Syst., vol. 19, pp. 237–270, 1997. [66] P. Wunsch and G. Hirzinger, “Registration of cad models to images by iterative inverse perspective matching,” in Proc. Int. Conf. Pattern Recognit., 1996, pp. 77–83. [67] A. Worrall, G. Sullivan, and K. Baker, “Pose refinement of active models using forces in 3d,” in Proc. Eur. Conf. Comput. Vis., 1994, pp. 341–350. [68] P. Besl and N. McKay, “A method for registration of 3-d shapes,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, Feb. 1992.

[69] E. Trucco, A. Fusiello, and V. Roberto, “Robust motion and correspondences of 3-d point sets with missing data,” Pattern Recognit. Lett., vol. 20, no. 9, pp. 889–898, 1999. [70] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” in Proc. IEEE Int. Conf. Comput. Vis., 1987, pp. 259–268. [71] D. Terzopoulos and R. Szeliski, “Tracking with kalman snakes,” in Active Vis., A. Blake and A. Yuille, Eds., 1992, pp. 3–20. [72] L. D. Cohen, “On active contour models and balloons,” Comput. Vis. Image Understanding, vol. 53, no. 2, pp. 211–218, 1991. [73] Y. Fu, A. Tanju Erdem, and A. Murat Tekalp, “Tracking visible boundaries of objects using occlusion adaptive motion snakes,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2051–2060, Dec. 2000. [74] A. Giachetti and V. Torre, “Optical flow and deformable objects,” in Proc. IEEE Int. Conf. Comput. Vis., 1995, pp. 706–711. [75] D. M. Kocak, N. da Vitoria Lobo, and E. A. Widder, “Computer vision techniques for quantifying, tracking, and identifying bioluminescent plankton,” IEEE J. Ocean. Eng., vol. 24, no. 1, pp. 81–95, Jan. 1999. [76] S. Reed, Y. R. Petillot, and J. M. Bell, “An automatic approach to the detection and extraction of mine features in sidescan sonar,” IEEE J. Ocean. Eng., vol. 28, no. 1, pp. 90–105, Jan. 2003. [77] R. Bartels, J. Beatty, and B. Barsky, An Introduction to Splines for Use in Computer Graphics and Geometric Modeling. San Mateo, CA: Morgan Kaufmann, 1997. [78] A. Blake and M. Isard, Active Contours. London, U.K.: SpringerVerlag, 1998. [79] Y. Ricquebourg and P. Bouthemy, “Real-time tracking of moving persons by exploiting spatio-temporal image slices,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 797–808, Aug. 2000. [80] Y. Zhong, A. Jain, and M.-P. Dubuisson-Jolly, “Object tracking using deformable templates,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 5, pp. 544–549, May 2000. [81] T. Cootes, A. Hill, C. Taylor, and J. Haslam, “The use of active shape models for locating structure in medical images,” Int. J. Comput. Vis., vol. 12, no. 6, pp. 356–366, 1994. [82] A. Yuille and P. Hallinan, “Deformable templates,” in Active Vision, A. Blake and A. Yuille, Eds., 1992, pp. 21–38. [83] R. Tillett, N. McFarlane, and J. Lines, “Estimating dimensions of freeswimming fish using 3-d point distribution models,” Comput. Vis. Image Understanding, vol. 79, pp. 123–141, 2000. [84] M. Bertalmio, G. Sapiro, and G. Randall, “Morphing active contours,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 7, pp. 733–737, Jul. 2000. [85] M. Black and A. Jepson, “Eigentracking: Robust matching and tracking of articulated objects using a view-based representation,” in Proc. Eur. Conf. Comput. Vis., G. Sandini, Ed., 1996, pp. 328–342. [86] S. Nayar and T. Poggio, Early Visual Learning. Oxford, U.K.: Oxford Univ. Press, 1996. [87] M. Pontil and A. Verri, “Object recognition with support vector machines,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 6, pp. 637–646, Jun. 1998. [88] S. Avidan, “Support vector tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2001, pp. 283–310. [89] H. Murase and N. Nayar, “Visual learning and recognition of 3-d objects from appearance,” Int. J. Comput. Vis., vol. 14, pp. 5–24, 1995. [90] T. Cootes, G. Edwards, and C. Taylor, “Active appearance models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001. [91] J. S. Jaffe, “Computer modeling and the design of optimal underwater imaging systems,” IEEE J. Ocean. 
Eng., vol. 15, no. 2, pp. 101–111, Apr. 1990. [92] N. Paragios and R. Deriche, “A pde-based level set approach for detection and tracking of moving objects,” in Proc. IEEE Int. Conf. Comput. Vis., 1998, pp. 1139–1145. [93] Y. Rzhanov, L. M. Linnett, and R. Forbes, “Underwater video mosaicing for seabed mapping,” Proc. Image Process., vol. 1, pp. 224–227, 2000. [94] Y. Rzhanov, L. Huff, and R. Cutter, “Improvement of image alignment using camera attitude information,” in 6th Int. Symp. Image Process. Applicat., vol. 2, 2001, pp. 639–642. [95] J. Rife and S. Rock, “Visual tracking of jellyfish in situ,” in Proc. Int. Conf. Image Process., vol. 1, 2001, pp. 289–292. [96] S. Fleischer, H. Wang, S. Rock, and M. Lee, “Video mosaicking along arbitrary vehicle paths,” in Proc. Int. Symp. Autonomous Underwater Vehicle Technol., 1996, pp. 293–299. [97] S. Fleischer, S. Rock, and R. Burton, “Global position determination and vehicle path estimation from a vision sensor for real-time video mosaicking and navigation,” in Proc. OCEANS Conf., vol. 1, 1997, pp. 641–647.


[98] A. Balasuriya and T. Ura, “Autonomous underwater vehicle navigation scheme for cable following,” in Proc. Int Conf. Intell. Transport. Syst., 2001, pp. 519–524. [99] R. Li, H. Li, W. Zou, R. Smith, and T. Curan, “Quantitative photogrammetric analysis of digital underwater video imagery,” IEEE J. Ocean. Eng., vol. 22, no. 2, pp. 364–375, Apr. 1997. [100] R. Eustice, H. Singh, and J. Howland, “Image registration underwater for fluid flow measurements and mosaicking,” in Proc. OCEANS Conf., vol. 3, 2000, pp. 1529–1534. [101] F.-X. Espiau and P. Rives, “Extracting robust features and 3d reconstruction in underwater images,” in Proc. OCEANS Conf., vol. 4, 2001, pp. 2564–2569. [102] M. Simo, A. Ortiz, and G. Oliver, “Optimized image sequence analysis for real-time underwater cable tracking,” in Proc. OCEANS Conf., vol. 1, 2000, pp. 497–504. [103] D. Wettergreen, C. Gaskett, and A. Zelinsky, “Development of a visually-guided autonomous underwater vehicle,” in Proc. OCEANS Conf., vol. 2, 1998, pp. 1200–1204. [104] K. Nishihara, “Practical real-time imaging stereo matcher,” Opt. Eng., vol. 23, pp. 536–545, 1984. [105] C. Silpa-Anan, T. Brinsmead, S. Abdallah, and A. Zelinsky, “Preliminary experiments in visual servo control for autonomous underwater vehicle,” in Proc. Int Conf. Intell. Robots and Syst., 2001, pp. 1824–1829. [106] S. Negahdaripour and P. Firoozfam, “Positioning and photo-mosaicking with long image sequences: Comparison of selected methods,” in Proc. OCEANS Conf., Nice, France, 2001, pp. 2584–2592. [107] P. Firoozfam and S. Negahdaripour, “Reliability analysis of parameter estimation in linear models with applications to mensuration problems in computer vision,” in Proc. OCEANS Conf., vol. 3, 2002, pp. 1595–1602. [108] M. Minami, J. Agbanhan, and T. Asakura, “Manipulator visual servoing and tracking of fish using genetic algorithms,” Indust. Robot, vol. 26, no. 4, pp. 278–289, 1999. [109] P. Lagstad, “Detecting linear motion of an object in a sequence of monocular underwater images,” in Proc. Int. Symp. Autonomous Underwater Vehicle Technol., 1996, pp. 343–347. [110] G. Conte, S. Zanoli, A. Perdon, G. Tascini, and P. Zingaretti, “Automatic analysis of visual data in submarine pipeline inspection,” in Proc. OCEANS Conf., vol. 3, Sep. 1996, pp. 1213–1219. [111] K. Plakas, E. Trucco, and A. Fusiello, “Uncalibrated vision for 3-D underwater applications,” in Proc. OCEANS Conf., Nice, France, Sep. 1998, pp. 272–276. [112] V. D. Gesú, F. Isgró, D. Tegolo, and E. Trucco, “Finding essential features for tracking starfish in a video sequence,” in Proc. IAPR Int. Conf. Image Anal. Process., Mantova, Italy, Sep. 2003, pp. 504–509.


Emanuele Trucco received the B.Sc. and Ph.D. degrees in electronic engineering from the University of Genoa, Genoa, Italy, in 1984 and 1990, respectively. Currently, he is a Reader (Associate Professor) in the School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, U.K. He has published more than 100 refereed publications and co-authored (with Alessandro Verri) a book widely adopted by the international community. Press reports include New Scientist, the Financial Times, and an invited participation in the BBC Tomorrow's World Roadshow 2002. His research interests are in multiple-view vision, motion analysis, image-based rendering, and applications to image-based communications, videoconferencing, medical image processing, and subsea robotics. Dr. Trucco has served as an Editor of the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (PART C) and an Honorary Editor of the IEE Proceedings on Vision, Signal and Image Processing, and of Pattern Analysis and Applications. He has received research funding and awards from the European Union, EPSRC, and foundations such as the Royal Society and the British Council. He serves regularly on professional, technical, and organizing committees for international events in computer vision and image processing.

Konstantinos Plakas received the B.Sc. degree in physics from the University of Ioannina, Ioannina, Greece, in 1995, and the M.Sc. degree in knowledge-based systems and the Ph.D. degree from Heriot-Watt University, Edinburgh, U.K., in 1996 and 2000, respectively. From 2000 to 2003, he was a Research Associate on the VIRTUE project, which demonstrated the first European immersive videoconferencing system based on computer vision technology. Currently, he is with Seebyte Ltd., a spinoff company of Heriot-Watt University. His research interests are in applications of computer vision to subsea robotics, especially tracking, uncalibrated vision, and metrology.
