
Flux Tensor Constrained Geodesic Active Contours with Sensor Fusion for Persistent Object Tracking
Filiz Bunyak, Kannappan Palaniappan, Sumit Kumar Nath
Department of Computer Science, University of Missouri-Columbia, MO 65211-2060, USA
Email: {bunyak,palaniappank}@missouri.edu, [email protected]
Gunasekaran Seetharaman
Dept. of Electrical and Computer Engineering, Air Force Institute of Technology, OH 45433-7765, USA
Email: [email protected]

Abstract— This paper makes new contributions in motion detection, object segmentation and trajectory estimation to create a successful object tracking system. A new efficient motion detection algorithm referred to as the flux tensor is used to detect moving objects in infrared video without requiring background modeling or contour extraction. The flux tensor-based motion detector, when applied to infrared video, is more accurate than thresholding "hot spots" and is insensitive to shadows as well as illumination changes in the visible channel. In real-world monitoring tasks, fusing scene information from multiple sensors and sources is a useful core mechanism for dealing with complex scenes, lighting conditions and environmental variables. The object segmentation algorithm uses level set-based geodesic active contour evolution that incorporates the fusion of visible color and infrared edge information in a novel manner. Touching or overlapping objects are further refined during the segmentation process using an appropriate shape-based model. Multiple object tracking using correspondence graphs is extended to handle groups of objects and occlusion events by Kalman filter-based cluster trajectory analysis and watershed segmentation. The proposed object tracking algorithm was successfully tested on several difficult outdoor multispectral videos from stationary sensors and is not confounded by shadows or illumination variations.

Index Terms— Flux tensor, sensor fusion, object tracking, active contours, level set, infrared images.

(This paper is based on "Geodesic Active Contour Based Fusion of Visible and Infrared Video for Persistent Object Tracking," by F. Bunyak, K. Palaniappan, S. Nath and G. Seetharaman, which appeared in the Proceedings of the 8th IEEE Workshop on Applications of Computer Vision (WACV 2007), Texas, USA, February 2007. © 2007 IEEE.)

I. INTRODUCTION

In real-world monitoring tasks, persistent moving object detection and tracking remains a challenging problem due to complex scenes, lighting conditions, environmental variables (particularly in outdoor settings with weather), clutter, noise, and occlusions. Fusing scene information from multiple sensors and sources is a useful core mechanism to deal with these problems. Recent developments in micro-optics and micro-electromechanical systems (MEMS), VCSELs, tunable RCLEDs (resonant cavity LEDs), and tunable micro-bolometers indicate that hyperspectral imaging will rapidly become as ubiquitous as visible and thermal video are today [1], [2].

On-board lidars and radars have been used successfully in unmanned autonomous vehicles, extending their versatility well beyond what was demonstrated in the 1990s based on dynamic scene analysis of visible video only. Most of the autonomous vehicles competing in the recent DARPA Grand Challenge events used one or more lidar sensors to augment the video imagery, demonstrating intelligent navigation using fusion of multiple information sources [3]. Autonomous navigation in city traffic with weather, signals, vehicles, pedestrians, and construction will be even more challenging.

Effective performance in persistent tracking of people and objects for navigation, surveillance, or forensic behavior analysis applications requires robust capabilities that are scalable to changing environmental conditions and external constraints (i.e., visibility, camouflage, contraband, security, etc.) [4]. For example, monitoring the barrier around sensitive facilities such as chemical or nuclear plants will require using multiple sensors in addition to a network of (visible) video cameras. Both infrared cameras and laser-scanner based lidar have been used to successfully enhance the overall effectiveness of such systems. In crowds or busy traffic areas, even though it may be impractical to monitor and track each person individually, information fusion that characterizes objects of interest can significantly improve throughput. Airport surveillance systems using high-resolution infrared/thermal video of people can extract invisible biometric signatures to characterize individuals or tight groups, and use these short-term multispectral blob signatures to resolve cluttered regions in difficult video segmentation tasks.

This paper presents a new moving object detection and tracking system for surveillance applications using fusion of visible and infrared information. A preliminary version of the paper has been published in [5]. Infrared imagery is less sensitive to illumination-related problems such as uneven lighting, moving cast shadows or sudden illumination changes (i.e., cloud movements) that cause false detections, missed objects, shape deformations, false merges etc. in visible imagery. But use of infrared imagery alone often results in poor performance since generally these sensors produce imagery with low signal-to-noise ratio, uncalibrated white-black polarity changes, and a "halo effect" around hot or cold objects [6].


[Figure 1: Multi-spectral data fusion system for persistent object tracking. Pipeline: visible and infrared imagery → motion detection (flux tensors) → FGM → object segmentation (active contours and shape-based refinement) → FGR → multi-object tracking (correspondence graphs) → object properties and trajectories → cluster trajectory analysis (Kalman filter, watershed) → refined trajectories.]

"Hot spot" techniques that detect moving objects by identifying bright regions in infrared imagery are inadequate in the general case, because the assumption that the objects of interest, people and moving cars, are much hotter than their surroundings is not always true. The proposed system, illustrated in Figure 1, is summarized below. A new efficient motion detection algorithm referred to as the flux tensor is used to detect moving objects in infrared video without requiring background modeling or contour extraction. The object segmentation algorithm uses level set-based geodesic active contour evolution that incorporates the fusion of visible color and infrared edge information. Touching or overlapping objects are further refined during the segmentation process using an appropriate shape-based model. The multiple object tracking module resolves frame-to-frame object correspondences, and its cluster trajectory analysis extension handles groups of objects and occlusion events.

This paper is organized as follows. In Section II motion detection using a flux tensor framework is explored. In Section III motion constrained object segmentation using geodesic active contours is discussed. In Section IV edge based video sensor fusion within a level set based active contour framework is explored. In Section V shape-based refinement of the object masks is presented. In Section VI multiple object tracking using correspondence graphs and its extension to handle groups of objects and occlusion events are described. Section VII presents the results, and Section VIII offers the concluding remarks.

II. MOTION DETECTION USING A FLUX TENSOR FRAMEWORK

Motion blob detection is performed using our novel flux tensor method, which is an extension of the 3D grayscale structure tensor. Both the grayscale structure tensor and the proposed flux tensor use spatio-temporal consistency more efficiently than classical optical flow methods, and thus produce less noisy and more spatially coherent motion segmentation results [7]. The flux tensor is more efficient in comparison to the 3D grayscale structure tensor since motion information is more directly incorporated in the flux calculation, which is less expensive than computing eigenvalue decompositions as with the 3D grayscale structure tensor.

A. 3D Structure Tensors

Structure tensors are a matrix representation of partial derivative information. As they allow both orientation estimation and image structure analysis, they have many applications in image processing and computer vision. 2D structure tensors have been widely used in edge/corner detection and texture analysis; 3D structure tensors have been used in low-level motion estimation and segmentation [7], [8].

Under the constant illumination model, the optic-flow (OF) equation of a spatiotemporal image volume I(x) centered at location x = [x, y, t] is given by Eq. 1 [9], where v(x) = [vx, vy, vt] is the optic-flow vector at x,

\frac{dI(\mathbf{x})}{dt} = \frac{\partial I(\mathbf{x})}{\partial x} v_x + \frac{\partial I(\mathbf{x})}{\partial y} v_y + \frac{\partial I(\mathbf{x})}{\partial t} v_t = \nabla I^T(\mathbf{x})\,\mathbf{v}(\mathbf{x}) = 0 \quad (1)

and v(x) is estimated by minimizing Eq. 1 over a local 3D image patch Ω(x, y) centered at x. Note that vt is not 1, since spatio-temporal orientation vectors will be computed. Using Lagrange multipliers, a corresponding error functional els(x) to minimize Eq. 1 using a least-squares error measure can be written as Eq. 2, where W(x, y) is a spatially invariant weighting function (e.g., Gaussian) that emphasizes the image gradients near the central pixel [8].

e_{ls}(\mathbf{x}) = \int_{\Omega(\mathbf{x},\mathbf{y})} \left(\nabla I^T(\mathbf{y})\,\mathbf{v}(\mathbf{x})\right)^2 W(\mathbf{x},\mathbf{y})\,d\mathbf{y} + \lambda\left(1 - \mathbf{v}(\mathbf{x})^T\mathbf{v}(\mathbf{x})\right) \quad (2)

Assuming a constant v(x) within the neighborhood Ω(x, y) and differentiating els(x) to find the minimum leads to the standard eigenvalue problem (Eq. 3) for solving v̂(x), the best estimate of v(x),

\mathbf{J}(\mathbf{x}, W)\,\hat{\mathbf{v}}(\mathbf{x}) = \lambda\,\hat{\mathbf{v}}(\mathbf{x}) \quad (3)

The 3D structure tensor matrix J(x, W) for the spatiotemporal volume centered at x can be written in expanded matrix form (with the spatial filter W(x, y) and the positional terms omitted for clarity) as Eq. 4.

\mathbf{J} = \begin{bmatrix}
\int_\Omega \frac{\partial I}{\partial x}\frac{\partial I}{\partial x}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial x}\frac{\partial I}{\partial t}\,d\mathbf{y} \\
\int_\Omega \frac{\partial I}{\partial y}\frac{\partial I}{\partial x}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial y}\frac{\partial I}{\partial y}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial y}\frac{\partial I}{\partial t}\,d\mathbf{y} \\
\int_\Omega \frac{\partial I}{\partial t}\frac{\partial I}{\partial x}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial t}\frac{\partial I}{\partial y}\,d\mathbf{y} & \int_\Omega \frac{\partial I}{\partial t}\frac{\partial I}{\partial t}\,d\mathbf{y}
\end{bmatrix} \quad (4)
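For concreteness, the tensor entries of Eq. 4 can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation; the array name I, the (T, H, W) layout, and the box filter standing in for the window W(x, y) are assumptions.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def structure_tensor_3d(I, win=(5, 9, 9)):
        # Central-difference gradients along t, y, x of the (T, H, W) volume.
        It, Iy, Ix = np.gradient(I)
        avg = lambda a: uniform_filter(a, size=win)  # local integration over Omega
        Jxx, Jxy, Jxt = avg(Ix * Ix), avg(Ix * Iy), avg(Ix * It)
        Jyy, Jyt, Jtt = avg(Iy * Iy), avg(Iy * It), avg(It * It)
        trace_J = Jxx + Jyy + Jtt  # total gradient energy, used in Eq. 5 below
        return (Jxx, Jxy, Jxt, Jyy, Jyt, Jtt), trace_J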


A typical approach in motion detection is to threshold trace(J) (Eq. 5); but this results in ambiguities in distinguishing responses arising from stationary versus moving features (e.g., edges and junctions with and without motion), since trace(J) incorporates total gradient change information but fails to capture the nature of these gradient changes (i.e., spatial only versus temporal).

\mathrm{trace}(\mathbf{J}) = \int_\Omega \|\nabla I\|^2\,d\mathbf{y} \quad (5)

To resolve this ambiguity and to classify the video regions experiencing motion, the eigenvalues and the associated eigenvectors of J are usually analyzed [10], [11]. However, eigenvalue decomposition at every pixel is computationally expensive, especially if real-time performance is required.

B. Flux Tensors

In order to reliably detect only the moving structures without performing expensive eigenvalue decompositions, the concept of the flux tensor is proposed. The flux tensor captures the temporal variation of the optical flow field within the local 3D spatiotemporal volume. Differentiating Eq. 1 with respect to t yields Eq. 6, where a(x) = [ax, ay, at] is the acceleration of the image brightness located at x,

\frac{\partial}{\partial t}\left(\frac{dI(\mathbf{x})}{dt}\right) = \frac{\partial^2 I(\mathbf{x})}{\partial x\,\partial t} v_x + \frac{\partial^2 I(\mathbf{x})}{\partial y\,\partial t} v_y + \frac{\partial^2 I(\mathbf{x})}{\partial t^2} v_t + \frac{\partial I(\mathbf{x})}{\partial x} a_x + \frac{\partial I(\mathbf{x})}{\partial y} a_y + \frac{\partial I(\mathbf{x})}{\partial t} a_t \quad (6)

which can be written in vector notation as

\frac{\partial}{\partial t}\left(\nabla I^T(\mathbf{x})\,\mathbf{v}(\mathbf{x})\right) = \frac{\partial \nabla I^T(\mathbf{x})}{\partial t}\,\mathbf{v}(\mathbf{x}) + \nabla I^T(\mathbf{x})\,\mathbf{a}(\mathbf{x}) \quad (7)

Using the same approach as for deriving the classic 3D structure tensor, minimizing Eq. 6 assuming a constant velocity model and subject to the normalization constraint ||v(x)|| = 1 leads to Eq. 8,

e^F_{ls}(\mathbf{x}) = \int_{\Omega(\mathbf{x},\mathbf{y})} \left(\frac{\partial \nabla I^T(\mathbf{y})}{\partial t}\,\mathbf{v}(\mathbf{x})\right)^2 W(\mathbf{x},\mathbf{y})\,d\mathbf{y} + \lambda\left(1 - \mathbf{v}(\mathbf{x})^T\mathbf{v}(\mathbf{x})\right) \quad (8)

Assuming a constant velocity model in the neighborhood Ω(x, y) results in the acceleration experienced by the brightness pattern in the neighborhood Ω(x, y) being zero at every pixel. As with its 3D structure tensor counterpart J in Eq. 4, the 3D flux tensor JF using Eq. 8 can be written as

\mathbf{J}_F(\mathbf{x}, W) = \int_\Omega W(\mathbf{x},\mathbf{y})\,\frac{\partial}{\partial t}\nabla I(\mathbf{x}) \cdot \frac{\partial}{\partial t}\nabla I^T(\mathbf{x})\,d\mathbf{y} \quad (9)

and in expanded matrix form as Eq. 10.

\mathbf{J}_F = \begin{bmatrix}
\int_\Omega \left(\frac{\partial^2 I}{\partial x\,\partial t}\right)^2 d\mathbf{y} & \int_\Omega \frac{\partial^2 I}{\partial x\,\partial t}\frac{\partial^2 I}{\partial y\,\partial t}\,d\mathbf{y} & \int_\Omega \frac{\partial^2 I}{\partial x\,\partial t}\frac{\partial^2 I}{\partial t^2}\,d\mathbf{y} \\
\int_\Omega \frac{\partial^2 I}{\partial y\,\partial t}\frac{\partial^2 I}{\partial x\,\partial t}\,d\mathbf{y} & \int_\Omega \left(\frac{\partial^2 I}{\partial y\,\partial t}\right)^2 d\mathbf{y} & \int_\Omega \frac{\partial^2 I}{\partial y\,\partial t}\frac{\partial^2 I}{\partial t^2}\,d\mathbf{y} \\
\int_\Omega \frac{\partial^2 I}{\partial t^2}\frac{\partial^2 I}{\partial x\,\partial t}\,d\mathbf{y} & \int_\Omega \frac{\partial^2 I}{\partial t^2}\frac{\partial^2 I}{\partial y\,\partial t}\,d\mathbf{y} & \int_\Omega \left(\frac{\partial^2 I}{\partial t^2}\right)^2 d\mathbf{y}
\end{bmatrix} \quad (10)

As seen from Eq. 10, the elements of the flux tensor incorporate information about temporal gradient changes, which leads to efficient discrimination between stationary and moving image features. Thus the trace of the flux tensor matrix, which can be compactly written and computed as

\mathrm{trace}(\mathbf{J}_F) = \int_\Omega \left\|\frac{\partial}{\partial t}\nabla I\right\|^2 d\mathbf{y} \quad (11)

can be directly used to classify moving and non-moving regions without the need for expensive eigenvalue decompositions. If motion vectors are needed, then Eq. 8 can be minimized to get v̂(x) using

\mathbf{J}_F(\mathbf{x}, W)\,\hat{\mathbf{v}}(\mathbf{x}) = \lambda\,\hat{\mathbf{v}}(\mathbf{x}) \quad (12)

In this approach the eigenvectors need to be calculated at just the moving feature points.

C. Flux Tensor Implementation

To detect motion blobs, only the trace of the flux tensor (Eq. 11) needs to be computed. That requires computation of Ixt, Iyt and Itt and the integration of the squares of Ixt, Iyt, Itt over the area Ω(y). The following notation is adopted for simplicity:

I_{xt} = \frac{\partial^2 I}{\partial x\,\partial t}, \qquad I_{yt} = \frac{\partial^2 I}{\partial y\,\partial t}, \qquad I_{tt} = \frac{\partial^2 I}{\partial t^2} \quad (13)

The calculation of the derivatives is implemented as convolutions with a filter kernel. By using separable filters, the convolutions are decomposed into a cascade of 1D convolutions. For numerical stability as well as noise reduction, a smoothing filter is applied to the dimensions that are not convolved with a derivative filter, e.g., the calculation of Ixt requires smoothing in the y-direction, and the calculation of Iyt requires smoothing in the x-direction. Itt is the second derivative in the temporal direction; the smoothing is applied in both spatial directions. As smoothing and derivative filters, the optimized filter sets presented by Scharr et al. in [12], [13] are used.

The integration is also implemented as an averaging filter decomposed into three 1D filters. As a result, the calculation of the trace at each pixel location requires three 1D convolutions for derivatives and three 1D convolutions for averages in the corresponding spatio-temporal cubes.

A brute-force implementation where spatial and temporal filters are applied for each pixel separately within a spatio-temporal neighborhood would be computationally very expensive, since it would have to recalculate the convolutions for neighboring pixels. For an efficient implementation, the spatial (x and y) convolutions are separated from the temporal convolutions, and the 1D convolutions are applied to whole frames one at a time. This minimizes the redundancy of computations and allows reuse of intermediate results.

The spatial convolutions required to calculate Ixt, Iyt and Itt are Ixs, Isy and Iss, where s represents the smoothing filter. Each frame of the input sequence is first convolved with two 1D filters, either a derivative filter in one direction and a smoothing filter in the other direction, or a smoothing filter in both directions. These intermediate results are stored as frames to be used in the temporal convolutions, and pointers to these frames are stored in a First In First Out (FIFO) buffer of size nFIFO = nDt + nAt − 1, where nDt is the length of the temporal derivative filter and nAt is the length of the temporal averaging filter. For each input frame, three frames Ixs, Isy and Iss are calculated and stored. Once nDt frames are processed and stored, the FIFO has enough frames for the calculation of the temporal derivatives Ixt, Iyt and Itt. Since averaging is distributive over addition, Ixt² + Iyt² + Itt² is computed first; spatial averaging is applied to this result and stored in the FIFO structure to be used in the temporal part of the averaging. Once the flux tensor trace of nAt frames is computed, temporal averaging is applied. The motion mask FGM is obtained by thresholding and post-processing the averaged flux tensor trace. Post-processing includes morphological operations to join fragmented objects and to fill holes.
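A batch (whole-volume rather than streaming FIFO) sketch of this separable pipeline in Python is shown below; simple binomial smoothing and central-difference kernels stand in for the optimized Scharr filter sets of [12], [13], and the function name, filter lengths and threshold are illustrative assumptions rather than the paper's values.

    import numpy as np
    from scipy.ndimage import convolve1d, uniform_filter

    def flux_tensor_trace(frames, n_avg=7):
        # frames: (T, H, W) float array; axes are (t, y, x).
        d1 = np.array([1.0, 0.0, -1.0]) / 2.0   # first-derivative kernel
        sm = np.array([1.0, 2.0, 1.0]) / 4.0    # binomial smoothing kernel
        d2 = np.array([1.0, -2.0, 1.0])         # second temporal derivative
        # Spatial stage (per frame): Ixs, Isy, Iss.
        Ixs = convolve1d(convolve1d(frames, d1, axis=2), sm, axis=1)
        Isy = convolve1d(convolve1d(frames, sm, axis=2), d1, axis=1)
        Iss = convolve1d(convolve1d(frames, sm, axis=2), sm, axis=1)
        # Temporal stage: Ixt, Iyt, Itt (Eq. 13).
        Ixt = convolve1d(Ixs, d1, axis=0)
        Iyt = convolve1d(Isy, d1, axis=0)
        Itt = convolve1d(Iss, d2, axis=0)
        # Averaging over the spatio-temporal neighborhood Omega (Eq. 11).
        return uniform_filter(Ixt**2 + Iyt**2 + Itt**2,
                              size=(n_avg, n_avg, n_avg))

    # FG_M = flux_tensor_trace(frames) > T, followed by morphological cleanup.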


III. MOTION CONSTRAINED OBJECT SEGMENTATION USING GEODESIC ACTIVE CONTOURS

Motion blob detection produces a coarse motion mask FGM as described in Section II-B. Each pixel of an infrared image frame IIR(x, t) is classified as moving or stationary by thresholding the trace of the corresponding flux tensor matrix trace(JF), and a motion blob mask FGM(t) is obtained.

Two problems with motion blob detection are: (1) holes: motion detection produces holes inside slowly moving homogeneous objects because of the aperture problem; (2) inaccurate object boundaries: motion blobs are larger than the corresponding moving objects, because these regions actually correspond to the union of the moving object locations in the temporal window rather than the region occupied in the current frame. Besides inaccurate object boundaries, this may lead to merging of neighboring object masks and consequently to false trajectory merges and splits at the tracking stage.

The motion constrained object segmentation module refines the coarse FGM(t) obtained through flux tensors by using fusion of multi-spectral image information and motion information in a level set based geodesic active contours framework. Using mathematical morphology, holes in the motion blobs (disjoint foreground regions in FGM) are filled. Geodesic active contours are started from the motion blob boundaries and evolved toward the moving object boundaries defined by fusion of visible and infrared edges. Contour evolution results in tighter object contours and separates most of the merged neighboring objects. The obtained object masks are further refined using shape information. Since the geodesic active contour segmentation relies on edges between background and foreground rather than color or intensity differences such as in [14], the method is more stable and robust across very different appearances, non-homogeneous backgrounds and foregrounds. Starting the active contour evolution from the motion segmentation results prevents early stopping of the contour on local non-foreground edges.

The motion constrained object segmentation process is summarized in Algorithm 1; level set based geodesic active contours, the computation of edge indicator functions, and the shape-based segmentation refinement process are elaborated in Sections III-A, IV, and V respectively.

Algorithm 1 Object Segmentation Algorithm
Input: Visible image sequence IRGB(x, t), infrared image sequence IIR(x, t), foreground mask sequence FGM(x, t) with NM(t) regions
Output: Refined foreground (binary) mask sequence FGR(x, t) with NR(t) regions
1: for each time t do
2:   Compute edge indicator functions gIR(x, t) and gRGB(x, t) from the infrared IIR(x, t) and visible IRGB(x, t) images.
3:   Fuse gIR(x, t) and gRGB(x, t) into a single edge indicator function gF(x, t).
4:   Initialize refined mask, FGR(t) ← 0
5:   Identify disjoint regions Ri(t) in FGM(t) using connected component analysis.
6:   for each region Ri(t) {i = 1, 2, ..., NM(t)} in FGM(t) do
7:     Fill holes in Ri(t) using morphological operations.
8:     Initialize geodesic active contour level sets Ci(t) using the contour of Ri(t).
9:     Evolve Ci(t) using gF(t) as the edge stopping function.
10:    Check the stopping/convergence condition to subpartition Ri(t) = {Ri,0(t), Ri,1(t), ..., Ri,NRi(t)(t)} into NRi(t) ≥ 1 foreground regions and one background region Ri,0(t).
11:    Refine the mask FGR using the foreground partitions as FGR = FGR ∪ Ri,j; j = 1 : NRi(t)
12:  end for // NM(t) regions
13: end for // T frames

A. Level Set Based Geodesic Active Contours

Active contours evolve/deform a curve C subject to constraints from a given image. Active contours are classified as parametric [15], [16] or geometric [17]–[19] according to their representation. Parametric active contours (i.e., classical snakes) are represented explicitly as parametrized curves in a Lagrangian formulation; geometric active contours are represented implicitly as level sets [20] and evolve according to an Eulerian formulation [21]. The main advantages of level set based active contours over parametric active contours are computational simplicity and topological flexibility.

In this module level set based geodesic active contours [19] are used to refine the motion blobs obtained using the flux tensor method (Section II-B). These contours are topologically flexible and are effectively tuned to edge/contour information. Topological flexibility is critical in order to recover individual objects, because during the coarse motion segmentation neighboring objects may have been merged into a single motion blob.

In level set based active contour methods the curve C is represented implicitly via a Lipschitz function φ by C = {(x, y) | φ(x, y) = 0}, and the evolution of the curve is given by the zero-level curve of the function φ(t, x, y). Evolving C in a normal direction with speed F amounts to solving the differential equation [14]

\frac{\partial \phi}{\partial t} = |\nabla\phi|\,F; \qquad \phi(0, x, y) = \phi_0(x, y) \quad (14)

In [17]–[19] geodesic active contour evolution is formulated as Eq. 15,

\frac{\partial \phi}{\partial t} = g_F(I)\,(c + K(\phi))\,|\nabla\phi| + \nabla\phi \cdot \nabla g_F(I) \quad (15)

where gF(I) is the fused edge stopping function (Eq. 27), c is a constant, and K is the curvature term,

K = \mathrm{div}\left(\frac{\nabla\phi}{|\nabla\phi|}\right) = \frac{\phi_{xx}\phi_y^2 - 2\phi_x\phi_y\phi_{xy} + \phi_{yy}\phi_x^2}{\left(\phi_x^2 + \phi_y^2\right)^{3/2}} \quad (16)

The force (c + K) acts as the internal force in the classical energy based snake model. The constant velocity c pushes the curve inwards or outwards depending on its sign (inwards in our case). The regularization term K ensures boundary smoothness. The external image dependent force gF(I) (Section IV) is the fused edge indicator function and is used to stop the curve evolution at visible or infrared object boundaries. The term ∇gF · ∇φ introduced in [19] is used to increase the basin of attraction for evolving the curve to the boundaries of the objects and to pull back the contour if it passes the boundary [21].

The level set function φ is initialized with the signed distance function of the motion blob contours (FGM) and evolved using the geodesic active contour speed function of Eq. 15 with the edge stopping function gF(I), which fuses information from the visible and infrared imagery. The formulation of the edge stopping function and the fusion of information from the visible and infrared imagery are elaborated in the next section.
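A minimal numpy sketch of one explicit time step of Eqs. 15–16 is given below, assuming phi holds the signed-distance initialization and g the fused edge stopping function gF; the step size dt and constant c are illustrative and not values from the paper, and sign conventions depend on the chosen inside/outside convention for phi.

    import numpy as np

    def gac_step(phi, g, c=0.5, dt=0.25, eps=1e-8):
        gy, gx = np.gradient(g)          # gradient of the edge stopping function
        py, px = np.gradient(phi)
        pyy, _ = np.gradient(py)
        pxy, pxx = np.gradient(px)
        grad = np.sqrt(px**2 + py**2) + eps
        # Curvature term K of Eq. 16.
        K = (pxx * py**2 - 2.0 * px * py * pxy + pyy * px**2) / grad**3
        # Geodesic active contour update of Eq. 15.
        return phi + dt * (g * (c + K) * grad + gx * px + gy * py)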


IV. VIDEO SENSOR FUSION COMBINING VISIBLE AND INFRARED EDGE INFORMATION

Contour feature or edge indicator functions are used to guide and stop the evolution of the geodesic active contour when it arrives at the object boundaries. The edge indicator function is a decreasing function of the image gradient that rapidly goes to zero along edges and is higher elsewhere.

The magnitude of the gradient of the infrared image is used to construct an edge indicator function gIR as shown below, where Gσ(x, y) ∗ IIR(x, y) is the infrared image smoothed with a Gaussian filter,

g_{IR}(I_{IR}) = \exp\left(-\left|\nabla G_\sigma(x, y) * I_{IR}(x, y)\right|\right) \quad (17)

Although the spatial gradient of single channel images leads to well defined edge operators, edge detection in multi-channel images (i.e., color edge strength) is not straightforward to generalize, since gradients in different channels can have inconsistent orientations. In [22], Ruzon and Tomasi classify color edge detection algorithms into three categories based on when the individual channel responses are fused: (1) output fusion methods, (2) multi-dimensional gradient methods, and (3) vector methods. In output fusion methods, gray-scale edge detection is carried out in each channel independently and the results are then combined using methods such as a weighted sum. Multi-dimensional gradient methods produce a single estimate of the orientation and strength of an edge at a point. In vector methods, color information is treated as a vector through the whole process of edge detection, as in the case of edge detection based on vector order statistics [23], [24].

1) Tensor-based Color Edge Indicator Functions: The color edge indicator functions in this work are based on multi-dimensional gradient methods. The main issue in multi-dimensional gradient methods is the combination of the individual channel gradients into a final multi-dimensional gradient. Simple methods use operations such as sum, weighted sum, max, min etc. to produce the final multi-dimensional/color gradients. But the summation of the individual channel gradients discards the correlation between the channels and may result in cancellation effects [25]. Pioneering work on how to combine the gradients of each channel was done by Di Zenzo [26], who considered the multi-channel image as a vector field and computed the tensor gradient. Tensor based methods have since been used in various color feature detection algorithms, and many papers study and extend the scale and affine invariance [27], [28] or photometric invariance [29], [30] of these features. More information on color features and multi-channel gradients can be found in [25], [22], [31]–[33]. Two types of tensors particularly important for multi-dimensional/color gradients are the 2D color structure tensor and the Beltrami color metric tensor, explored below.

2D Color Structure Tensor:
The 2D color structure tensor is defined in Eq. 18. Many color feature detectors [34]–[36] can be related to the eigenvalues of the color structure tensor matrix JC (Eq. 19), since these eigenvalues are correlated with the local image properties of edgeness (λ1 >> 0, λ2 ≈ 0) and cornerness (λ1 ≈ λ2 >> 0). Some local descriptors that can be obtained from the two eigenvalues of Eq. 19 derived from the 2D color structure tensor are summarized in Table I [25].

\mathbf{J}_C = \begin{bmatrix}
\sum_{i=R,G,B} \left(\frac{\partial I_i}{\partial x}\right)^2 & \sum_{i=R,G,B} \frac{\partial I_i}{\partial x}\frac{\partial I_i}{\partial y} \\
\sum_{i=R,G,B} \frac{\partial I_i}{\partial x}\frac{\partial I_i}{\partial y} & \sum_{i=R,G,B} \left(\frac{\partial I_i}{\partial y}\right)^2
\end{bmatrix} \quad (18)

\lambda_{1,2} = \frac{1}{2}\left(J_C(1,1) + J_C(2,2) \pm \sqrt{(J_C(1,1) - J_C(2,2))^2 + (2 J_C(1,2))^2}\right) \quad (19)
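The color structure tensor and its eigenvalues (Eqs. 18–19) can be sketched as follows; a Gaussian filter plays the role of the local integration window, and the function name, sigma, and the (H, W, 3) float RGB layout of img are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def color_structure_tensor(img, sigma=2.0):
        Jxx = np.zeros(img.shape[:2])
        Jxy = np.zeros_like(Jxx)
        Jyy = np.zeros_like(Jxx)
        for ch in range(3):                  # sum the channel gradients (Eq. 18)
            gy, gx = np.gradient(img[..., ch])
            Jxx += gx * gx; Jxy += gx * gy; Jyy += gy * gy
        Jxx, Jxy, Jyy = (gaussian_filter(a, sigma) for a in (Jxx, Jxy, Jyy))
        root = np.sqrt((Jxx - Jyy)**2 + (2.0 * Jxy)**2)
        lam1 = 0.5 * (Jxx + Jyy + root)      # larger eigenvalue (Eq. 19)
        lam2 = 0.5 * (Jxx + Jyy - root)      # smaller eigenvalue
        return lam1, lam2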


TABLE I: Local descriptors that can be obtained from the two eigenvalues λ1, λ2 derived from the 2D color structure tensor [25].

  λ1 + λ2 : total local derivative energy
  λ1      : derivative energy in the most prominent direction
  λ1 − λ2 : line energy
  λ2      : amount of energy orthogonal to the prominent local orientation

Beltrami Color Metric Tensor:
The Beltrami color metric tensor operator for a 2D color image defines a metric on a two-dimensional manifold {x, y, R(x, y), G(x, y), B(x, y)} in the five-dimensional spatial-spectral space {x, y, R, G, B}. The color metric tensor is defined in Eq. 20 and reformulated as a function of the 2D color structure tensor in Eq. 21, where I2 is the 2 × 2 identity matrix and JC is the 2D color structure tensor. The magnitude of E can be considered a generalization of the gradient magnitude, and det(E) can be taken as the edge indicator function [37]–[39].

E = \begin{bmatrix}
1 + \sum_{i=R,G,B} \left(\frac{\partial I_i}{\partial x}\right)^2 & \sum_{i=R,G,B} \frac{\partial I_i}{\partial x}\frac{\partial I_i}{\partial y} \\
\sum_{i=R,G,B} \frac{\partial I_i}{\partial x}\frac{\partial I_i}{\partial y} & 1 + \sum_{i=R,G,B} \left(\frac{\partial I_i}{\partial y}\right)^2
\end{bmatrix} \quad (20)

E = I_2 + \mathbf{J}_C \quad (21)

The Beltrami color edge stopping function can then be defined as

g_{RGB}(I_{RGB}) = \exp(-\mathrm{abs}(\det(E))) \quad (22)

\det(E) = \mathrm{Beltrami}(I_{RGB}) = 1 + \mathrm{trace}(\mathbf{J}_C) + \det(\mathbf{J}_C) = 1 + (\lambda_1 + \lambda_2) + \lambda_1\lambda_2 \quad (23)

where λ1 and λ2 are the eigenvalues of JC.

Comparison of Color Feature Detectors:
Several common color (edge/corner) feature indicator functions were evaluated for comparison purposes, including the Harris [34], Shi-Tomasi [35], and Cumani [36] feature detectors and the determinant of the Beltrami metric tensor [39].

The Harris operator (Eq. 24) [34] uses the parameter k to tune edge versus corner responses (i.e., k → 0 responds primarily to corners).

\mathrm{Harris}(I_{RGB}) = \det(\mathbf{J}_C) - k\,\mathrm{trace}^2(\mathbf{J}_C) = \lambda_1\lambda_2 - k(\lambda_1 + \lambda_2)^2 \quad (24)

The Shi-Tomasi operator (Eq. 25) [35] responds strongly to corners and filters out most edges (since one of the eigenvalues is nearly zero along edges). This makes it unsuitable as a geodesic active contour edge stopping function.

\mathrm{ShiTomasi}(I_{RGB}) = \min(\lambda_1, \lambda_2) \quad (25)

The Cumani operator (Eq. 26) [36] responds nicely to both edges and corners.

\mathrm{Cumani}(I_{RGB}) = \max(\lambda_1, \lambda_2) \quad (26)

[Figure 2: Color features for frame #1256 obtained using (a) Beltrami, (b) Harris (k = 0.5), (c) Shi-Tomasi min(λ1, λ2), and (d) Cumani max(λ1, λ2) operators.]

Although any robust, accurate color edge response function can be used, the determinant of the Beltrami color tensor [39] has been selected for our application based on its robustness and speed. The best operators for the geodesic active contour edge stopping function will respond to all salient contours in the image. In our experiments, as shown in Figure 2, the Beltrami color (edge) feature was the most suitable function and is fast to compute. The Harris operator misses some salient contours around the pedestrians; the Shi-Tomasi operator responds primarily to corners and is not suitable as an edge stopping function; the Cumani operator produces a contour map that is nearly the same as the Beltrami color metric tensor map but is slightly more expensive to compute.

2) Fusion of Infrared and Color Edge Indicator Functions: Edge indicator functions obtained from infrared imagery and visible color imagery can be fused using various methods such as a weighted sum or tensor-based fusion (i.e., infrared imagery can be considered as a fourth channel and the metric tensor in Eq. 18 can be defined in the six-dimensional spatial-spectral space {x, y, R, G, B, IR}). The fusion should satisfy two conditions: (1) the fused edge stopping function gF(x, y) should respond to the strongest edge at location (x, y) in either the IR or the visible imagery, and (2) the infrared imagery should have more weight in the final decision than any single channel of the visible imagery, since moving infrared edges are highly salient for tracking. In order to satisfy these conditions and not to miss any infrared edges, independent of the gradients in the visible channels, the min statistic (Eq. 27) is used as the fusion operator. The fused edge stopping function gF is defined as the minimum of the two normalized (0, 1) edge stopping functions gIR(x, y) and gRGB(x, y),

g_F(IR, RGB, x, y) = \min\{g_{IR}(x, y),\ g_{RGB}(x, y)\} \quad (27)

This fusion method ensures that the curve evolution stops where there is an edge in the visible imagery or in the infrared imagery. The min fusion operator handles cases where the visible RGB appearance of the moving object is similar to the background but there is a distinct infrared signature, and cases where the background and foreground have similar infrared signatures but distinct visible appearances.
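Given eigenvalue maps lam1, lam2 from the color structure tensor sketch above and a grayscale infrared frame ir, the feature responses of Eqs. 22–27 reduce to a few array expressions. This is a hedged sketch: the min-max normalization applied before the min fusion is one plausible choice, since the paper only states that both functions are normalized to (0, 1); the Harris, Shi-Tomasi, and Cumani responses are shown for comparison and only the Beltrami response feeds the fusion.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def normalize(a):
        a = a - a.min()
        return a / (a.max() + 1e-12)

    def fused_edge_stopping(lam1, lam2, ir, sigma=1.5, k=0.5):
        harris = lam1 * lam2 - k * (lam1 + lam2)**2       # Eq. 24 (comparison only)
        shi_tomasi = np.minimum(lam1, lam2)               # Eq. 25 (comparison only)
        cumani = np.maximum(lam1, lam2)                   # Eq. 26 (comparison only)
        beltrami = 1.0 + (lam1 + lam2) + lam1 * lam2      # det(E), Eq. 23
        g_rgb = np.exp(-np.abs(beltrami))                 # Eq. 22
        gy, gx = np.gradient(gaussian_filter(ir, sigma))
        g_ir = np.exp(-np.sqrt(gx**2 + gy**2))            # Eq. 17
        return np.minimum(normalize(g_ir), normalize(g_rgb))  # Eq. 27, min fusion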


V. SHAPE-BASED SEGMENTATION REFINEMENT

Evolving geodesic active contours from motion blob boundaries toward object boundaries defined by fusion of visible and infrared edges results in tighter object contours and separates most of the merged neighboring objects. But non-foreground edges, such as background edges or shadow/illumination boundaries, may result in early stopping of the active contours; or close neighboring objects may lack non-edge spacing between them where the active contour can move. In such cases geodesic active contours result in under-segmentation of object clusters. Shape-based segmentation refinement breaks clusters in the refined foreground mask FGR based on shape information. Shape-based refinement relies on an "ellipse like" shape model for people: regions where people join are narrower compared to the size of the individual people that form the group/cluster.

The process is illustrated in Figure 3, summarized in Algorithm 2, and elaborated in the following paragraphs.

[Figure 3: Shape-based segmentation refinement. R: original region/cluster; M1, M2: marker regions obtained by eroding R; M1 ⊕ se2, M2 ⊕ se2: dilated markers; R1, R2: final segmentation, split at the separator.]

Algorithm 2 Shape-based Segmentation Refinement Algorithm
Input: R ∈ FGR(x, t) // disjoint region in refined foreground mask
Output: Ri's (R = ∪i=1:n Ri, n ≥ 1) // sub-regions in R
1: RE = R ⊖ se1 // erode the region
2: L = Label(RE) // apply connected component labeling
3: Mi ← disjoint regions in RE, RE = ∪ Mi // disjoint regions in RE will be used as markers
4: if (#Mi's ≤ 1) then
5:   Return R // keep R intact
6: else
7:   for each Mi ∈ RE do
8:     Pi = (Mi ⊕ se2) ∩ FGR, se2 > se1 // conditional dilation of the markers/parts
9:   end for
10:  Rrec = Σ Pi
11:  Separators ← 0
12:  Separators(Rrec > 1) ← 1 // overlap regions are locations where the object should be split
13:  Split ← ValidateSplit(R, Separators)
14:  if (Split == 0) then
15:    Return R // keep R intact
16:  else
17:    R(Separators == 1) ← 0
18:    Ri ← disjoint regions in R
19:    Return Ri's
20:  end if
21: end if

Each disjoint region R of the refined mask FGR is eroded and a set of markers (disjoint regions) is obtained. If R is a compact, ellipse-like region, the erosion results in a single marker; no further processing is done and R is returned intact. If R contains narrow regions, possibly corresponding to locations where people join, erosion removes those regions and the process results in n > 1 marker regions. R is then partitioned using a process similar to morphological reconstruction. An n-layer image is allocated and each marker Mi is assigned to a layer. Similar to morphological reconstruction, successive conditional dilations (Eq. 28) are applied to the marker regions Mi (the dilation is constrained to lie within the initial mask FGR). While in reconstruction the process is stopped when the dilations cease to change, here the process is stopped when the dilated markers overlap with each other, and the overlap regions are identified as partition separators.

P_i = (M_i \oplus se_2) \cap FG_R \quad (28)

Unlike morphological opening, this reconstruction operation separates the objects but does not deteriorate their shapes. To avoid false fragmentation, the validity of the partitioning is checked using a heuristic based on the maximum, minimum, and relative sizes of the obtained partitions.
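A sketch of this marker-based splitting with scipy morphology is given below; the structuring element sizes and the iteration cap are illustrative, and the ValidateSplit size heuristic of Algorithm 2 is omitted.

    import numpy as np
    from scipy import ndimage as ndi

    def split_cluster(R, se1=np.ones((5, 5), bool), se2=np.ones((3, 3), bool),
                      max_iter=100):
        # Erode the region and label the markers (Algorithm 2, steps 1-3).
        markers, n = ndi.label(ndi.binary_erosion(R, structure=se1))
        if n <= 1:
            return R, None                   # compact region: keep R intact
        parts = [markers == i for i in range(1, n + 1)]
        coverage = np.sum(parts, axis=0)
        for _ in range(max_iter):
            # Conditional dilation Pi = (Mi dilated by se2) AND R (Eq. 28),
            # applied repeatedly with a small se2 instead of one large one.
            parts = [ndi.binary_dilation(p, structure=se2) & R for p in parts]
            coverage = np.sum(parts, axis=0)
            if coverage.max() > 1:           # dilated markers overlap: stop
                break
        separators = coverage > 1            # overlap pixels separate the objects
        return R & ~separators, separators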


VI. MULTI-OBJECT TRACKING USING OBJECT CORRESPONDENCE GRAPHS

Moving object tracking is a fundamental step in the analysis of object behaviors [40]. Persistence in tracking is of particular importance for long-term behavior analysis. Moving objects can be tracked implicitly, i.e., through contour evolution [41], or explicitly through correspondence analysis. Active contour-based tracking methods track objects by evolving their contours dynamically in successive frames [40]. They provide an effective description of the tracked objects, but they are highly sensitive to initialization and, unless an additional re-initialization step (e.g., the feature-based re-initialization in [42]) is included in the process, they cannot handle large displacements. Therefore, while active contours are used in the object segmentation stage (Section III), the proposed system adopts an explicit region correspondence based tracking algorithm that is able to handle larger object displacements.

The tracking module outlined in Algorithm 3 is an extension of our previous work in [43]–[45]. Object-to-object matching (correspondence) is performed using a multi-stage overlap distance DMOD, which consists of three distinct distance functions for three different ranges of object motion, as described in [44].

Algorithm 3 Tracking Algorithm
Input: Image sequence I(x, t) and refined foreground mask sequence FGR(x, t)
Output: Trajectories and temporal object statistics
1: for each frame I(x, t) at time t do
2:   Use the refined foreground mask FGR(t) from the motion constrained object segmentation module.
3:   Partition FGR(t) into disjoint regions using connected component analysis, FGR(t) = {R1(t), R2(t), ..., RNR(t)}, which ideally correspond to NR individual moving objects.
4:   for each region Ri(t) {i = 1, 2, ..., NR(t)} in FGR(t) do
5:     Extract blob centroid, area, bounding box, support map etc.
6:     Arrange the region information in an object correspondence graph OGR. Nodes in the graph represent objects Ri(t), while edges represent object correspondences.
7:     Search for potential object matches in consecutive frames using the multi-stage overlap distance DMOD.
8:     Update OGR by linking nodes that correspond to objects in frame I(x, t) with nodes of potential corresponding objects in frame I(x, t − 1). Associate the confidence value of each match, CM(i, j), with each link.
9:   end for
10: end for
11: Trace the links in the object correspondence graph OGR to generate moving object trajectories.

Correspondence information is arranged in an acyclic directed graph OGR. Trajectory segments are formed by tracing the links of OGR and grouping "inner" nodes that have a single parent and a single child. For each trajectory segment, parent and children segments are identified and a label is assigned. Segment labels encapsulate connectivity information and are assigned using a method similar to connected component labeling. Events such as appearance, disappearance, split and merge are also identified in this stage based on the number of parent and child segments.

Trajectory segment generation is followed by trajectory segment filtering and cluster trajectory analysis. Trajectory segment filtering prunes spurious trajectory segments based on features such as object size, trajectory length, displacement, and duration, or other spatio-temporal measures [43]. Factors such as under-segmentation, group interactions, and partial or full occlusions result in temporary merging of individual object trajectories. The cluster trajectory analysis module resolves these merge-split events, where np parent trajectory segments (TPs) temporarily merge into a single trajectory segment (TM) and then split into nc child trajectory segments (TCs), and recovers the individual object trajectories (TSs). Currently only symmetric cases where np = nc are considered.

Most occlusion resolution methods rely heavily on appearance. But for far-view video, elaborate appearance-based models cannot be used, since objects are small and not enough support is available for such models. Prediction and cluster segmentation are used instead to recover individual trajectories. Rather than predicting the individual trajectories for the merged objects from the parent trajectories alone, at each time step the object clusters are segmented, new measurements are obtained, and the object states are updated. This reduces error accumulation, particularly for long-lasting merges, which become more frequent in persistent object tracking.

Segmentation of the object clusters into individual objects is done using a marker-controlled watershed segmentation algorithm applied to the object cluster masks [46], [47]. The use of markers prevents over-segmentation and enables incorporation of the segmentation results from the previous frame. The cluster segmentation process is shown in Figure 4 and the overall cluster trajectory analysis process is summarized in Algorithm 4, where X indicates the state matrix consisting of a temporal sequence of position and velocity information for each individual object segment, and X̂ indicates the estimated state matrix.

Algorithm 4 Cluster Trajectory Analysis
Input: Merged trajectory segment TM; parent and child trajectory segments TP(1:np), TC(1:np).
Output: Individual trajectory segments TS(1:np); updated parent and child trajectory segments TP(1:np), TC(1:np).
1: Initialize the state matrix X(t0), consisting of position and velocity information for each individual trajectory segment TS(1:np), using the parent segments' states, X(t0) ← TP(1:np).states
2: for each node TM.node(t) of the merged segment do
3:   Predict, using a Kalman filter [48], X̂(t), the estimated state matrix of the trajectory segments TS(1:np) corresponding to the individual objects in the object cluster, from the previous states X(t − 1).
4:   Project the masks of the sub-nodes TS(1:np).node(t−1) from the previous frame to the predicted positions on the current merged node TM.node(t), and use these projected masks as markers for the watershed segmentation.
5:   Using the watershed segmentation algorithm and the markers obtained through motion compensated projections of the sub-nodes from the previous frame, segment the merged node TM.node(t), corresponding to a cluster of objects, into a set of sub-nodes corresponding to individual objects TS(i).node(t), i = 1:np.
6:   Use the refined positions of the sub-nodes obtained after watershed segmentation to update the corresponding states X(t).
7: end for
8: Update the object correspondence graph OGR by including sub-node information such as new support maps, centroids, areas etc.
9: Update the individual trajectory segments' parent and children links (TS(1:np).parents, TS(1:np).children), the parent segments' children links (TP(1:np).children), and the children segments' parent links (TC(1:np).parents) by matching TSs to TPs and TCs.
10: Propagate the parent segments' labels to the associated sub-segments, and subsequently to their children (TP(1:np).label → TS(1:np).label → TC(1:np).label).

[Figure 4: Cluster segmentation using Kalman filter and watershed segmentation. Parent node states X(t0), ..., X(t − 1) are used to predict X̂(t) (Kalman filter); sub-node masks from the previous frame are projected to the predicted positions and used as markers(t) to segment the merged node (watershed transform); the refined sub-nodes then update X(t).]

VII. RESULTS AND ANALYSIS

The proposed system is tested on thermal/color video sequence pairs from the OTCBVS dataset collection [49].
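The prediction and update steps of Algorithm 4 can be sketched with a standard constant-velocity Kalman filter over the state [x, y, vx, vy]; the matrices and noise levels below are illustrative, not values from the paper.

    import numpy as np

    def cv_model(dt=1.0, q=1e-2, r=1.0):
        F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                      [0, 0, 1, 0], [0, 0, 0, 1]], float)   # state transition
        H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)   # observe position only
        return F, H, q * np.eye(4), r * np.eye(2)

    def kf_predict(x, P, F, Q):              # Algorithm 4, step 3
        return F @ x, F @ P @ F.T + Q

    def kf_update(x, P, z, H, R):            # Algorithm 4, step 6
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        return x, P

    # Per merged node: predict each sub-object, project its previous mask to the
    # predicted position, run marker-controlled watershed, then update with the
    # measured centroid z of the resulting sub-node.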


[Figure 5: Flux tensor based motion detection results. Top three rows: visible image, IR image, flux tensor trace for OTCBVS benchmark sequence 3:1, frames #8, #432, #836, #1254, #1740. Bottom three rows: visible image, IR image, flux tensor trace for OTCBVS benchmark sequence 3:4, frames #20, #124, #294, #1722, #3000. Averaging window size nAt = 7, derivative filter length nDt = 5. Flux response is scaled for improved print visibility.]

The data consist of 8-bit grayscale bitmap thermal images and 24-bit color bitmap images of 320 × 240 pixels. Images were sampled at approximately 30 Hz and registered using a homography with manually-selected points. Thermal sequences were captured using a Raytheon PalmIR 250D sensor; color sequences were captured using a Sony TRV87 Handycam.

Figure 5 shows the flux tensor trace for sample frames in OTCBVS sequences 3:1 and 3:4. While both IR image sequences, particularly sequence 3:1, contain non-moving hot/bright regions such as parts of the ground, roof tops, windows, and lamp posts, the flux tensor trace successfully identifies only the moving pedestrians. Higher responses are produced at the object boundaries, and the inside of the larger objects contains holes due to the aperture problem.

Figure 6 shows different moving object detection results. MoG refers to the background estimation and subtraction method by mixture of Gaussians [50], [51]; Flux refers to the flux tensor method presented in Section II. The parameters for the mixture of Gaussians (MoG) method are selected as follows: number of distributions K = 4, distribution match threshold Tmatch = 2.0, background threshold T = 70%, learning rate α = 0.02. The flux tensor method uses a neighborhood size W = 9 and trace threshold T = 4. Visible imagery (Figure 6c) is very sensitive to moving shadows and illumination changes. Shadows (Figure 6c, row 3) can alter object shapes and can result in false detections. Illumination changes due to cloud movements cover a large portion of the ground (Figure 6c, row 4), which results in many false moving object detections, making detection and tracking of pedestrians nearly impossible.
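For reference, a minimal single-pixel grayscale sketch of the MoG baseline [50], [51] appears below, with the parameter names (alpha, t_match, t_bg) mirroring the values quoted above; it is a simplified Stauffer-Grimson style update, not the exact implementation used for comparison, and a practical version vectorizes this over the whole frame.

    import numpy as np

    def mog_update(pixel, mu, var, w, alpha=0.02, t_match=2.0, t_bg=0.7):
        # mu, var, w: length-K arrays of mode means, variances and weights.
        d2 = (pixel - mu)**2
        match = d2 < (t_match**2) * var           # within t_match std devs
        if match.any():
            k = int(np.argmin(np.where(match, d2, np.inf)))  # closest matching mode
            w *= (1.0 - alpha); w[k] += alpha
            mu[k] += alpha * (pixel - mu[k])
            var[k] += alpha * (d2[k] - var[k])
        else:
            k = int(np.argmin(w))                 # replace the least probable mode
            mu[k], var[k], w[k] = pixel, 900.0, 0.05
        w /= w.sum()
        order = np.argsort(-(w / np.sqrt(var)))   # most reliable modes first
        n_bg = int(np.searchsorted(np.cumsum(w[order]), t_bg)) + 1
        is_foreground = (not match.any()) or (k not in order[:n_bg])
        return mu, var, w, is_foreground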


[Figure 6: Moving object detection results for OTCBVS benchmark sequence 3:1, frames #120, #400, #1048, #1256 (top to bottom). (a) Original RGB image, (b) original IR image, (c) MoG on RGB, (d) MoG on IR, (e) Flux on IR.]

[Figure 7: Shape-based segmentation refinement results for OTCBVS benchmark sequence 3:1, frames #408 (top) and #1528 (bottom). (a) Edge stopping function; (b) contours before shape-based refinement (green: flux tensor results, red: contours refined using active contours); (c) contours after shape-based refinement (green: flux tensor results, red: contours refined using active contour evolution + shape-based refinement); (d) mask after active contour evolution; (e) mask after active contour evolution + shape-based refinement; (f) partition separators.]

As can be seen from Figures 6d and 6e, infrared imagery is less sensitive to illumination related problems. But infrared imagery is noisier compared to visible imagery and suffers from "halo" effects (Figure 6d). The flux tensor method (Figure 6e) produces less noisy and more compact foreground masks compared to pixel based background subtraction methods such as MoG (Figure 6d), since it integrates temporal information from isotropic spatial neighborhoods.

Figure 7 illustrates two sample cases where the shape-based segmentation refinement splits pedestrian clusters that were under-segmented during active contour evolution. In the first case (top row), the two pedestrians walking side-by-side have no spacing between them where the geodesic active contour can move. In the second case (bottom row), the geodesic active contour stops early because of the background edges.


[Figure 8: Level set based active contour evolution for OTCBVS benchmark sequence 3:1, frame #1256. Left: level set function φ(x, t); the horizontal plane is the zero level. Right: segmentation result superimposed on the original image. Red and green channels: original image; blue channel: fused edge stopping function gF(x, y); red contour: level set contour (φ(x, y) = 0).]

[Figure 9: Edge indicator functions for OTCBVS benchmark sequence 3:1, frame #1256: (a) gRGB, (b) gIR, (c) gFusion.]

Figure 8 shows the evolution of the level set function φ(x, y) and the resulting contour during object segmentation. Level set evolution moves the contour from the motion blob boundaries inwards toward the actual object boundaries defined by fusion of the visible and infrared images. The process results in tighter object boundaries and individual objects.

Figure 9 shows the visible, IR and fused edge indicator functions used in the object segmentation process. Figure 10 illustrates the effects of contour refinement and fusion of visible and infrared information. Level set based geodesic active contours refine object boundaries and segment object clusters into individual objects or smaller clusters, which is critical for persistent object tracking. When used alone, both visible and infrared video result in total or partial loss of moving objects (i.e., the top left person in Figure 10b due to low color contrast compared to the background, and parts of the top right person and the legs in Figure 10c due to lack of infrared edges). A low level fusion of the edge indicator functions, shown in Figure 10d, results in a more complete mask compared to just combining the visible and infrared foreground masks (i.e., the legs of the top right and bottom persons). Figure 11 illustrates the effects of contour refinement and merge resolution on object trajectories. Level set based geodesic active contours can separate clusters caused by under-segmentation (Figure 11a) but cannot segment individual objects during occlusions (Figure 11b). In those cases merge resolution recovers the individual trajectories using prediction and previous object states (Figure 11, second row). In occlusion events no single parameter (i.e., color, size, shape etc.) can consistently resolve ambiguities in partitioning, as is evident in Figure 11b, first row.


[Figure 10: (a) Motion blob #2 in frame #1256 using IR flux tensors. Refinement of blob #2 using (b) only visible imagery, (c) only infrared imagery, (d) fusion of both visible and IR imagery.]

[Figure 11: Merge resolution. Left to right: frames #41, #56, and #91 in OTCBVS benchmark sequence 3:4. Top row: motion constrained object extraction results; flux tensor results are marked in green, refined contours in red. Bottom row: object trajectories after merge resolution.]

VIII. CONCLUSION

This paper presented a moving object detection and tracking system based on the fusion of infrared and visible imagery for persistent object tracking. Outdoor surveillance applications require robust systems due to wide area coverage, shadows, cloud movements and background activity. The proposed system fuses the information from both visible and infrared imagery within a geodesic active contour framework to achieve this robustness.

A new efficient motion detection algorithm referred to as the flux tensor is used to detect moving objects in infrared video without requiring background modeling or contour extraction. The flux tensor-based motion detector, when applied to infrared video, is more accurate than thresholding "hot spots" and is insensitive to shadows as well as illumination changes in the visible channel. The novel flux tensor also produces less noisy and more spatially coherent results compared to classical pixel based background subtraction methods. The object segmentation algorithm uses level set-based geodesic active contour evolution that incorporates the fusion of visible color and infrared edge information in a novel manner and refines the initial motion mask. Touching or overlapping objects are further refined during the segmentation process using an appropriate shape-based model.

Multiple object tracking using correspondence graphs is extended to handle groups of objects and occlusion events by Kalman filter-based cluster trajectory analysis and watershed segmentation. The proposed object tracking algorithm was successfully tested on several difficult outdoor multispectral videos from stationary sensors and is not confounded by shadows or illumination variations.

REFERENCES

[1] J. Zolper, "Integrated microsystems: A revolution on five frontiers," in Proc. of the 24th DARPA-Tech Conf., Anaheim, CA, Aug. 2005.
[2] Z. Lemnios and J. Zolper, "Informatics: An opportunity for microelectronics innovation in the next decade," IEEE Circuits & Devices, vol. 22, no. 1, pp. 16–22, Jan. 2006.
[3] G. Seetharaman, A. Lakhotia, and E. Blasch, "Unmanned vehicles come of age: The DARPA grand challenge," Special issue of IEEE Computer, pp. 32–35, Dec. 2006.


32 JOURNAL OF MULTIMEDIA, VOL. 2, NO. 4, AUGUST 2007

[4] W. Brown, R. Kaehr, and D. Chelette, “Finding and track- Anal. and Mach. Intell., vol. 23, no. 11, pp. 1281–1295,
ing targets: Long term challenges,” Air Force Research 2001.
Technology Horizons, vol. 5, no. 1, pp. 9–11, 2004. [23] P. E. Trahanias and A. N. Venetsanopoulos, “Vector
[5] F. Bunyak, K. Palaniappan, S. Nath, and G. Seetharaman, order statistics operators as color edge detectors,” IEEE
“Geodesic active contour based fusion of visible and Transactions on Systems, Man, and Cybernetics-PartB:
infrared video for persistent object tracking,” in 8th IEEE Cybernetics, vol. 26, no. 1, pp. 135–143, Feb. 1996.
Workshop on Applications of Computer Vision (WACV [24] Hai Tao and Thomas S. Huang, “Color image edge
2007), Feb. 2007. detection using cluster analysis,” in IEEE Int. Conf. on
[6] J. Davis and V. Sharma, “Background-subtraction in Image Processing (ICIP’97), 1997, pp. 834–836.
thermal imagery using contour saliency,” Int. Journal of [25] H. Stokman T. Gevers, J. Weijer, “Color feature detection,”
Computer Vision, vol. 71, no. 2, pp. 161–181, 2007. in Color Image Processing: Methods and Applications,
[7] S. Nath and K. Palaniappan, “Adaptive robust structure tensors for orientation estimation and image segmentation,” in LNCS-3804: Proc. ISVC’05, Lake Tahoe, Nevada, Dec. 2005, pp. 445–453.
[8] H.H. Nagel and A. Gehrke, “Spatiotemporally adaptive estimation and segmentation of OF-fields,” in LNCS-1407: ECCV’98, Germany, June 1998, vol. 2, pp. 86–102.
[9] B.K.P. Horn and B.G. Schunck, “Determining optical flow,” Artificial Intell., vol. 17, no. 1-3, pp. 185–203, Aug. 1981.
[10] K. Palaniappan, H. Jiang, and T.I. Baskin, “Non-rigid motion estimation using the robust tensor method,” in IEEE Comp. Vision and Patt. Recog. Workshop on Articulated and Nonrigid Motion, Washington, DC, June 2004.
[11] J. Zhang, J. Gao, and W. Liu, “Image sequence segmentation using 3-D structure tensor and curve evolution,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 5, pp. 629–641, May 2001.
[12] H. Scharr, “Optimal filters for extended optical flow,” in LNCS: First Int. Workshop on Complex Motion, Berlin, Germany, Oct. 2004, vol. 3417, pp. 66–74, Springer-Verlag.
[13] H. Scharr, I. Stuke, C. Mota, and E. Barth, “Estimation of transparent motions with physical models for additional brightness variation,” in 13th European Signal Processing Conference, EUSIPCO, 2005.
[14] T. Chan and L. Vese, “Active contours without edges,” IEEE Trans. Image Proc., vol. 10, no. 2, pp. 266–277, Feb. 2001.
[15] M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active contour models,” International Journal of Computer Vision, vol. 1, no. 4, pp. 321–331, January 1988.
[16] L.D. Cohen, “On active contour models and balloons,” Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 53, no. 2, pp. 211–218, 1991.
[17] V. Caselles, F. Catte, T. Coll, and F. Dibos, “A geometric model for active contours,” Numerische Mathematik, vol. 66, pp. 1–31, 1993.
[18] R. Malladi, J.A. Sethian, and B. Vemuri, “Shape modelling with front propagation: A level set approach,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 17, no. 2, pp. 158–174, 1995.
[19] V. Caselles, R. Kimmel, and G. Sapiro, “Geodesic active contours,” Int. Journal of Computer Vision, vol. 22, no. 1, pp. 61–79, 1997.
[20] J.A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, Cambridge University Press, Cambridge, UK, 1999, ISBN 0-521-64557-3.
[21] C. Xu, A. Yezzi, and J. Prince, “A summary of geometric level-set analogues for a general class of parametric active contour and surface models,” in Proc. IEEE Workshop on Variational and Level Set Methods in Computer Vision, 2001, pp. 104–111.
[22] M.A. Ruzon and C. Tomasi, “Edge, junction, and corner detection using color distributions,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 23, no. 11, pp. 1281–1295, Nov. 2001.
R. Lukac and K.N. Plataniotis, Eds. CRC Press, 2006.
[26] S. Di Zenzo, “A note on the gradient of a multi-image,” Computer Vision, Graphics, and Image Processing, vol. 33, no. 1, pp. 116–125, 1986.
[27] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004.
[28] N. Sebe, T. Gevers, S. Dijkstra, and J. van de Weijer, “Evaluation of intensity and color corner detectors for affine invariant salient regions,” in Beyond Patches Workshop at CVPR, 2006.
[29] J. van de Weijer, Th. Gevers, and J.M. Geusebroek, “Edge and corner detection by photometric quasi-invariants,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 4, April 2005.
[30] J. van de Weijer, Th. Gevers, and A.W.M. Smeulders, “Robust photometric invariant features from the color tensor,” IEEE Trans. Image Proc., vol. 15, no. 1, 2006.
[31] C. Schmid, R. Mohr, and C. Bauckhage, “Evaluation of interest point detectors,” International Journal of Computer Vision, vol. 37, no. 2, pp. 151–172, 2000.
[32] A. Koschan and M. Abidi, “Detection and classification of edges in color images,” Signal Processing Magazine, Special Issue on Color Image Processing, vol. 22, no. 1, pp. 64–73, 2005.
[33] S.K. Naik and C.A. Murthy, “Standardization of edge magnitude in color images,” IEEE Trans. on Image Processing, vol. 15, no. 9, pp. 2588–2595, Sep. 2006.
[34] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. 4th Alvey Vision Conf., Manchester, 1988, vol. 15, pp. 147–151.
[35] J. Shi and C. Tomasi, “Good features to track,” in IEEE Conf. on Comp. Vis. and Patt. Recog. (CVPR), Seattle, June 1994.
[36] A. Cumani, “Edge detection in multispectral images,” CVGIP: Graphical Models and Image Processing, vol. 53, no. 1, 1991.
[37] A. Brook, R. Kimmel, and N.A. Sochen, “Variational restoration and edge detection for color images,” Journal of Mathematical Imaging and Vision, vol. 18, no. 3, pp. 247–268, May 2003.
[38] N.A. Sochen, R. Kimmel, and R. Malladi, “A general framework for low-level vision,” IEEE Trans. Image Proc., vol. 7, no. 3, pp. 310–318, 1998.
[39] R. Goldenberg, R. Kimmel, E. Rivlin, and M. Rudzsky, “Fast geodesic active contours,” IEEE Trans. Image Proc., vol. 10, no. 10, pp. 1467–1475, Oct. 2001.
[40] W. Hu, T.N. Tan, L. Wang, and S.J. Maybank, “A survey on visual surveillance of object motion and behaviors,” IEEE Trans. on Systems, Man, and Cybernetics - Part C, vol. 34, no. 3, pp. 334–352, August 2004.
[41] N. Paragios and R. Deriche, “Geodesic active regions and level set methods for motion estimation and tracking,” Computer Vision and Image Understanding, vol. 97, no. 3, pp. 259–282, March 2005.
[42] G. Panin and A. Knoll, “Fully automatic real-time 3D object tracking using active contour and appearance models,” Journal of Multimedia, vol. 1, no. 7, pp. 62–70, 2006.
[43] F. Bunyak and S.R. Subramanya, “Maintaining trajectories of salient objects for robust visual tracking,” in LNCS-3212: Proc. ICIAR’05, Toronto, Sep. 2005, pp. 820–827.
[44] F. Bunyak, K. Palaniappan, S.K. Nath, T.I. Baskin, and G. Dong, “Quantitative cell motility for in vitro wound healing using level set-based active contour tracking,” in Proc. 3rd IEEE Int. Symp. Biomed. Imaging (ISBI), Arlington, VA, April 2006, pp. 1040–1043.
[45] S.K. Nath, F. Bunyak, and K. Palaniappan, “Robust tracking of migrating cells using four-color level set segmentation,” in LNCS-3212: Proc. ACIVS’05, Antwerp, Belgium, Sep. 2006.
[46] L. Vincent and P. Soille, “Watersheds in digital spaces: An efficient algorithm based on immersion simulations,” IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, no. 6, pp. 583–598, 1991.
[47] S. Beucher and F. Meyer, “The morphological approach to segmentation: The watershed transformation,” in Math. Morph. and its Applications to Image Proc., E.R. Dougherty, Ed., pp. 433–481, Marcel Dekker, NY, 1993.
[48] Y. Bar-Shalom, X.R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software, John Wiley & Sons, Inc., 2001.
[49] J. Davis and V. Sharma, “Fusion-based background-subtraction using contour saliency,” in IEEE Int. Workshop on Object Tracking and Classification Beyond the Visible Spectrum, San Diego, CA, June 2005.
[50] C. Stauffer and E. Grimson, “Learning patterns of activity using real-time tracking,” IEEE Trans. Pattern Anal. and Machine Intel., vol. 22, no. 8, pp. 747–757, 2000.
[51] X. Zhuang, Y. Huang, K. Palaniappan, and Y. Zhao, “Gaussian mixture density modeling, decomposition and applications,” IEEE Trans. Image Proc., vol. 5, no. 9, pp. 1293–1302, Sep. 1996.

Filiz Bunyak received her B.S. and M.S. degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, and her Ph.D. degree in computer science from the University of Missouri-Rolla, Rolla, MO, USA, in 2005. In 2005 she joined the Computer Science Department of the University of Missouri-Columbia as a post-doctoral researcher. Her research interests are image processing and computer vision, with emphasis on visual surveillance, video tracking, biomedical image processing, data fusion, segmentation, level set methods, and mathematical morphology.

Kannappan Palaniappan received the B.A.Sc. and M.A.Sc. degrees in systems design engineering from the University of Waterloo, Waterloo, ON, Canada, and the Ph.D. degree in electrical and computer engineering from the University of Illinois, Urbana-Champaign, in 1991. From 1991 to 1996, he was with the NASA Goddard Space Flight Center, working in the Laboratory for Atmospheres, where he co-established the High-Performance Visualization and Analysis Laboratory. He developed the first massively parallel algorithm for satellite-based hurricane motion tracking using 64K processors and invented the Interactive Image SpreadSheet system for visualization of extremely large image sequences and numerical model data over high-performance networks. Many visualization products created with colleagues at NASA have been widely used on television, in magazines, museums, web sites, etc. At the University of Missouri, he helped establish the NSF vBNS high-speed research network, the NASA Center of Excellence in Remote Sensing, ICREST, and MCVL. He has been with UMIACS, University of Maryland, and worked in industry for Bell Northern Research, Bell Canada, Preussen Elektra Germany, the Canadian Ministry of Environment, and Ontario Hydro. His research interests include satellite image analysis, biomedical imaging, video tracking, level set-based segmentation, nonrigid motion analysis, scientific visualization, and content-based image retrieval. Dr. Palaniappan received the highest teaching award given by the University of Missouri, the William T. Kemper Fellowship for Teaching Excellence, in 2002, the Boeing Welliver Faculty Fellowship in 2004, the University Space Research Association Creativity and Innovation Science Award (1993), the NASA Outstanding Achievement Award (1993), a Natural Sciences and Engineering Research Council of Canada scholarship (1982–1988), and the NASA Public Service Medal (2001) for pioneering contributions to scientific visualization and analysis tools for understanding petabyte-sized archives of NASA datasets.

Sumit Nath received his M.Tech. degree from the Indian Institute of Science, Bangalore, India, in 1998 and his Ph.D. from the University of Ottawa, Canada, in 2004. He joined Dr. K. Palaniappan's group at the University of Missouri-Columbia as a post-doctoral researcher in 2004. Currently, he is pursuing post-doctoral research in the Electrical, Computer, and Systems Engineering Department at Rensselaer Polytechnic Institute, Troy, NY. He is primarily interested in all aspects of computer vision, image processing, and compression, with special emphasis on level sets, segmentation, optic flow estimation, Voronoi diagrams, and tomography.

Guna Seetharaman has been an associate professor of Electrical and Computer Engineering at the Air Force Institute of Technology since June 2003. He was with The Center for Advanced Computer Studies, University of Louisiana at Lafayette, until 2003, as an associate professor of computer engineering. He was a CNRS Research Visiting Professor at the Institute for Electronics Fundamentals, University of Paris XI, Orsay, on sabbatical and short-term fellowships. Dr. Seetharaman earned his Ph.D. degree in Electrical and Computer Engineering in 1988 from the University of Miami, Coral Gables, FL. He holds an M.Tech. in Electrical Engineering (1982) from the Indian Institute of Technology, Chennai, and earned his B.E. in Electronics and Communication Engineering in 1980 from The University of Madras, India. He established and successfully ran the Computer Vision Laboratory, and co-established the Intelligent Robotics Laboratory, at The University of Louisiana at Lafayette. His research has been funded by NSF, ONR, DOE, AFOSR, and The Board of Regents of Louisiana. His recent focus has been on integrated microsystems for 3D imaging and displays, and high-performance embedded computing algorithms for image processing systems. He also participated in the DARPA Grand Challenge as a charter member of Team CajunBot. He has published more than 120 articles spanning computer vision, low-altitude aerial imagery, SIMD parallel computing, VLSI signal processing, 3D displays, nanotechnology, and 3D video analysis. He co-organized the DOE/ONR/NSF-sponsored Second International Workshop on Foundations of Decision and Information Fusion in 1996 (Washington, DC), and the IEEE Sixth International Workshop on Computer Architecture for Machine Perception, New Orleans, 2003. He guest edited the IEEE COMPUTER special issue devoted to Unmanned Intelligent Autonomous Vehicles, Dec. 2006, and also guest-edited a special issue of the EURASIP Journal on Embedded Computing on Intelligent Vehicles. He is an active member of the IEEE and ACM, and a member of Tau Beta Pi, Eta Kappa Nu, and Upsilon Pi Epsilon.