Real-Time Face Tracking System For Human-Robot Interaction

©1999 IEEE 0-7803-5731-0/99 $10.00
Our system satisfies all those requirements simultaneously by utilizing the following techniques:

• stereo vision with a field multiplexing device
• an image processing board with a normalized correlation capability
• 3D model fitting based on virtual springs

The details of the hardware and software systems are described in sections 2 and 3 respectively. Some experimental results are shown in section 4. Finally, the conclusion and future work are described in section 5.

2 Hardware Configuration of Real-time Stereo Vision System

2.1 System Setup

Figure 1 illustrates the hardware setup of our real-time stereo face tracking system. It has an NTSC camera pair (SONY EVI-370DG x 2) to capture a person's face. The output video signals from the cameras are multiplexed into one video signal by the "field multiplexing technique" [10]. The multiplexed video stream is then fed into a vision processing board (Hitachi IP5000), where the position and the orientation of the face are recognized. The result of the recognition is visualized by a graphics workstation (SGI O2).

Fig. 1 : System configuration of the human-machine interface.

2.2 IP5000 Image Processing Board

The IP5000 is a half-sized PCI image processing board which is used connected to an NTSC camera and a TV monitor. It is equipped with 40 frame memories of 512 x 512 pixels. It provides a wide variety of fast image processing functions performed in hardware, such as binarization, convolution, filtering, labeling, histogram calculation, color extraction and normalized correlation. The operating frequency is 73.5[MHz]; therefore it can apply a basic function (e.g. binarization) to one image within 3.6[ms].

2.3 Field Multiplexing Device

Field multiplexing is a technique to generate a multiplexed video stream from two video streams in the analog phase. A diagram of the device is shown in Figure 2. The device takes two video streams which are synchronized. They are input into a video switching IC, and one of them is selected and output in every field. Thus the frequency of the switching is only 60[Hz], which makes the device quite easy and cheap to implement. A photo of the device is also shown in Figure 2. It is less than 5[cm] square and uses only consumer electronic parts.

The advantage of multiplexing video signals in the analog phase is that it can be applied to any vision system which takes a single video stream as an input, and makes it perform stereo vision processing. Since the multiplexed image is stored in a single video frame memory, stereo image processing can be performed within that memory. This means there is no overhead cost for image transfer, which is inevitable in a stereo vision system with two image processing boards. Thus a system with a field multiplexing device can achieve higher performance than a system with two boards.

The weak point of field multiplexing is that the image looks strange to human eyes if the signal is displayed directly on a TV monitor, because the two images are superimposed every two lines. However, it does not make image processing any harder, since a normal image can be obtained by subsampling the multiplexed image in the vertical direction.

Fig. 2 : Block diagram and a photograph of the Field Multiplexing Device.
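The vertical-subsampling recovery of the two camera images described above can be sketched as follows. This is an illustrative Python/NumPy sketch, not the IP5000 implementation, and the even/odd line-to-camera assignment is an assumption (the actual ordering depends on the device wiring):

```python
import numpy as np

def demultiplex(frame: np.ndarray):
    """Split a field-multiplexed frame into the two camera images.

    Assumes even scan lines came from the left camera and odd lines
    from the right camera. Each returned image has half the vertical
    resolution of the multiplexed frame.
    """
    left = frame[0::2]   # even lines
    right = frame[1::2]  # odd lines
    return left, right

# Synthetic 480-line multiplexed frame: even lines carry a "left"
# image of 1s, odd lines a "right" image of 2s.
frame = np.empty((480, 640), dtype=np.uint8)
frame[0::2] = 1
frame[1::2] = 2
left, right = demultiplex(frame)
```

Because the split is a strided view, the demultiplexing itself costs no copies, which matches the paper's point that stereo processing happens within a single frame memory.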
3 Stereo Face Tracking Algorithm
3.1 3D Facial Model
The 3D facial model utilized in our stereo face tracking is composed of three items as follows:
3.2.1 3D Feature Tracking

In the 3D feature tracking stage, each feature is assumed to have a small motion between the current frame and the previous one, and the 2D position in the previous frame is utilized to determine the search area in the current frame. The feature images stored in the 3D facial model are used as templates, and the right image is used as a search area. Then the matched image in 2D feature tracking is used as a template and the left image is utilized as a search area. Thus the 3D coordinates of each feature are acquired. The processing time of the whole tracking process (i.e. feature tracking + stereo matching for six features) is about 10[ms] on the IP5000.
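The per-feature search above relies on normalized correlation. A minimal software sketch follows; this is a generic CPU implementation, not the IP5000's hardware routine, and the template size and search radius are illustrative:

```python
import numpy as np

def normalized_correlation(template: np.ndarray, patch: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation of two equal-sized patches."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    return float((t * p).sum() / denom) if denom > 0 else 0.0

def track_feature(image, template, prev_xy, search_radius=8):
    """Find the template near the previous 2D position.

    Returns the best-matching (x, y) and its correlation value, which
    can later serve as the feature's reliability (spring stiffness).
    """
    th, tw = template.shape
    px, py = prev_xy
    best, best_xy = -1.0, prev_xy
    for y in range(py - search_radius, py + search_radius + 1):
        for x in range(px - search_radius, px + search_radius + 1):
            if x < 0 or y < 0:
                continue  # search window ran off the top/left edge
            patch = image[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue  # search window ran off the bottom/right edge
            c = normalized_correlation(template, patch)
            if c > best:
                best, best_xy = c, (x, y)
    return best_xy, best

# Track a feature in a synthetic image: the template is cut out at
# (x=30, y=20) and the search starts from a nearby previous position.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
template = image[20:28, 30:38].copy()
best_xy, score = track_feature(image, template, (28, 22))
```

Because the correlation is normalized by the patch energies, a uniform brightness change scales out of the score, which is the property the paper later exploits for tolerance to lighting fluctuation.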
The 3D model fitting stage aligns the facial model with the measurements acquired in 3D feature tracking. As mentioned before, the face is assumed to have a small motion between the frames. This also means there can be only small displacements in terms of the position and the orientation, which is described as (Δx, Δy, Δz, Δφ, Δθ, Δψ) in Figure 6 (1). Then the position and the orientation acquired in the previous frame (at time t) are utilized to rotate and translate the measurement sets to move them back closer to the model, as shown in Figure 6 (2). After the rotation and translation, the measurements still have a small disparity to the model due to the motion which occurred during the interval Δt. Then the fine model fitting is to be performed. To realize robust model fitting, it is essential to take the reliability values of the matching into account. The correlation values are then used as the stiffness of the springs between each feature in the model and the corresponding measurement, as shown in Figure 6 (3). The model is then rotated and translated gradually and iteratively to reduce the elastic energy of the springs. The weighting based on the reliability makes the fitting result insensitive to partial matching failures, and enables robust face tracking. The processing time of the iterative model fitting is less than 2[ms] using a PentiumII 450MHz.

Fig. 6 : 3D model fitting algorithm based on virtual springs (spring stiffness k_n ∝ correlation value).
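The spring model above minimizes the elastic energy E = Σ_n k_n ||R m_n + t − p_n||², where m_n are the model features, p_n the measured 3D positions, and k_n the correlation-derived stiffnesses. As a sketch of the same weighted least-squares objective, the minimum can also be reached in closed form by a weighted SVD (Kabsch-style) alignment; this closed-form substitute is our illustration, not the authors' iterative scheme:

```python
import numpy as np

def fit_model(model: np.ndarray, meas: np.ndarray, k: np.ndarray):
    """Rigid (R, t) minimizing sum_n k_n * ||R m_n + t - p_n||^2.

    model, meas: (N, 3) feature coordinates; k: (N,) spring stiffnesses
    (e.g. normalized correlation values). Closed-form weighted Kabsch.
    """
    w = k / k.sum()
    mc = (w[:, None] * model).sum(axis=0)   # weighted centroids
    pc = (w[:, None] * meas).sum(axis=0)
    M = model - mc
    P = meas - pc
    H = (M * w[:, None]).T @ P              # 3x3 weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = pc - R @ mc
    return R, t

# Six synthetic features, a known motion, and per-feature stiffnesses.
rng = np.random.default_rng(1)
model = rng.random((6, 3))
theta = 0.3
R0 = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
t0 = np.array([1.0, 2.0, 3.0])
meas = model @ R0.T + t0
k = rng.uniform(0.2, 1.0, size=6)  # correlation values as stiffness
R, t = fit_model(model, meas, k)
```

If one measurement is corrupted by a partial occlusion, its low correlation value shrinks k_n and its influence on (R, t), which mirrors the robustness behaviour described above.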
4 Experimental Results
4.1 Face Tracking
Some snapshots obtained as results of tracking in our real-time face tracking system are shown in Figure 7. (1) and (2) in Figure 7 are the results when the face has rotations, while (3) and (4) indicate the results when the face moves closer to and further from the camera. The whole tracking process takes about 30[ms], which is within the NTSC video frame rate. The accuracy of the tracking is approximately ±1[mm] in translation and ±1[deg] in rotation.

The snapshots in Figure 8 show the results of tracking when there is some deformation of the facial features and partial occlusion of the face by a hand. The results indicate that our tracking system works quite robustly in such situations owing to our model fitting method. By utilizing the normalized correlation function on the IP5000, the tracking system is tolerant of fluctuations in lighting.

Fig. 8 : Result of face tracking in situations with deformation and occlusion.

4.2 Visualization

The results of the tracking are visualized using an SGI O2 graphics workstation. Figure 9 illustrates examples of the tracking results and the corresponding visualization. The 3D model used in the visualization consists of a rigid surface of the face and two eyeballs. The face has six DOF for position and orientation, and the eyeballs have two DOF each. The positions of the irises of the eyes are detected using the circular Hough transform and are used to move the eyes of the mannequin head. The visualization process is performed online during the tracking; therefore the mannequin head can mimic the person's head and eye motions in real-time.

Fig. 9 : Visualization of tracking results.

5 Conclusion

In this paper, our robust real-time stereo face tracking system for visual human interfaces was presented. The system consists of a stereo camera pair and a standard PC equipped with an image processing board, and is able to detect the position and orientation of the face. The face tracking system is (1) non-intrusive, (2) passive, (3) real-time and (4) accurate, all of which have not been achieved simultaneously by previous research. The quantitative accuracy and robustness of the tracking are yet to be evaluated; however, we believe that the performance of the system is quite high compared with existing systems.

By extending the system presented in this paper, we have already succeeded in detecting the 3D gaze vector of a person in real-time (at 15[Hz]). Snapshots of the experiment are shown in Figure 10. Our system is developed to be utilized as a visual interface between a human and a robot. However, considering the advantages described above, it can be applied to various targets, such as psychological experiments, ergonomic design, products for the disabled and the amusement industry. In our future work, we

Fig. 10 : Result of gaze detection.

References

[1] A. Azarbayejani, T. Starner, B. Horowitz, and A. Pentland. Visually controlled graphics. IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(6):602-605, 1993.
[2] A. Zelinsky and J. Heinzmann. Real-time Visual Recognition of Facial Gestures for Human Computer Interaction. In Proc. of the Int. Conf. on Automatic Face and Gesture Recognition, pages 351-356, 1996.
[3] P. Ballard and G. C. Stockman. Controlling a Computer via Facial Aspect. IEEE Trans. Sys. Man and Cybernetics, 25(4):669-677, 1995.
[4] M. Black and Y. Yacoob. Tracking and Recognizing Rigid and Non-rigid Facial Motions Using Parametric Models of Image Motion. In Proc. of the Int. Conf. on Computer Vision (ICCV'95), pages 374-381, 1995.
[5] S. Birchfield and C. Tomasi. Elliptical Head Tracking Using Intensity Gradients and Color Histograms. In Proc. of Computer Vision and Pattern Recognition (CVPR'98), 1998.
[6] A. Gee and R. Cipolla. Fast Visual Tracking by Temporal Consensus. Image and Vision Computing, 14(2):105-114, 1996.
[7] K. Toyama. Look, Ma - No Hands! Hands-Free Cursor Control with Real-time 3D Face Tracking. In Proc. of the Workshop on Perceptual User Interfaces (PUI'98), 1998.
[8] J. Heinzmann and A. Zelinsky. 3-D Facial Pose and Gaze Point Estimation using a Robust Real-Time Tracking Paradigm. In Proc. of the Int. Conf. on Automatic Face and Gesture Recognition, 1998.
[9] R. Stiefelhagen, J. Yang, and A. Waibel. Tracking Eyes and Monitoring Eye Gaze. In Proc. of the Workshop on Perceptual User Interfaces (PUI'97), 1997.
[10] Y. Matsumoto, T. Shibata, K. Sakai, M. Inaba, and H. Inoue. Real-time Color Stereo Vision System for a Mobile Robot based on Field Multiplexing. In Proc. of IEEE Int. Conf. on Robotics and Automation, pages 1934-1939, 1997.
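The circular Hough transform used for the iris detection in section 4.2 can be sketched as follows. This is a generic voting implementation; the radius range, angular sampling and synthetic test data are illustrative assumptions, not values from the paper:

```python
import numpy as np

def circular_hough(edge_points, shape, radii):
    """Vote for circle centers over a range of candidate radii.

    edge_points: iterable of (x, y) edge pixel coordinates.
    shape: (height, width) of the image.
    radii: candidate iris radii in pixels.
    Returns (cx, cy, r) of the strongest circle in the accumulator.
    """
    h, w = shape
    acc = np.zeros((len(radii), h, w), dtype=np.int32)
    thetas = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    for (x, y) in edge_points:
        # Each edge point votes for all centers lying at distance r.
        for ri, r in enumerate(radii):
            cx = np.round(x - r * np.cos(thetas)).astype(int)
            cy = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
            acc[ri, cy[ok], cx[ok]] += 1
    ri, cy, cx = np.unravel_index(acc.argmax(), acc.shape)
    return cx, cy, radii[ri]

# Synthetic iris edge: a circle of radius 10 centred at (30, 25).
angles = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = [(int(round(30 + 10 * np.cos(a))), int(round(25 + 10 * np.sin(a))))
       for a in angles]
cx, cy, r = circular_hough(pts, (60, 60), radii=[8, 9, 10, 11, 12])
```

In practice the edge points would come from an edge detector run inside the tracked eye regions, so the accumulator stays small enough for online use.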