Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning From Simulation (ETRA 2014)

Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning from Simulation
Jia-Bin Huang¹, Qin Cai², Zicheng Liu², Narendra Ahuja¹, and Zhengyou Zhang²

Why?
• Multimodal natural interaction
• Gaze + touch, gesture, speech
If I were Iron Man…

Why?
• Understanding user attention and intention

Why?
• Understanding interaction among people
(Image: Before Sunrise, 1995)

(Figure: eye anatomy labeled with sclera, limbus, pupil, iris, and glint; the cornea acts like a spherical mirror. Image: Mike, Monsters University)

Gaze Estimation Using Pupil Center and Corneal Reflections
• Interpolation-based
• Cross-Ratio based
• Model-based

Model-based Gaze Estimation
• Detailed geometric modeling of the light sources, cornea, and camera [Guestrin and Eizenman, 2006]
• Pros
• Accurate (reported performance < 1°)
• 3D gaze direction
• Head pose invariant
• Cons
• Need careful hardware calibration
Figure from [Guestrin and Eizenman, 2006]

Interpolation-based Gaze Estimation
• Learn a polynomial regression from a subject-dependent calibration
• Directly map the normalized pupil-glint vector to the 2D point of regard (PoR) [Cerrolaza et al., 2008]
• Pros
• Simple to implement
• No need for hardware calibration
• Cons
• Head pose sensitive
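To make the interpolation-based idea concrete, here is a minimal Python sketch of a second-order polynomial calibration. It assumes the input is an already normalized pupil-glint vector `(vx, vy)` and fits one polynomial per screen axis by ordinary least squares. The function names, the six-term basis, and the synthetic 3×3 calibration grid are illustrative assumptions, not the specific regression studied by Cerrolaza et al.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def poly2_features(vx: float, vy: float) -> List[float]:
    """Full second-order polynomial terms of a normalized pupil-glint vector."""
    return [1.0, vx, vy, vx * vy, vx * vx, vy * vy]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares weights via the normal equations (A^T A) w = A^T y."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(ata, aty)


def calibrate(vectors: Sequence[Point], targets: Sequence[Point]):
    """Fit one polynomial per screen axis from the calibration samples."""
    feats = [poly2_features(vx, vy) for vx, vy in vectors]
    wx = least_squares(feats, [t[0] for t in targets])
    wy = least_squares(feats, [t[1] for t in targets])
    return wx, wy


def predict_por(model, v: Point) -> Point:
    """Map a pupil-glint vector to a 2D point of regard on the screen."""
    wx, wy = model
    f = poly2_features(*v)
    return (sum(w * x for w, x in zip(wx, f)), sum(w * x for w, x in zip(wy, f)))


def true_mapping(vx: float, vy: float) -> Point:
    """Hypothetical ground-truth mapping used only to generate calibration data."""
    return (100 + 50 * vx + 5 * vx * vy + 3 * vx * vx, 80 + 40 * vy + 2 * vy * vy)


# Synthetic 3x3 calibration grid of pupil-glint vectors.
vecs = [(x, y) for y in (-1.0, 0.0, 1.0) for x in (-1.0, 0.0, 1.0)]
model = calibrate(vecs, [true_mapping(*v) for v in vecs])
por = predict_por(model, (0.5, -0.3))  # approximately (125.0, 68.18)
```

Because the fitted polynomial only encodes the head pose used during calibration, its outputs drift when the head moves, which is the head-pose sensitivity listed above.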

Cross-Ratio based Gaze Estimation
• Estimates gaze by exploiting the invariance of the cross-ratio under a plane projectivity [Yoo et al. 2002]
• Pros
• Simple to implement
• No need for hardware calibration
• Head pose invariant
• Cons
• Large subject-dependent bias occurs because of simplifying assumptions
Figure from [Coutinho and Morimoto 2012]
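The invariance that the cross-ratio method relies on can be checked numerically. The sketch below uses the simplest, 1D case: four collinear points keep the same cross-ratio after any projective map. Exact `Fraction` arithmetic avoids rounding. The points and the map coefficients are arbitrary illustrative choices.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross-ratio (A, B; C, D) of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projectivity_1d(x, p, q, r, s):
    """1D projective map x -> (p*x + q) / (r*x + s), requiring p*s - q*r != 0."""
    return (p * x + q) / (r * x + s)


pts = [Fraction(v) for v in (0, 1, 3, 7)]
mapped = [projectivity_1d(x, 2, 1, 1, 3) for x in pts]
print(cross_ratio(*pts), cross_ratio(*mapped))  # 9/7 9/7
```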

The Basic Form of the Cross-Ratio Method
(Figure: image plane, corneal plane, and display plane)
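In its basic form, the construction can be written as a plane homography fixed by four correspondences: the glints of the four IR lights in the image and the positions of those lights at the screen corners. The pupil center is then mapped through that homography. Below is a minimal Python sketch. It assumes the glints are already ordered to match the lights, that the lights sit exactly at the corners of the 400 mm × 300 mm screen from the synthetic setup, and that the pixel coordinates (in a cropped eye image) are made up. `homography_from_4` and `apply_homography` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Exact homography (normalized so h33 = 1) mapping 4 points onto 4 points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hm: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Hypothetical glints (pixels, eye-image crop), ordered to match the screen corners (mm).
glints = [(10.0, 10.0), (30.0, 11.0), (29.0, 25.0), (11.0, 24.0)]
corners = [(0.0, 0.0), (400.0, 0.0), (400.0, 300.0), (0.0, 300.0)]
H = homography_from_4(glints, corners)
gaze = apply_homography(H, (20.0, 17.0))  # uncorrected estimate for the pupil center
```

This uncorrected estimate ignores both error sources listed next, which is why it carries a subject-dependent bias.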

Two Sources of Errors [Kang et al. 2008]
• Angular deviation between the visual axis and the optical axis
• The virtual image of the pupil center is not coplanar with the corneal reflections
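A quick worked example shows why the first error source matters. For illustration only, assume a 5° deviation between the visual and optical axes (a commonly quoted magnitude) and an eye 600 mm from the screen (one of the calibration distances in the real-data evaluation). Ignoring the deviation then displaces the estimate on the screen by roughly d·tan(θ). The helper name is hypothetical, and the geometry is simplified to a screen perpendicular to the line of sight.

```python
import math


def on_screen_offset_mm(angle_deg: float, distance_mm: float) -> float:
    """Screen displacement caused by an angular offset seen from distance_mm.

    Simplification: the screen is assumed perpendicular to the line of sight.
    """
    return distance_mm * math.tan(math.radians(angle_deg))


# Assumed values: 5 degree axis deviation, eye 600 mm from the screen.
offset = on_screen_offset_mm(5.0, 600.0)  # about 52.5 mm
```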

Improving Accuracy for a Stationary Head
• Methods: CR [Yoo-2002], CR-Multi [Yoo-2005], CR-HOM [Kang-2007], CR-HOMN [Hansen-2010], CR-DV [Coutinho-2006]
• Corrections: no correction; scale correction; scale and translation correction; homography correction + residual interpolation
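The simpler corrections above amount to fitting a low-dimensional transform to calibration data. The sketch below is a generic per-axis scale-and-translation fit in screen space. It is meant only to illustrate that class of correction and is not the exact formulation of any of the cited methods. The function name and the calibration numbers are hypothetical.

```python
from typing import Sequence, Tuple


def fit_scale_translation(est: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (s, b) such that target ~= s * est + b along one screen axis."""
    n = len(est)
    mean_e = sum(est) / n
    mean_t = sum(target) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(est, target))
    var = sum((e - mean_e) ** 2 for e in est)
    s = cov / var
    return s, mean_t - s * mean_e


# Hypothetical calibration along x: uncorrected CR estimates vs. true target x (mm).
est_x = [0.0, 100.0, 200.0, 300.0]
target_x = [10.0, 120.0, 230.0, 340.0]
s, b = fit_scale_translation(est_x, target_x)  # s ~= 1.1, b ~= 10.0
```

A homography correction generalizes this by letting the correction include rotation, shear, and perspective terms, at the cost of needing more calibration points.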

Improving Robustness to Head Movements
• CR [Yoo-2002]: no adaptation
• CR-DD [Coutinho and Morimoto 2010]: adapts to eye depth variations
• PL-CR [Coutinho and Morimoto 2012]: adapts to eye movements, assuming 1) weak perspective and 2) fixed eye parameters

Accuracy of Gaze Prediction for Stationary Head vs. Robustness to Head Movement
(Figure: prior methods and this paper placed by accuracy for a stationary head and robustness to head movement)
• Accuracy axis: CR [Yoo-2002], CR-Multi [Yoo-2005], CR-DV [Coutinho-2006], CR-HOM [Kang-2007], CR-HOMN [Hansen-2010]; corrections range from no correction, to scale correction, to scale and translation correction, to homography correction + residual interpolation
• Robustness axis: no adaptation (CR [Yoo-2002]); CR-DD [Coutinho-2010] adapts to eye depth variations only; PL-CR [Coutinho-2012] assumes 1) weak perspective and 2) fixed eye parameters
• This paper: no assumptions on 1) weak perspective or 2) fixed eye parameters

How? The Main Idea
• Builds upon the homography normalization method [Hansen et al. 2010]
• Improves accuracy and robustness simultaneously by introducing adaptive homography mapping

Adaptive Homography Mapping
• Two types of predictor variables
• Head-movement predictor: captures the head movement relative to the calibration position
• Affine transformation between the glint quadrilaterals
• Gaze-direction predictor: captures the gaze direction for a spatially varying mapping
• Pupil center position in the normalized space
• Mapping: polynomial regression of degree two with learned parameters
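The slide does not spell out how the degree-two regression produces a homography. One plausible reading, sketched below in Python purely as an assumption, is that each of the eight free entries of the homography (with h33 fixed to 1) is a linear function of all degree-two monomials of the stacked predictors: six affine parameters for head movement and two normalized pupil coordinates. The names `quad_features` and `adaptive_homography` and the weight layout are hypothetical. In practice the weights would come from the training described next.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def quad_features(x: Sequence[float]) -> List[float]:
    """Degree-two polynomial basis: 1, every x_i, and every x_i * x_j with i <= j."""
    feats = [1.0] + [float(v) for v in x]
    feats += [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return feats


def adaptive_homography(x: Sequence[float], weights: Sequence[Sequence[float]]) -> Matrix:
    """Hypothetical parameterization: the 8 free entries of H (h33 = 1) are each
    a weighted sum of quad_features(x), one row of weights per entry."""
    phi = quad_features(x)
    h = [sum(w * f for w, f in zip(row, phi)) for row in weights]
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(hm: Matrix, p: Tuple[float, float]) -> Tuple[float, float]:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Predictors: 6 affine parameters (identity = no head movement) + normalized pupil center.
predictors = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.2, -0.1]
n_feat = len(quad_features(predictors))  # 1 + 8 + 36 = 45
# Toy weights: only the constant terms of h11 and h22 are non-zero, giving H = identity.
weights = [[0.0] * n_feat for _ in range(8)]
weights[0][0] = weights[4][0] = 1.0
H = adaptive_homography(predictors, weights)
corrected = apply_homography(H, (0.2, -0.1))  # unchanged under the identity H
```

The quadratic basis grows quickly (45 terms for 8 predictors, times 8 homography entries), which is one reason a large simulated training set is attractive here.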

Training the Adaptive Homography Mapping
• Exploit a large amount of simulated data
• A set of sampled head positions in 3D
• A set of calibration target indices in screen space
• Objective function

Minimizing the Objective Function
• Minimize an algebraic error at each sampled head position
• Use the solution from algebraic error minimization as initialization, then minimize the reprojection errors using the Levenberg-Marquardt algorithm
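The algebraic step at each sampled head position is a linear least-squares problem. The Python sketch below assumes the common normalization h33 = 1 (an SVD-based DLT is the usual alternative) and solves the normal equations directly. `rms_reprojection_error` computes the geometric error that a Levenberg-Marquardt refinement would then minimize. The Levenberg-Marquardt step itself is omitted. All function names and the test homography are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def apply_homography(hm: Matrix, p: Point) -> Point:
    """Apply a 3x3 homography to a 2D point (with perspective division)."""
    x, y = p
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


def fit_homography_algebraic(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Least-squares homography (h33 = 1) from N >= 4 correspondences, minimizing
    the algebraic residual of the linearized equations."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def rms_reprojection_error(hm: Matrix, src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Root-mean-square geometric distance between H(src) and dst."""
    sq = 0.0
    for p, (u, v) in zip(src, dst):
        pu, pv = apply_homography(hm, p)
        sq += (pu - u) ** 2 + (pv - v) ** 2
    return math.sqrt(sq / len(src))


# Noise-free check with a made-up homography and a 3x3 grid of points.
H_true = [[1.2, 0.1, 0.5], [0.05, 0.9, -0.3], [0.02, 0.01, 1.0]]
src = [(float(x), float(y)) for y in (0, 1, 2) for x in (0, 1, 2)]
dst = [apply_homography(H_true, p) for p in src]
H_fit = fit_homography_algebraic(src, dst)
err = rms_reprojection_error(H_fit, src, dst)  # close to zero for noise-free data
```

When working in pixel units, translating and scaling the coordinates before fitting (Hartley normalization) keeps the normal equations well conditioned.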

Visualize the Training Process
• Eye gaze prediction results using the bias-correcting homography
computed at the calibration position

RMSE Comparisons Using Different Training Models
• Differences are small with linear regression
• The linear model is not sufficiently complex
• Compensation using both predictor variables achieves the lowest errors

Linear Regression
Adding the normalized pupil center corrected spatially varying errors

Experimental Results – Synthetic data
• Setup
• Screen size 400 mm × 300 mm
• Four IR lights
• Camera with 13 mm focal length (FoV ≈ 31°), placed slightly below the screen border
• Calibration position and eye parameters
• Eye parameters from [Guestrin and Eizenman, 2006]
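Under a pinhole camera model, the stated focal length and field of view imply a sensor extent, which also shows what the 13 mm vs. 35 mm comparison in the sensor-resolution experiments means for image coverage. The slides do not say whether the 31° is horizontal or diagonal, so the sketch below simply assumes the same sensor extent throughout. The function names are hypothetical.

```python
import math


def fov_deg(sensor_extent_mm: float, focal_mm: float) -> float:
    """Pinhole field of view across one sensor extent (width, height, or diagonal)."""
    return math.degrees(2.0 * math.atan(sensor_extent_mm / (2.0 * focal_mm)))


def sensor_extent_mm(fov: float, focal_mm: float) -> float:
    """Inverse: the sensor extent implied by a field of view and a focal length."""
    return 2.0 * focal_mm * math.tan(math.radians(fov) / 2.0)


extent = sensor_extent_mm(31.0, 13.0)  # about 7.2 mm implied by the stated setup
fov_35 = fov_deg(extent, 35.0)         # about 11.8 degrees with the same sensor
```

The narrower view at 35 mm spreads the eye, and therefore the glints and pupil, over more pixels, which reduces quantization error in the detected features.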

Stationary Head
Varying corneal radius

Stationary Head
Varying pupil-corneal distance

Stationary Head
Varying (horizontal) angle between the optical and visual axes

Stationary Head
Varying (vertical) angle between the optical and visual axes

Head Movements Parallel to the Screen

Head Movement along Depth Variation

Tested at Another Head Position

Effect of Sensor Resolution (at Calibration)
(Panels: focal length = 13 mm; focal length = 35 mm)

Effect of Sensor Resolution (at a New Position)
(Panels: focal length = 13 mm; focal length = 35 mm)

Real Data Evaluation – Programmable Hardware Setup
• Off-axis IR light sources
• Stereo camera (only one of its cameras is used in this work)
• On-axis ring light

Real Data Evaluation – Feature Detection
• Detecting glints and pupil center
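A minimal feature detector can be sketched as thresholding plus connected-component centroids. The Python sketch below treats glints as bright blobs and the pupil as the largest dark blob, which is a dark-pupil assumption. It is not the detector used in this work: real pipelines are more robust, for example by differencing bright-pupil (on-axis) and dark-pupil (off-axis) images and fitting an ellipse to the pupil boundary. The thresholds, names, and the synthetic image are hypothetical.

```python
from collections import deque
from typing import Callable, List, Tuple

Image = List[List[int]]
Blob = Tuple[float, float, int]  # (x centroid, y centroid, pixel count)


def find_blobs(img: Image, keep: Callable[[int], bool]) -> List[Blob]:
    """4-connected components of pixels satisfying `keep`, with their centroids."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs: List[Blob] = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or not keep(img[r][c]):
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            sum_x = sum_y = count = 0
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and keep(img[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append((sum_x / count, sum_y / count, count))
    return blobs


def detect_features(img: Image, glint_min: int = 240, pupil_max: int = 30):
    """Glints: bright blobs. Pupil: centroid of the largest dark blob."""
    glints = [(x, y) for x, y, _ in find_blobs(img, lambda v: v >= glint_min)]
    dark = find_blobs(img, lambda v: v <= pupil_max)
    pupil = max(dark, key=lambda blob: blob[2])[:2] if dark else None
    return pupil, glints


# Synthetic 10x10 eye patch: grey background, 3x3 dark pupil, four 1-pixel glints.
img = [[128] * 10 for _ in range(10)]
for r in range(3, 6):
    for c in range(3, 6):
        img[r][c] = 10
for gx, gy in ((1, 1), (8, 1), (8, 8), (1, 8)):
    img[gy][gx] = 255
pupil, glints = detect_features(img)  # pupil centroid (4.0, 4.0)
```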

Averaged Gaze Estimation Error at the Calibration Position

Averaged Gaze Estimation Error
(Panels: calibrated at 600 mm from the screen; calibrated at 500 mm from the screen)

Conclusions
• A learning-based approach for simultaneously compensating (1) spatially varying errors and (2) errors induced by head movements
• Generalizes previous work on compensating for head movements using glint geometric transformations [Cerrolaza et al. 2012] [Coutinho and Morimoto 2012]
• Leveraging simulated data avoids tedious data collection

Future Work
• Consider subject-dependent parameters when learning and inferring the adaptive homography
• Integrate binocular information; please see the poster: Zhengyou Zhang, Qin Cai, "Improving Cross-Ratio-Based Eye Tracking Techniques by Leveraging the Binocular Fixation Constraint"
• Extensive user study using a physical setup

Comments or questions?
Jia-Bin Huang
jbhuang1@Illinois.edu
Narendra Ahuja
n-ahuja@Illinois.edu
Zhengyou Zhang
zhang@microsoft.com
Qin Cai
qincai@microsoft.com
Zicheng Liu
zliu@microsoft.com
