Facial Feature Tracking
We employ a hierarchical method for tracking the facial features used by the head-orientation recovery model. We assume the initial locations of these points are given; this initialization problem has been addressed in the literature. Given these points in the first frame, we track each point independently by employing image-based parameterized tracking [3]. This performs both global region tracking of the whole face and local region tracking of the individual facial features. We then refine our position estimates by employing color correlation. The figure below shows the hierarchical framework for facial feature tracking.
From face and feature tracking, the locations of feature
points in the next frame are estimated. The positional
refinement step involves creating
a template, from the current frame, of each feature point and
then performing a color correlation over a search region around the
estimated location in the next frame.
In the correlation, a mis-match score, M, measuring the difference between the template, T, and the target image, I, is computed. M is defined as follows:

M = \sum_{i,j} \max\left( \left| R_T(i,j) - R_I(i,j) \right|,\ \left| G_T(i,j) - G_I(i,j) \right|,\ \left| B_T(i,j) - B_I(i,j) \right| \right)

where R, G, and B represent the red, green, and blue pixel values, respectively, and (i, j) indexes the pixels.
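The refinement step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the search radius and the array layout (H x W x 3 uint8 images) are assumptions.

```python
import numpy as np

def mismatch_score(template, target):
    """Mis-match score M: per-pixel max over R, G, B of the absolute
    channel difference, summed over all pixels of the template."""
    diff = np.abs(template.astype(int) - target.astype(int))  # H x W x 3
    return int(diff.max(axis=2).sum())

def refine_position(frame, template, est_xy, search_radius=5):
    """Slide the template over a square search region centered on the
    estimated location and return (M, (x, y)) for the best match."""
    h, w, _ = template.shape
    ex, ey = est_xy  # estimated top-left corner of the feature patch
    best = None
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = ey + dy, ex + dx
            patch = frame[y:y + h, x:x + w]
            if patch.shape != template.shape:
                continue  # window falls off the image
            m = mismatch_score(template, patch)
            if best is None or m < best[0]:
                best = (m, (x, y))
    return best
```

An exact copy of the template inside the search region yields M = 0, so the refined position snaps to it even when the initial estimate is off by a few pixels.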
Our orientation computation model assumes that the corners of the eyes are collinear, and we take this constraint into account while tracking them. While performing correlation over the search region,
the mis-match scores and their corresponding pixel positions are tabulated.
Then, we consider the best m candidates for each feature.
Next, a minimum-squared-error line fitting algorithm is used to
determine a single "eye" line. More precisely, given the set of 4m candidate points (x_i, y_i), we find c_0 and c_1 that minimize the error function

\sum_{i=1}^{4m} \left[ (c_0 + c_1 x_i) - y_i \right]^2
Once the best-fit line is obtained, we choose, among the candidates for each eye corner, the one that minimizes

|d| / |d_{max}| + M / M_{max}

where |d| is the vertical distance from the point to the line, and |d_{max}| and M_{max} are the maximum values of |d| and M, respectively, over the candidates.
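The line fit and the candidate-selection rule can be sketched as below; the candidate representation (a list of ((x, y), M) pairs per corner) is an assumption for illustration.

```python
import numpy as np

def fit_eye_line(points):
    """Least-squares fit of y = c0 + c1*x through the candidate
    eye-corner points (x_i, y_i)."""
    xs = np.array([p[0] for p in points], dtype=float)
    ys = np.array([p[1] for p in points], dtype=float)
    A = np.column_stack([np.ones_like(xs), xs])
    (c0, c1), *_ = np.linalg.lstsq(A, ys, rcond=None)
    return c0, c1

def pick_corner(candidates, c0, c1):
    """candidates: list of ((x, y), M) for one eye corner.
    Choose the candidate minimizing |d|/|d_max| + M/M_max, where d is
    the vertical distance from the point to the fitted line."""
    ds = [abs(y - (c0 + c1 * x)) for (x, y), _ in candidates]
    ms = [m for _, m in candidates]
    d_max = max(ds) or 1.0  # guard against division by zero
    m_max = max(ms) or 1.0
    scores = [d / d_max + m / m_max for d, m in zip(ds, ms)]
    return candidates[int(np.argmin(scores))][0]
```

The normalization by |d_max| and M_max puts the geometric and photometric terms on the same scale, so neither dominates the selection.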
Representative results of our tracking method are shown in the figure below.
Roll Recovery
Roll is straightforward to recover from the image of the eye corners. From the figure, we immediately see that the head roll is

\gamma = \arctan\left( \Delta y / \Delta x \right)

where \Delta y and \Delta x are the vertical and horizontal distances, respectively, between E1 and E4.
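In code, the roll computation is a one-liner; using atan2 (a standard substitute for the plain arctan here) avoids division by zero when the eye line is vertical.

```python
import math

def head_roll(e1, e4):
    """Roll angle gamma = arctan(dy / dx), in radians, from the
    image coordinates of the outer eye corners E1 and E4."""
    dx = e4[0] - e1[0]
    dy = e4[1] - e1[1]
    return math.atan2(dy, dx)  # equals arctan(dy/dx) for dx > 0
```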
Yaw Recovery
Let D1 denote the width of the eyes and D2 denote half of the distance between the two inner eye corners.
The head yaw is recovered based on the assumptions that
- E1E2 = E3E4 (i.e., the eyes are of equal width)
- E1, E2, E3, and E4 are collinear.
Then from the well-known projective invariance of the cross-ratios we have
which yields
where
From perspective projection we obtain
where f is the focal length of the camera. From
(6), we obtain
where
From (9), we can determine the head yaw angle \beta.
However, since e3 is measured relative to
the projection of the midpoint of E1E4 (which is unknown)
we need to determine S and e3 from
the relative distance among the projections of four eye corners.
From (3) - (8), we obtain a quadratic equation in S:
So,

S = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}

where

A = 1 - D_u/D_v
B = -\left( (2/Q) + 2 \right)\left( 1 + D_u/D_v \right)
C = \left( (2/Q) + 1 \right)\left( 1 - D_u/D_v \right)
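Solving for S is then a direct application of the quadratic formula. The sketch below forms the coefficients from the measured image quantities D_u, D_v, and Q; which real root is physically valid is left to the caller (the paper selects it from the geometry), and the degenerate frontal case D_u = D_v reduces the equation to a linear one.

```python
import math

def solve_S(Du, Dv, Q):
    """Return the real solutions of A*S^2 + B*S + C = 0, with the
    coefficients A, B, C defined as in the text."""
    r = Du / Dv
    A = 1 - r
    B = -((2 / Q) + 2) * (1 + r)
    C = ((2 / Q) + 1) * (1 - r)
    if abs(A) < 1e-12:          # frontal case: equation is linear
        return [-C / B]
    disc = B * B - 4 * A * C
    if disc < 0:
        return []               # no real root: inconsistent measurements
    sq = math.sqrt(disc)
    return [(-B + sq) / (2 * A), (-B - sq) / (2 * A)]
```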
To determine e3, we employ another two cross-ratio invariants
From (13) and (14) it can be shown that
By substituting (12) and (15) into (10), we can now determine the head yaw angle \beta. Note that \beta depends only on the relative distances among the four eye corners and the focal length, and is independent of the face structure and the distance of the face from the camera. It is also unaffected by translation of the face along any axis.
Pitch Recovery
Let D denote the 3D length of the nasal bridge, p_0 denote the projected length of the nasal bridge when it is parallel to the image plane, and p_1 denote the observed length of the nasal bridge at the unknown pitch. Let (X_0, Y_0, Z_0) and (X_5, Y_5, Z_5) denote the 3D coordinates of the tip of the nose at zero pitch and at the current pitch angle \alpha. From perspective projection, we obtain
From (16) and (17) it can be shown that
The estimated pitch angle, \alpha, can be computed by:
where
Computing T requires estimating p_0, which is not generally known. Instead, we obtain it by first categorizing the observed face with respect to gender, race, and age, and then using tabulated anthropometric data to estimate the mean of p_0.
Let N denote the average length of the nasal bridge and E denote the average length of the eye fissure (biocular width). By employing these statistical estimates of the face-structure variables, p_0 can be estimated: where w is the projected length of the eye fissure in the image plane.
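The estimate presumably scales the observed eye-fissure length by the anthropometric ratio; the simple proportionality below is an assumption standing in for the omitted equation.

```python
def estimate_p0(w, N, E):
    """Estimate the frontal projected nasal-bridge length p0 from the
    observed eye-fissure length w, assuming p0 ~ (N / E) * w, where
    N and E are tabulated anthropometric means for the nasal bridge
    and eye fissure. The paper's exact relation may differ."""
    return (N / E) * w
```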