Feature detection
Jayanta Mukhopadhyay
Dept. of Computer Science and Engg.
Key Feature Points
Intensity (maximum or minimum)
Edges
Corners
Noise
Orientation
Scale
Harris corner detector
H = [ fx^2    fx fy ]
    [ fx fy   fy^2  ]
(entries accumulated over a window around the point)
Algorithm:
1. Compute hm=det(H)/trace(H).
2. Retain hm>threshold.
3. Select local maxima as key-points.
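The three steps above can be sketched in NumPy. This is an illustrative sketch, not the lecture's code: the 5x5 accumulation window, the relative threshold, and the toy test image are all assumptions.

```python
import numpy as np

def _window_sum(a, r):
    # Sum each pixel's (2r+1)x(2r+1) neighborhood via padded shifts.
    p = np.pad(a, r)
    out = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += p[r + dy : r + dy + a.shape[0], r + dx : r + dx + a.shape[1]]
    return out

def harris_corners(img, r=2, rel_thresh=0.1):
    fy, fx = np.gradient(img.astype(float))
    # Entries of H, accumulated over a (2r+1)x(2r+1) window.
    sxx = _window_sum(fx * fx, r)
    syy = _window_sum(fy * fy, r)
    sxy = _window_sum(fx * fy, r)
    hm = (sxx * syy - sxy ** 2) / (sxx + syy + 1e-12)  # step 1: det(H)/trace(H)
    keep = hm > rel_thresh * hm.max()                  # step 2: threshold
    # Step 3: keep only points equal to the max of their 3x3 neighborhood.
    p = np.pad(hm, 1, constant_values=-np.inf)
    nbr_max = np.full_like(hm, -np.inf)
    for dy in range(3):
        for dx in range(3):
            nbr_max = np.maximum(nbr_max, p[dy : dy + hm.shape[0], dx : dx + hm.shape[1]])
    return np.argwhere(keep & (hm >= nbr_max))

# Toy image: a bright square; its four corners should be detected.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0
pts = harris_corners(img)
```

On this toy image the response is near zero on the flat regions and along the edges, so only the four corner neighborhoods survive the threshold.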
Examples: Harris corner points
Other key-point extractors
Difference of Gaussian (DoG):
G(x, y) = g(x, y, kσ) * f(x, y) − g(x, y, σ) * f(x, y)
(where * denotes convolution of the image f with a Gaussian g)
Key-points are taken from local maxima of this measure.
Hessian matrix:
H = [ fxx  fxy ]
    [ fyx  fyy ]
Score: (Tr(H))^2 / Det(H); smaller values indicate corner-like key-points.
Feature Descriptors
Scale Invariant Feature Transform (SIFT)
Speeded Up Robust Feature (SURF)
Histogram of Gradients (HOG)
All accumulate statistics of neighboring gradients. The final descriptor is a multidimensional feature vector. Feature vectors are used in classification / similarity matching.
Feature detection (Chapter 4, Szeliski)
Local measure of feature uniqueness
How does the window change when you shift it?
Shifting the window in any direction may cause a big change.
flat region: no change in any direction
edge: no change along the edge direction
corner: significant change in all directions
Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute.
Feature detection
Consider shifting the window W by (u, v):
how do the pixels in W change?
compare each pixel before and after by summing up the squared differences (SSD)
this defines an SSD error E(u, v):
E(u, v) = Σ_{(x, y) ∈ W} [ I(x + u, y + v) − I(x, y) ]^2
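The SSD error is easy to evaluate directly. In this small NumPy sketch the edge image and window coordinates are invented for illustration; it confirms the intuition from the flat/edge/corner slide: shifting along an edge leaves E at zero, shifting across it does not.

```python
import numpy as np

def ssd_error(img, win, u, v):
    # E(u, v) = sum over (x, y) in W of (I(x+u, y+v) - I(x, y))^2
    r0, r1, c0, c1 = win                             # window W: rows r0..r1, cols c0..c1
    before = img[r0:r1, c0:c1]
    after = img[r0 + v : r1 + v, c0 + u : c1 + u]    # same window, shifted by (u, v)
    return float(((after - before) ** 2).sum())

img = np.zeros((20, 20))
img[:, 10:] = 1.0                 # vertical edge at column 10
win = (5, 15, 5, 15)              # a 10x10 window straddling the edge
```

For a corner, every shift direction would give a large E; that is exactly what the eigenvalue analysis below formalizes.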
Small motion assumption
For small u and v, a first-order Taylor expansion gives
I(x + u, y + v) ≈ I(x, y) + u fx + v fy
so that
E(u, v) ≈ [u v] H [u v]^T
Feature detection
Feature detection
For the example above
You can move the center of the green window
to anywhere on the blue unit circle
Which directions will result in the largest and
smallest E values?
We can find these directions by looking at the
eigenvectors of H.
Quick eigenvalue/eigenvector review
The eigenvectors of a matrix A are the vectors x that satisfy:
A x = λ x
The scalar λ is the eigenvalue corresponding to x.
The eigenvalues are found by solving:
det(A − λ I) = 0
In our case, A = H is a 2x2 matrix, so we have
det( [ h11 − λ   h12 ; h21   h22 − λ ] ) = 0
The solution:
λ± = (1/2) [ (h11 + h22) ± sqrt( (h11 − h22)^2 + 4 h12 h21 ) ]
Once you know λ, you find x by solving:
(H − λ I) x = 0
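For the 2x2 case the closed-form eigenvalues can be checked against NumPy's eigen-solver; a minimal sketch with illustrative matrix values:

```python
import numpy as np

def eig2x2(h11, h12, h21, h22):
    # Roots of the characteristic polynomial
    # lambda^2 - (h11 + h22)*lambda + (h11*h22 - h12*h21) = 0.
    tr = h11 + h22
    det = h11 * h22 - h12 * h21
    s = np.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return (tr - s) / 2.0, (tr + s) / 2.0   # (lambda_minus, lambda_plus)

H = np.array([[2.0, 1.0], [1.0, 2.0]])
lam_minus, lam_plus = eig2x2(H[0, 0], H[0, 1], H[1, 0], H[1, 1])
```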
Feature detection
Eigenvalues and eigenvectors of H
Define shifts with the smallest and largest change (E value):
x+ = direction of largest increase in E
λ+ = amount of increase in direction x+
x- = direction of smallest increase in E
λ- = amount of increase in direction x-
Feature detection
How are λ+, x+, λ-, and x- relevant for feature detection?
What's our feature scoring function?
Want E(u, v) to be large for small shifts in all directions
the minimum of E(u, v) should be large, over all unit vectors [u v]
this minimum is given by the smaller eigenvalue (λ-) of H
Feature detection
Feature detection summary
Here's what we do:
Compute the gradient at each point in the image
Create the H matrix from the entries in the gradient
Compute the eigenvalues
Find points with large response (λ- > threshold)
Choose those points where λ- is a local maximum as features
Feature detection summary
Choose those points where λ- is a local maximum as features
The Harris operator
f = λ+ λ- / (λ+ + λ-) = det(H) / trace(H) is a variant of the Harris operator for feature detection.
The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22
Very similar to λ- but less expensive (no square root)
Called the Harris Corner Detector or Harris Operator
Lots of other detectors; this is one of the most popular.
The Harris operator
(Image: Harris operator response.)
Harris detector example
f value (red high, blue low)
Threshold (f > value)
Find local maxima of f
Harris features (in red)
Feature Matching
We know how to detect good points
Next question: How to match them?
Describe feature points: Feature descriptor.
Matching with Features
Detect feature points in both images
Matching with Features
Detect feature points in both images
Find corresponding pairs
Matching with Features
Detect feature points in both images.
Find corresponding pairs.
Use these pairs to align images.
Invariance
Suppose you rotate the image by some
angle
Will you still pick up the same features?
What if you change the brightness?
Scale?
Scale invariant detection
Suppose you're looking for corners
Key idea: find scale that gives local maximum of f
f is a local maximum in both position and
scale
Common definition of f: Laplacian
(or difference between two Gaussian-filtered
images with different standard deviations).
Invariance
Suppose we are comparing two images I1
and I2
I2 may be a transformed version of I1
What kinds of transformations are we likely
to encounter in practice?
Invariance
We'd like to find the same features regardless of
the transformation
This is called transformational invariance
Most feature methods are designed to be invariant
to
Translation, 2D rotation, scale
They can usually also handle
Limited 3D rotations (SIFT works up to about 60
degrees)
Limited affine transformations (some are fully affine
invariant)
Limited illumination/contrast changes
How to achieve invariance
Need both of the following:
1. Make sure your detector is invariant
Harris is invariant to translation and rotation
Scale is trickier
common approach is to detect features at many scales
using a Gaussian pyramid (e.g., MOPS)
More sophisticated methods find the best scale to
represent each feature (e.g., SIFT)
2. Design an invariant feature descriptor
A descriptor captures the information in a region
around the detected feature point
The simplest descriptor: a square window of pixels
What's this invariant to?
Let's look at some better approaches
Finding Keypoints: Scale, Location
How do we choose scale?
Scale Invariant Detection
Functions for determining scale (kernel convolved with the image f):
L = σ^2 ( Gxx(x, y, σ) + Gyy(x, y, σ) )   (Laplacian)
DoG = G(x, y, kσ) − G(x, y, σ)   (Difference of Gaussians)
where the Gaussian is
G(x, y, σ) = (1 / (2πσ^2)) e^( −(x^2 + y^2) / (2σ^2) )
Note: both kernels are invariant to scale and rotation.
Finding Keypoints: Scale, Location
Convolve with Gaussian at successive scales
Downsample to form octaves
Find extrema in the 3D DoG space
The number of scales per octave is chosen empirically.
Relationship between the LoG and DoG operators:
G(x, y, kσ) − G(x, y, σ) ≈ (k − 1) σ^2 ∇^2 G(x, y, σ)
Scale Invariant Detectors
Harris-Laplacian1: find local maximum of the Harris corner measure in space (x, y) and of the Laplacian in scale.
SIFT (Lowe)2: find local maximum of the Difference of Gaussians (DoG) in both space and scale.
1 K. Mikolajczyk, C. Schmid. Indexing Based on Scale Invariant Interest Points. ICCV 2001
2 D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV 2004
Scale Invariant Detectors
Experimental evaluation of detectors w.r.t. scale change.
Repeatability rate = (# correspondences) / (# possible correspondences)
K. Mikolajczyk, C. Schmid. Indexing Based on Scale Invariant Interest Points. ICCV 2001
Scale Invariant Detection: Summary
Given: two images of the same scene with a large
scale difference between them.
Goal: find the same interest points independently in
each image.
Solution: search for maxima of suitable functions in
scale and in space (over the image).
Methods:
1. Harris-Laplacian [Mikolajczyk, Schmid]: maximize Laplacian
over scale, Harris measure of corner response over the
image.
2. SIFT [Lowe]: maximize Difference of Gaussians over scale
and space.
Keypoint localization
There are still a lot of points, and some of them are not good enough.
The locations of keypoints may not be accurate.
Eliminating edge points.
Eliminating edge points
Edge point: large principal curvature across the edge but a small one in the perpendicular direction.
The eigenvalues of H are proportional to the principal curvatures, so for a good keypoint the two eigenvalues shouldn't differ too much.
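One concrete form of this eigenvalue-ratio test (the form used in Lowe's SIFT paper, with the conventional threshold r = 10) keeps a point only when trace(H)²/det(H) is small; the example matrices are invented for illustration.

```python
import numpy as np

def passes_edge_test(H, r=10.0):
    # Keep a keypoint only if trace(H)^2 / det(H) <= (r + 1)^2 / r,
    # i.e. the two eigenvalues differ by at most a factor of r.
    tr = H[0, 0] + H[1, 1]
    det = H[0, 0] * H[1, 1] - H[0, 1] * H[1, 0]
    if det <= 0:
        return False          # eigenvalues of opposite sign: reject
    return bool(tr * tr / det <= (r + 1) ** 2 / r)

corner_like = np.array([[5.0, 0.0], [0.0, 4.0]])   # eigenvalues 5 and 4: similar
edge_like = np.array([[50.0, 0.0], [0.0, 0.5]])    # eigenvalues 50 and 0.5: edge
```

Working with trace and determinant avoids computing the eigenvalues (and their square root) explicitly.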
Orientation assignment
Create histogram of local
gradient directions at selected
scale.
Assign canonical orientation at
peak of smoothed histogram.
Each key specifies stable 2D
coordinates (x, y, scale,
orientation).
If 2 major orientations, use both.
Keypoint localization with orientation
(233x189 image)
832 initial keypoints
729 keypoints after gradient threshold
536 keypoints after ratio threshold
Keypoint Descriptors
At this point, each keypoint has
location
scale
orientation
Next is to compute a descriptor for the local
image region about each keypoint that is
highly distinctive
as invariant as possible to variations such as
changes in viewpoint and illumination
Scale Invariant Feature Transform (Lowe99, ICCV)
Take a 16x16 square window (orientation corrected) around the detected feature
Compute edge orientation (angle of the gradient minus 90°) for each pixel
Throw out weak edges (threshold on gradient magnitude)
Create a histogram of the surviving edge orientations (angle histogram over 0 to 2π)
Adapted from slide by David Lowe
SIFT descriptor
Full version
Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below).
Compute an orientation histogram for each cell.
16 cells x 8 orientations = 128-dimensional descriptor.
Adapted from slide by David Lowe
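A stripped-down sketch of such a descriptor: 4x4 cells, 8 orientation bins, magnitude-weighted votes. This omits several parts of real SIFT (Gaussian weighting, trilinear interpolation between bins, clipping and renormalization), so it is illustrative only.

```python
import numpy as np

def sift_like_descriptor(patch):
    # patch: 16x16 grayscale window around a keypoint (orientation-corrected).
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * 8).astype(int) % 8        # 8 orientation bins
    desc = np.zeros((4, 4, 8))
    for r in range(16):
        for c in range(16):
            desc[r // 4, c // 4, bins[r, c]] += mag[r, c]  # magnitude-weighted vote
    desc = desc.ravel()                                    # 4*4*8 = 128 dimensions
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc

patch = np.random.default_rng(0).random((16, 16))          # stand-in for a real patch
d = sift_like_descriptor(patch)
```

Normalizing the 128-vector to unit length gives some robustness to illumination (contrast) changes.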
Properties of SIFT
Extraordinarily robust matching technique
Can handle changes in viewpoint
Up to about 60 degree out of plane rotation
Can handle significant changes in illumination
Sometimes even day vs. night (below)
Fast and efficient: can run in real time
Lots of code available
http://people.csail.mit.edu/albert/ladypack/wiki
/index.php/Known_implementations_of_SIFT
Speeded-Up Robust Features
(SURF): Another descriptor
Speeded-Up Robust Features (SURF)
(Bay et al. ECCV, 2006)
Box-type convolution filters and use of integral images to speed up the computation.
Use of the Hessian operator for key-point detection: local maxima of det(H).
Accumulate orientation-corrected Haar wavelet responses.
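The integral-image trick that SURF relies on can be sketched in a few lines: once the integral image is built, any box sum costs four lookups, regardless of the box size. The toy values are for illustration only.

```python
import numpy as np

def integral_image(img):
    # ii[r, c] = sum of img[:r, :c], with one row/column of zero padding.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1, c0:c1] in four lookups, independent of box size.
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

img = np.arange(24, dtype=float).reshape(4, 6)
ii = integral_image(img)
```

This constant cost per box is what makes SURF's box-type filter approximations to Gaussian derivatives fast at any scale.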
Patch Descriptor: Histogram of Gradients (HoG)
(N. Dalal and B. Triggs, CVPR 2005)
Compute centered horizontal and vertical gradients with no smoothing.
Compute gradient orientations and magnitudes. For a color image, pick the color channel with the highest gradient magnitude for each pixel.
For a 64x128 image, divide the image into 16x16 blocks with 50% overlap: 7x15 = 105 blocks in total.
Histogram of Gradients (HoG)
Each block: 2x2 cells with size 8x8.
Quantize the gradient orientation into 9 bins.
The vote is the gradient magnitude.
Interpolate votes between neighboring bin centers.
The vote can also be weighted with Gaussian to
down-weight the pixels near the edges of the
block.
Concatenate the histograms. Feature dimension: 105 x 4 x 9 = 3,780.
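The block-count arithmetic above can be reproduced directly; this is just the dimension bookkeeping, not the HoG implementation.

```python
# Block-count arithmetic for the 64x128 HoG detection window described above.
img_w, img_h = 64, 128
block, stride = 16, 8                        # 16x16 blocks, 50% overlap = stride 8
blocks_x = (img_w - block) // stride + 1     # 7 block positions across
blocks_y = (img_h - block) // stride + 1     # 15 block positions down
cells_per_block, bins = 4, 9                 # 2x2 cells per block, 9 orientation bins
dim = blocks_x * blocks_y * cells_per_block * bins
```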
Image / Object Descriptor
Bag of visual words (Sivic et al., ICCV 2005)
Compute key-point based feature
descriptors.
Quantize them (clustering) to form a finite
set of representative descriptors (visual
words).
Represent by a histogram of visual words.
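A minimal sketch of the histogram step, assuming the vocabulary has already been learned by clustering; the 2-D toy descriptors and two-word vocabulary are invented for illustration (real descriptors would be, e.g., 128-D SIFT vectors).

```python
import numpy as np

def bovw_histogram(descriptors, vocab):
    # Assign each descriptor to its nearest visual word, then count occurrences.
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()                 # normalized word-frequency histogram

rng = np.random.default_rng(1)
vocab = np.array([[0.0, 0.0], [10.0, 10.0]])               # toy 2-word vocabulary
desc = np.vstack([rng.normal(0, 1, (30, 2)),               # 30 descriptors near word 0
                  rng.normal(10, 1, (10, 2))])             # 10 descriptors near word 1
h = bovw_histogram(desc, vocab)
```

The resulting fixed-length histogram can be compared across images regardless of how many key-points each image produced.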
Image Analysis: A brief outline
Pre-processing (Filtering, Enhancement, ...) → Feature Extraction and Description (SIFT, SURF, HOG, BoVW) → End Analysis (Classification, Matching, Retrieval)