0% found this document useful (0 votes)
60 views30 pages

Scale Invariant Feature Transform: Tom Duerig

The document discusses the Scale Invariant Feature Transform (SIFT) algorithm for extracting features from images that are invariant to changes in scale, rotation, and illumination. SIFT achieves scale invariance using difference of Gaussian filtering across image scales. Rotation invariance is obtained by assigning orientations based on local gradient directions. Affine invariance is achieved by using local gradient histograms as descriptors.

Uploaded by

titu1993
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
60 views30 pages

Scale Invariant Feature Transform: Tom Duerig

The document discusses the Scale Invariant Feature Transform (SIFT) algorithm for extracting features from images that are invariant to changes in scale, rotation, and illumination. SIFT achieves scale invariance using difference of Gaussian filtering across image scales. Rotation invariance is obtained by assigning orientations based on local gradient directions. Affine invariance is achieved by using local gradient histograms as descriptors.

Uploaded by

titu1993
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 30

Scale Invariant Feature Transform

Tom Duerig

Why do we care about matching


features?

Object Recognition
Wide baseline matching
Tracking/SFM

We want invariance!!!
Good

features should be robust to all sorts of


nastiness that can occur between images.

Types of invariance

Illumination

Types of invariance

Illumination
Scale

Types of invariance

Illumination
Scale
Rotation

Types of invariance

Illumination
Scale
Rotation
Affine

Types of invariance

Illumination
Scale
Rotation
Affine
Full Perspective

How to achieve illumination invariance

The easy way


(normalized)
Difference based
metrics (random tree,
Haar, and sift)

How to achieve scale invariance


Pyramids

Divide width and height by 2


Take average of 4 pixels for each pixel (or
Gaussian blur)
Repeat until image is tiny
Run filter over each size image and hope its
robust

Scale

Space (DOG method)

Pyramids

How to achieve scale invariance


Pyramids
Scale

Space (DOG method)

Pyramid but fill gaps with blurred images


Like having a nice linear scaling without the
expense
Take features from differences of these images
If the feature is repeatably present in between
Difference of Gaussians it is Scale Invariant and
we should keep it.

Differences Of Gaussians

Rotation Invariance
Rotate

all features to go the same way in a


determined manner
Take histogram of Gradient directions (36 in
paper for 1 every 10 degrees)
Rotate to most dominant (maybe second if its
good enough, sub-Bin accuracy)

Rotation Invariance

Affine Invariance
Easy

way: Warp your training and hope


Fancy way: design your feature itself to be
robust against affine transformations (SIFT
method)

Actual SIFT features

Remember the gradient


histograms we used for
rotation invariance?
Same theory, except
keep N2 histograms (4
shown, 16 used)
Note, use weighted
contributions to avoid
edge nastiness

SIFT algorithm overview


Get

tons of points from maxima+minima of


DOGS
Threshold on simple contrast (low contrast is
generally less reliable than high for feature
points)
Threshold based on principal curvatures
(technical term is linyness)

SIFT algorithm overview


Gradient,

histogram, Rotate
Take old gradient, histogram regions using
Gaussian weighting
Hand off beautiful robust feature to someone
who cares (i.e. object recognizer, navigation
software (SFM), or stereo matching
algorithm)

Actual SIFT stage output

MSER (a one minute survey)


Maximally

Stable Extremal Regions


Go through thresholds, grab regions which
stay nearly the same through a wide range of
thresholds (connected components then
bounding ellipses)
Keep those regions descriptors as features
As regions are illumination based warp
ellipse to circle for affine invariance

Actual MSER output

Initial image

Threshold

Threshold

Threshold

Ears and Square (note: not actual


algorithm output)

How to use these features?


Distance

could be L2 norm on histograms


Match by (nearest neighbor distance)/(2 nd
nearest neighbor distance) ratio
Object recognize with Hough of pose of
points (ie, these three should be in line on
object, gee they are thats the object all
right)

But where does the magic end?

SIFT is relatively expensive (computationally) and


copyrighted (the other type of expensive)
MSER doesnt work well with images with any
motion blur
Interesting alternatives:

GLOH (Gradient Location and Orientation Histogram)

larger initial descriptor + PCA

SURF (Speeded Up Robust Features)

possibly faster AND more robust?

Credits + References

MSER image from J. Matas, O. Chum, M. Urban, T. Pajdla


Robust Wide Baseline Stereo from Maximally Stable Extremal
Regions, BMVC, 2002
SIFT keypoint images from F. Estrada & A Jepson & D. Fleet s
SIFT tutorial, 2004
GLOH A performance evaluation of local descriptors, Krystian
Mikolajczyk and Cordelia Schmid, IEEE tran. On Pattern
Analysis and Machine Intelligence, pp 16151630, 2005
SURF SURF: Speeded Up Robust Features, Herbert Bay,
Tinne Tuytelaars and Luc Van Gool Proceedings of the 9th
European Conference on Computer Vision 2006

You might also like