
Vision Algorithms for Mobile Robotics

Lecture 06
Point Feature Detection and Matching – Part 2

Davide Scaramuzza
http://rpg.ifi.uzh.ch 1
Lab Exercise 4 – Today
Implement SIFT blob detection and matching

2
Main questions
• What features are repeatable and distinctive?
• How to describe a feature?
• How to establish correspondences, i.e., compute matches?

3
Feature Matching
For each point, how to match its corresponding point in the other image?
• Brute-force Matching: compare each feature descriptor of Image 1 against the descriptor of each
feature in Image 2 and assign as correspondence the feature with closest descriptor (e.g.,
minimum of SSD). If each image contains N features, we need to perform 𝑁² comparisons.

Image 1 Image 2
4
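As an illustration, a minimal brute-force matcher in Python/NumPy (a sketch; desc1 and desc2 are hypothetical descriptor matrices, one row per feature):

import numpy as np

def brute_force_match(desc1, desc2):
    # For each of the N1 descriptors in desc1 (N1 x D), find the index of the
    # closest descriptor in desc2 (N2 x D) by minimum SSD: N1 * N2 comparisons.
    ssd = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    return ssd.argmin(axis=1)  # matches[i] = index in image 2 of feature i's match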
Recall: Patch and Census Descriptors
• Patch descriptor
(i.e., patch of intensity values, integer values)

• Census descriptor (binary values)

5
HOG Descriptor (Histogram of Oriented Gradients)
• The patch is divided into a grid of cells and for each cell a histogram of gradient directions is compiled.
• The HOG descriptor is the concatenation of these histograms (used in SIFT)
• Unlike the patch and Census descriptors, HOG has floating-point values.

Example: a gradient histogram with 8 orientation bins over [0, 2𝜋].
Each vote is weighted by the gradient magnitude.
The HOG descriptor is the concatenation of the per-cell histograms into a single 1D vector.
6
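A minimal HOG sketch in Python/NumPy, assuming a square grayscale patch whose side is divisible by the number of cells (function and parameter names are ours):

import numpy as np

def hog_descriptor(patch, n_cells=4, n_bins=8):
    # Gradients: np.gradient returns derivatives along rows (y) then columns (x)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.arctan2(gy, gx) % (2 * np.pi)          # orientation in [0, 2*pi)
    c = patch.shape[0] // n_cells                   # cell side in pixels
    hists = []
    for i in range(n_cells):
        for j in range(n_cells):
            cell = (slice(i * c, (i + 1) * c), slice(j * c, (j + 1) * c))
            h, _ = np.histogram(ang[cell], bins=n_bins, range=(0, 2 * np.pi),
                                weights=mag[cell])  # magnitude-weighted votes
            hists.append(h)
    return np.concatenate(hists)                    # 1D float descriptor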
Feature Descriptor Invariance
Are feature descriptors invariant (robust) to geometric and photometric changes?

Geometric changes: scale, rotation, viewpoint Photometric changes: illumination

Image 1 Image 2 Image 1 Image 2

7
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors

8
Scale changes
How can we match image patches corresponding to the same feature but belonging to
images taken at different scales?

Image 1 Image 2

9
Scale changes
How can we match image patches corresponding to the same feature but belonging to
images taken at different scales? Possible solution: rescale the patch

Image 1 Image 2

10
Scale changes
• Scale search is time consuming (needs to be done individually for all patches
in one image)
• Complexity is 𝑁²𝑆, assuming 𝑁 features per image and 𝑆 rescalings per feature
• Solution: automatic scale selection: automatically assign each feature its
own “scale” (i.e., size)

13
Automatic Scale Selection
• Idea: Design a function on the image patch, which is scale invariant (i.e., it has the same
value for corresponding patches, even if they are at different scales)

(Plots: f as a function of patch size for Image 1 and for Image 2, whose scale is 1/2.)


14
Automatic Scale Selection
• Idea: Design a function on the image patch, which is scale invariant (i.e., it has the same
value for corresponding patches, even if they are at different scales)
• Find local extrema of this function
• The patch size at which the local extremum is reached should be invariant to image rescaling
• Important: this scale invariant patch size is found in each image independently

(Plots: f as a function of patch size; the local extremum is reached at patch size s₁ in Image 1 and s₂ in Image 2, whose scale is 1/2.)


15
Automatic Scale Selection: Example
Image 1 Image 2

The function is evaluated at increasing scales in each image independently: f(I(x, y, 𝜎)) in Image 1 and f(I(x′, y′, 𝜎′)) in Image 2. The responses reach their local extrema at 𝜎 = s₁ and 𝜎′ = s₂, respectively.
21
Automatic Scale Selection: Example
Image 1 Image 2

When the right scale is found, the patches must be normalized to a canonical size so that they can be compared by SSD.
Patch normalization is done via warping.
22
Automatic Scale Selection: Example
Image 1 Image 2

23
Automatic Scale Selection
• A “good” function for scale detection should have a single & sharp peak

(Plots of f vs. patch size: a flat response is bad; one with multiple peaks is ambiguous; a single sharp peak is very good.)

• What if there are multiple peaks? Is it really a problem?

• What is a good function?


• Sharp intensity changes are good regions to monitor in order to identify the scale

24
Automatic Scale Selection
• The ideal function for determining the scale is one that highlights sharp discontinuities
• Solution: convolve image with a kernel that highlights edges

f = Kernel ∗ Image

• It has been shown that the Laplacian of Gaussian kernel is optimal under certain
assumptions [Lindeberg’94]:

LoG(𝑥, 𝑦, 𝜎) = ∇²𝐺𝜎(𝑥, 𝑦) = ∂²𝐺𝜎(𝑥, 𝑦)/∂𝑥² + ∂²𝐺𝜎(𝑥, 𝑦)/∂𝑦²

Lindeberg, Scale-space theory: A basic tool for analysing structures at different scales, Journal of Applied Statistics, 1994. PDF. 25
Automatic Scale Selection
The correct scale(s) is (are) found as local extrema of the LoG response across consecutive smoothed patches (i.e., across the scale 𝜎 of the LoG).

26
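A sketch of this idea in Python, using SciPy's LoG filter with Lindeberg's 𝜎² scale normalization (the keypoint coordinates and the 𝜎 range are illustrative; a real implementation filters the whole image once per scale rather than per keypoint):

import numpy as np
from scipy.ndimage import gaussian_laplace

def select_scales(img, x, y, sigmas=np.geomspace(1.6, 16.0, 10)):
    # Scale-normalized LoG response at (x, y) for each sigma; the sigma^2
    # factor makes responses comparable across scales (Lindeberg '94)
    resp = np.array([(s ** 2) * gaussian_laplace(img.astype(float), s)[y, x]
                     for s in sigmas])
    r = np.abs(resp)
    # Keep the interior sigmas where the |response| is a local maximum
    peak = (r[1:-1] > r[:-2]) & (r[1:-1] > r[2:])
    return sigmas[1:-1][peak]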
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors

27
How to achieve invariance to Rotation
Derotation:
• Determine patch orientation
e.g., eigenvectors of M matrix of Harris or
dominant gradient direction (see next slide)
• Derotate patch through “patch warping”
This puts the patches into a canonical orientation

28
How to determine the patch orientation?
1. First, multiply the patch by a Gaussian kernel to make the shape circular rather than square
2. Then, compute gradients vectors at each pixel
3. Build a histogram of gradient orientations, weighted by the gradient magnitudes. This histogram is a particular case of HOG
descriptor (a grid of 1×1 cells)
4. Extract all local maxima in the histogram: each local maximum above a threshold is a candidate dominant orientation.
5. Construct a different keypoint descriptor for each dominant orientation

(Histogram of gradient orientations over [0, 2𝜋]; the dominant gradient direction is its highest peak.)

29
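A simplified sketch of steps 1-4 in Python/NumPy. The 36-bin histogram and the 0.8 peak threshold are assumptions (the slides only specify a thresholded local-maximum search), and for brevity every bin near the global maximum is kept rather than only local maxima:

import numpy as np

def dominant_orientations(patch, n_bins=36, peak_ratio=0.8):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Gaussian window: makes the effective support circular rather than square
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    g = np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (2 * (w / 4) ** 2))
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag * g)   # weighted 1x1-cell HOG
    # Candidate dominant orientations: bins close to the histogram maximum
    idx = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (edges[idx] + edges[idx + 1]) / 2      # bin centers, in radians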
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors

30
How to achieve invariance to small viewpoint changes?
Affine warping provides invariance to small view-point changes
• The second moment matrix M of the Harris detector can be used to identify the two directions of fastest
and slowest change of SSD around the feature
• Out of these two directions, an elliptic patch is extracted
• The region inside the ellipse is normalized to a canonical circular patch

Image 1 Image 2

31
Recap:
How to achieve Scale, Rotation, and Affine-invariant patch matching
1. Scale assignment: compute the scale using the LoG operator. If multiple local extrema, assign multiple scales
2. Multiply the patch by a Gaussian kernel to make the shape circular rather than square
3. Rotation assignment: use Harris or gradient histogram to find dominant orientation. If multiple local extrema, assign
multiple orientations
4. Affine invariance: use Harris eigenvectors to extract affine transformation parameters
5. Warp the patch into a canonical patch
Image 1 Image 2

32
How to warp a patch?
• Start with an “empty” canonical patch (all pixels set to 0)
• For each pixel (𝑥, 𝑦) in the empty patch, apply the warping function 𝑾(𝒙, 𝒚)
to compute the corresponding position in the source image. It will be in
floating point and will fall between the image pixels.
• Interpolate the intensity values of the 4 closest pixels in the detected image using one of:
• Nearest neighbor interpolation
• Bilinear interpolation
• Bicubic interpolation

33
Example: Similarity Transformation (rotation, translation, rescaling)

• Warping function 𝑊: rotation (𝜃) plus rescaling (𝑠) and translation (𝑎, 𝑏):

𝑥’ = 𝑠(𝑥 cos𝜃 – 𝑦 sin𝜃) + 𝑎


𝑦’ = 𝑠(𝑥 sin𝜃 + 𝑦 cos𝜃) + 𝑏

(𝑊 maps each pixel (𝑥, 𝑦) of the empty canonical patch to (𝑥′, 𝑦′) in the patch detected in the image.)
34
Example: Rescaling

35
Nearest Neighbor vs Bilinear vs Bicubic Interpolation

36
Bilinear Interpolation
• It is an extension of linear interpolation for interpolating functions of two variables (e.g., 𝑥 and 𝑦) on a
rectilinear 2D grid.
• The key idea is to perform linear interpolation first in one direction, and then again in the other direction.
• Although each step is linear in the sampled values and in the position, the interpolation as a whole is not
linear but rather quadratic in the sample location.
𝐼(𝑥, 𝑦) = 𝐼(0,0)(1−𝑥)(1−𝑦) + 𝐼(0,1)(1−𝑥)𝑦 + 𝐼(1,0)𝑥(1−𝑦) + 𝐼(1,1)𝑥𝑦
(This formula won’t be asked at the exam ☺)

In this geometric visualization, the value at the black spot is the sum of the value at each colored spot multiplied by the area of the rectangle of the same color.
37
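Putting the last three slides together, a sketch of patch warping in Python/NumPy: an empty canonical patch, the similarity warp 𝑊, and bilinear interpolation of the 4 closest pixels (function and parameter names are ours):

import numpy as np

def warp_patch(img, cx, cy, theta, s, size=16):
    out = np.zeros((size, size))
    for v in range(size):
        for u in range(size):
            # Canonical patch coordinates, centered on the patch
            x, y = u - size / 2, v - size / 2
            # Warping function W: rotation theta, rescaling s, translation (cx, cy)
            xs = s * (x * np.cos(theta) - y * np.sin(theta)) + cx
            ys = s * (x * np.sin(theta) + y * np.cos(theta)) + cy
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            if 0 <= x0 < img.shape[1] - 1 and 0 <= y0 < img.shape[0] - 1:
                a, b = xs - x0, ys - y0   # fractional position between pixels
                out[v, u] = (img[y0, x0] * (1 - a) * (1 - b) +
                             img[y0, x0 + 1] * a * (1 - b) +
                             img[y0 + 1, x0] * (1 - a) * b +
                             img[y0 + 1, x0 + 1] * a * b)  # bilinear formula above
    return out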
Disadvantage of Patch Descriptors
• Disadvantage of patch descriptors:
• If the warp is not estimated accurately, very small errors in rotation, scale, and viewpoint will affect the matching score significantly
• Computationally expensive (need to unwarp every patch)

38
Outline
• Automatic Scale Selection
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors

39
SIFT Descriptor
• Scale Invariant Feature Transform
• Invented by David Lowe in 2004

Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF 40
SIFT Descriptor
Descriptor computation:
• Consider a 𝟏𝟔 × 𝟏𝟔 pixel patch
• Multiply the patch by a Gaussian filter, compute dominant orientation, and de-rotate patch
• Compute HOG descriptor
• Divide patch into 4×4 cells
• Use 8-bin histograms (i.e., 8 directions)
• Concatenate all histograms into a single 1D vector
• Resulting SIFT descriptor: 4×4×8 = 128 values
• Descriptor Matching: SSD (i.e., Euclidean-distance)
• Why 4×4 cells and why 8 bins? See later

Is HOG invariant to additive or affine illumination changes (i.e., 𝐼′(𝑥, 𝑦) = 𝛼𝐼(𝑥, 𝑦) + 𝛽)?


41
Descriptor Normalization
• The HOG descriptor is already invariant to additive illumination because it is based on
gradients
• To make it invariant to affine illumination changes, the descriptor vector 𝒗 is then normalized such that its 𝐿2 norm is 1:

𝒗̄ = 𝒗 / ‖𝒗‖₂ = 𝒗 / √(Σⁿᵢ₌₁ 𝑣ᵢ²)

• We can conclude that the SIFT descriptor is invariant to affine illumination changes

42
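Combining the two previous slides, a sketch of the 128-value descriptor with L2 normalization, reusing the hog_descriptor sketch shown earlier (the small epsilon is our addition, to avoid division by zero):

import numpy as np

def sift_like_descriptor(patch16):
    # 16x16 patch -> 4x4 cells x 8 bins = 128-element HOG
    v = hog_descriptor(patch16, n_cells=4, n_bins=8)
    # Normalize so that ||v||_2 = 1: invariance to affine illumination changes
    return v / (np.linalg.norm(v) + 1e-12)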
SIFT Matching Robustness
• Can handle severe viewpoint changes (up to 50 degree out-of-plane rotation)
• Can handle even severe non-affine changes in illumination (low to bright scenes)
• Computationally expensive: 10 frames per second (fps) on an i7 processor
• Original SIFT binary files: http://people.cs.ubc.ca/~lowe/keypoints
• OpenCV C/C++ implementation: https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html

43
SIFT Detector
• SIFT uses the Difference of Gaussians (DoG) kernel instead of the Laplacian of Gaussian (LoG) because it is computationally cheaper:

LoG(𝑥, 𝑦) ≈ DoG(𝑥, 𝑦) = 𝐺𝑘𝜎(𝑥, 𝑦) − 𝐺𝜎(𝑥, 𝑦)

• The proof that the LoG can be approximated by a difference of Gaussians comes from the Heat Equation: ∂𝐺𝜎/∂𝜎 = 𝜎∇²𝐺𝜎
44
SIFT Detector (location + scale)
SIFT keypoints: local extrema in both space and scale of the DoG images
• Each pixel is compared to 26 neighbors: its 8 neighbors in the current image + 9 neighbors in the adjacent upper scale + 9 neighbors in the adjacent lower scale
• If the pixel is a maximum or minimum (i.e., an extremum) with respect to its 26 neighbors, then it is selected as a SIFT feature

For each extremum, the output of the SIFT detector is the location (𝑥, 𝑦) and the scale 𝑠

45
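A sketch of the 26-neighbor test in Python/NumPy, assuming the DoG images of one octave are stacked in an array dog of shape (num_scales, height, width):

import numpy as np

def is_scale_space_extremum(dog, s, y, x):
    # 3x3x3 cube around (s, y, x): the 26 neighbors plus the pixel itself
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    # Keypoint candidate if the center is the maximum or minimum of the cube
    return center == cube.max() or center == cube.min()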
Example

46
DoG Images example

Magnitude of (𝐺(𝑘𝜎) − 𝐺(𝜎)) | 𝑠 = 4; 𝜎 = 1.6 |

47
DoG Images example

Magnitude of (𝐺(𝑘 2 𝜎) − 𝐺 𝑘𝜎 )| 𝑠 = 4; 𝜎 = 1.6 |

48
DoG Images example

Magnitude of (𝐺(𝑘 3 𝜎) − 𝐺 𝑘 2 𝜎 )| 𝑠 = 4; 𝜎 = 1.6 |

49
DoG Images example

Magnitude of (𝐺(𝑘 4 𝜎) − 𝐺(𝑘 3 𝜎)) | 𝑠 = 4; 𝜎 = 1.6 |

50
DoG Images example

Magnitude of (𝐺 𝑘 5 𝜎 − 𝐺 𝑘 4 𝜎 ) | 𝑠 = 4; 𝜎 = 1.6 |
(second octave shown at the input resolution for convenience)

51
DoG Images example

Magnitude of (𝐺 𝑘 6 𝜎 − 𝐺 𝑘 5 𝜎 ) | 𝑠 = 4; 𝜎 = 1.6 |
(second octave shown at the input resolution for convenience)

52
DoG Images example

Magnitude of (𝐺 𝑘 7 𝜎 − 𝐺 𝑘 6 𝜎 ) | 𝑠 = 4; 𝜎 = 1.6 |
(second octave shown at the input resolution for convenience)

53
DoG Images example

Magnitude of (𝐺 𝑘 8 𝜎 − 𝐺 𝑘 7 𝜎 ) | 𝑠 = 4; 𝜎 = 1.6 |
(second octave shown at the input resolution for convenience)

54
DoG Images example

Magnitude of (𝐺 𝑘 9 𝜎 − 𝐺 𝑘 8 𝜎 ) | 𝑠 = 4; 𝜎 = 1.6 |
(third octave shown at the input resolution for convenience)

55
Local extrema of DoG images across Scale and Space

What are SIFT features like?


Hint: Remember the definition of filtering as template matching
56
How it is implemented in practice
1. Build a Scale-Space Pyramid:
• The initial image is incrementally convolved with Gaussians G(𝑘ⁱ𝜎) to produce blurred images separated by a constant factor 𝑘 in scale space
• The initial Gaussian G(𝜎) has 𝜎 = 1.6
• 𝑘 is chosen as 𝑘 = 2^(1/𝑠), where 𝑠 is the number of intervals into which each octave of scale space is divided
• For efficiency, when 𝑘ⁱ equals 2, the image is downsampled by a factor of 2 and the procedure is repeated, up to 5 octaves (pyramid levels)
• Adjacent blurred images are then subtracted to produce the Difference-of-Gaussian (DoG) images
2. Scale-Space extrema detection:
• Detect local maxima and minima across space and scale (see previous slide)
57
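A sketch of the pyramid construction with OpenCV (a simplified reading of the procedure above; function and variable names are ours):

import cv2
import numpy as np

def build_dog_pyramid(img, sigma=1.6, s=4, n_octaves=5):
    img = img.astype(np.float32)
    k = 2 ** (1.0 / s)                      # constant factor between scales
    octaves = []
    for _ in range(n_octaves):
        # Incrementally blurred images; a few extra levels so that extrema can
        # also be found at the first and last DoG of the octave
        blurred = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i)
                   for i in range(s + 3)]
        # Adjacent blurred images are subtracted to produce the DoG images
        octaves.append(np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]))
        # k^s = 2: downsample by a factor of 2 and repeat for the next octave
        img = cv2.resize(blurred[s], None, fx=0.5, fy=0.5)
    return octaves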
SIFT: Recap
• SIFT: Scale Invariant Feature Transform
• An approach to detect and describe regions of interest in an image.
• SIFT detector = DoG detector
• SIFT features are invariant to 2D rotation, and reasonably invariant to
rescaling, viewpoint changes (up to 50 degrees), and illumination
• It runs in real time but is expensive (10 Hz on an i7 laptop)
• The expensive steps are the scale detection and descriptor extraction

59
Original SIFT Demo by David Lowe
Download the original SIFT binaries and Matlab function from:
http://people.cs.ubc.ca/~lowe/keypoints

>>[image1, descriptors1, locs1] = sift('scene.pgm');


>>showkeys(image1, locs1);
>>[image2, descriptors2, locs2] = sift('book.pgm');
>>showkeys(image2, locs2);
>>match('scene.pgm','book.pgm');

60
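A rough OpenCV (Python) equivalent of the demo above (a sketch, not Lowe's code; requires opencv-python ≥ 4.4, where SIFT is part of the main module):

import cv2

img1 = cv2.imread('scene.pgm', cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-D descriptors
vis = cv2.drawKeypoints(img1, kp1, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('keypoints', vis)
cv2.waitKey(0)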
What’s the output of SIFT?
• Descriptor: 4x4x8 = 128-element 1D vector
• Location (pixel coordinates of the center of the patch): 2D vector
• Scale (i.e., size) of the patch: 1 scalar value (high scale corresponds to high blur in the scale-space pyramid)
• Orientation (i.e., angle of the patch): 1 scalar value

61
SIFT Repeatability with Viewpoint Changes

Repeatability = (# correspondences detected) / (# correspondences present)

Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF 62
SIFT Repeatability with Number of Scales per Octave

Repeatability = (# correspondences detected) / (# correspondences present)

Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF 63
Influence of Number of Orientations and Number of Sub-patches

The graph shows that a single orientation histogram (n = 1) is very poor at discriminating.
The results improve with a 4x4 array of histograms with 8 orientations.

4x4 HOGs

Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF 64
Application of SIFT to Object recognition
• Can be implemented easily by returning the object with the largest number of
correspondences with the template
• For planar objects, 4-point RANSAC can be used to remove outliers (see Lecture 8).
• For rigid 3D objects, 5-point RANSAC (see Lecture 8).

65
Application of SIFT to Panorama Stitching

AutoStitch: http://matthewalunbrown.com/autostitch/autostitch.html
M. Brown and D. G. Lowe. Recognising Panoramas, International Conference on Computer Vision (ICCV), 2003. PDF. 66
Main questions
• What features are repeatable and distinctive?
• How to describe a feature?
• How to establish correspondences, i.e., compute matches?

67
Feature Matching

68
Feature Matching
• Given a feature in 𝐼1, how to find the best match in 𝐼2?
1. Define a distance function that compares two descriptors: (Z)SSD, (Z)SAD, (Z)NCC, or the Hamming distance for binary descriptors (e.g., Census, ORB, BRIEF, BRISK, FREAK)
2. Brute-force matching:
1. Compare each feature in 𝐼1 against all the features in 𝐼2 (𝑁² comparisons, where 𝑁 is the number of features in each image)
2. Take the one at minimum distance, i.e., the closest descriptor

𝐼1 𝐼2

69
Feature Matching
• Issues with closest descriptor: can occasionally return good scores for false matches
• Better approach: compute the ratio of the distances to the 1st and 2nd closest descriptors:

𝑑₁/𝑑₂ < Threshold (usually 0.8)

where:
• 𝑑₁ is the distance to the closest descriptor
• 𝑑₂ is the distance to the 2nd closest descriptor

70
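In OpenCV (Python) this is the standard ratio test, as in the tutorial linked earlier (a sketch; des1 and des2 are assumed to be the SIFT descriptors of the two images):

import cv2

bf = cv2.BFMatcher()                          # L2 distance, suitable for SIFT
good = []
for m1, m2 in bf.knnMatch(des1, des2, k=2):   # two closest matches per feature
    if m1.distance / m2.distance < 0.8:       # keep only if d1/d2 < 0.8
        good.append(m1)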
Distance Ratio: Explanation
• In SIFT, the nearest neighbor is defined as the keypoint with minimum Euclidean distance. However, many features in Image 1 may not have any correct match in Image 2 because they arise from background clutter or were not detected in Image 2.
• An effective measure is obtained by comparing the distance of the closest neighbor to that of the second-
closest neighbor. This measure performs well because correct matches need to have the closest neighbor
significantly closer than the closest incorrect match to achieve reliable matching.
• For false matches, there will likely be a number of other false matches within similar distances due to the
high dimensionality of the feature space (this problem is known as curse of dimensionality). We can think
of the second-closest match as providing an estimate of the density of false matches within this portion of
the feature space and at the same time identifying specific instances of feature ambiguity.

71
SIFT Feature Matching: Distance Ratio
The SIFT paper recommends using a threshold of 0.8. Where does this come from?

“A threshold of 0.8 eliminates 90% of the incorrect matches while discarding less than 5% of the correct matches.”

“This figure was generated by matching images following random scale and orientation change, with viewpoint change of 30 degrees, and addition of 2% image noise, against a database of 40,000 keypoints.”

72
Outline
• Automatic Scale Selection
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors

73
“FAST” Corner Detector
• FAST: Features from Accelerated Segment Test
• Analyses intensities along a ring of 16 pixels centered on
the pixel of interest 𝒑
• 𝒑 is a FAST corner if a set of N contiguous pixels on the
ring are:
• all brighter than the pixel intensity 𝑰(𝒑) + 𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅,
• or all darker than 𝑰(𝒑) − 𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅
• Common value of N: 12
• A simple classifier is used to check the quality of corners and reject the weak ones
• FAST is the fastest corner detector ever made: can process 100 million pixels per second (<3ms per image)
• Issue: it is very sensitive to image noise (high in low light). This is why Harris is still more common, despite being a bit slower
• In fact, FAST was initially proposed to find candidate corner regions to be checked with the Harris detector

Rosten, Drummond, Fusing points and lines for high performance tracking, International Conference on Computer Vision (ICCV), 2005. PDF.

Rosten, Porter, Drummond, “Faster and better: a machine learning approach to corner detection”,
IEEE Trans. Pattern Analysis and Machine Intelligence, 2010. PDF. 74
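A sketch of the segment test in Python/NumPy (the ring offsets follow the standard radius-3 Bresenham circle; real implementations add a fast reject test on a few ring pixels and the learned classifier mentioned above):

import numpy as np

RING = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
        (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_fast_corner(img, y, x, t=20, n=12):
    p = float(img[y, x])
    vals = np.array([float(img[y + dy, x + dx]) for dx, dy in RING])
    for sign in (+1, -1):                     # +1: brighter test, -1: darker test
        hit = sign * (vals - p) > t
        run = best = 0
        for h in np.concatenate([hit, hit]):  # duplicate to handle wrap-around
            run = run + 1 if h else 0
            best = max(best, run)
        if best >= n:                         # n contiguous pixels pass the test
            return True
    return False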
“SURF” Blob Detector & Descriptor
• SURF: Speeded Up Robust Features
• Similar to SIFT but much faster
• Basic idea: approximate the Gaussian and DoG filters using box filters
• Results comparable with SIFT, plus:
• Faster computation
• Generally shorter descriptors

(Figure: the original second-order partial derivatives of a Gaussian, ∂²𝐺(𝑥, 𝑦)/∂𝑦² and ∂²𝐺(𝑥, 𝑦)/∂𝑥∂𝑦, and their SURF approximations using box filters.)
Bay, Tuytelaars, Van Gool, " Speeded Up Robust Features ", European Conference on Computer Vision (ECCV) 2006. PDF. 75
“BRIEF” Descriptor (can be applied to corners or blobs)
• BRIEF: Binary Robust Independent Elementary Features

• Goal: high speed description computation and matching

• Binary descriptor formation:
• Smooth the image
• For each detected keypoint (e.g., FAST):
• sample 128 intensity pairs (𝑝₁ⁱ, 𝑝₂ⁱ), 𝑖 = 1, …, 128, within a square patch around the keypoint
• create an empty 128-element descriptor
• for the 𝑖-th pair: if 𝐼(𝑝₁ⁱ) < 𝐼(𝑝₂ⁱ), set the 𝑖-th bit of the descriptor to 1, else to 0

• The pattern is generated randomly (or learned) only once; then, the same pattern is used for all patches (figure: pattern for intensity pair samples, generated randomly)
• Pros: binary descriptor allows very fast Hamming-distance matching (count of the number of bits that differ between the matched descriptors)
• Cons: not scale/rotation invariant

Calonder, Lepetit, Strecha, Fua, BRIEF: Binary Robust Independent Elementary Features, European Conference on Computer Vision (ECCV), 2010. PDF. 76
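A sketch of BRIEF in Python/NumPy: the random pair pattern is generated once and reused for every (pre-smoothed) patch; matching is a Hamming distance (the 16×16 patch size and one-byte-per-bit storage are simplifications of ours):

import numpy as np

rng = np.random.default_rng(0)
PAIRS = rng.integers(0, 16, size=(128, 2, 2))  # 128 random (y, x) pixel pairs

def brief_descriptor(patch):
    # Bit i is 1 if I(p1_i) < I(p2_i), else 0
    p1, p2 = PAIRS[:, 0], PAIRS[:, 1]
    return (patch[p1[:, 0], p1[:, 1]] < patch[p2[:, 0], p2[:, 1]]).astype(np.uint8)

def hamming(d1, d2):
    # Count of the bits that differ between the two descriptors
    return int(np.count_nonzero(d1 != d2))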
“ORB” Descriptor (can be applied to corners or blobs)
• ORB: Oriented FAST and Rotated BRIEF
• Keypoint detector originally based on FAST
• Binary descriptor based on BRIEF but adds an
orientation component to make it rotation
invariant

Rublee, Rabaud, Konolige, Bradski, “ORB: an efficient alternative to SIFT or SURF”, IEEE International Conference on Computer Vision (ICCV), 2011. PDF. 77
“BRISK” Descriptor (can be applied to corners or blobs)
• BRISK: Binary Robust Invariant Scalable Keypoints
• Keypoint detector based on FAST
• Binary descriptor, both rotation and scale invariant
• Formed by pairwise intensity comparisons (like BRIEF) but on a radially symmetric sampling pattern
• Red circles: size of the smoothing kernel applied
• Blue circles: smoothed pixel value used
• Detection and descriptor speed: 10 times faster than SURF
• Slower than BRIEF, but scale- and rotation- invariant

Leutenegger, Chli, Siegwart. BRISK: Binary Robust invariant scalable keypoints, ICCV 2011. PDF 78
“FREAK” Descriptor (can be applied to corners or blobs)
• FREAK: Fast Retina Keypoint
• Rotation and scale invariant
• Binary descriptor
• Sampling pattern similar to BRISK but uses a more pronounced “retinal” (i.e.,
log-polar) sampling pattern inspired by the human retina: higher density of
points near the center
• Pairwise intensity comparisons form binary strings, similar to BRIEF (figure: human retina)
• Pairs are learned (as in ORB)
• Circles indicate size of smoothing kernel
• Coarse-to-fine matching (cascaded approach): first compare the first half of
bits; if distance smaller than threshold, proceed to compare the next bits, etc.
• Faster to compute and uses less memory than SIFT, SURF, or BRISK
FREAK sampling pattern

Alahi, Ortiz, Vandergheynst. FREAK: Fast Retina Keypoint, Conference on Computer Vision and Pattern Recognition (CVPR), 2012. PDF. 79
“LIFT” Descriptor (can be applied to corners or blobs)
• LIFT: Learned Invariant Feature Transform
• Learning-based descriptor
• Rotation, scale, viewpoint and illumination invariant
• First, a network predicts the patch orientation, which is used to derotate the patch (figure: keypoints with scales and orientations).
• Then another neural network generates a patch descriptor (128-dimensional) from the derotated patch (figure: a CNN predicts the descriptor).
• Illumination invariance is achieved by randomizing illuminations during training.
• The LIFT descriptor beats SIFT in repeatability.
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua,
LIFT: Learned Invariant Feature Transform, European Conference on Computer Vision (ECCV) 2016. PDF. 80
LIFT vs SIFT

https://youtu.be/hhxAttChmCo 81
“SuperPoint”: Self-Supervised Interest Point Detection and Description

• Joint regression of keypoint location and descriptor. Self-supervised.


• Trained on synthetic images and refined on homographies of real images
• Detector less accurate than SIFT and LIFT, but descriptor outperforms SIFT and LIFT
• But slower than SIFT and LIFT

Detone, Malisiewicz, Rabinovich. SuperPoint: Self-Supervised Interest Point Detection and Description. CVPRW 2018. PDF.
82
Recap Table

Detector   | Localization accuracy of the detector | Descriptor that can be used | Efficiency | Relocalization & loop closing
Harris     | ++++ | Patch      | +++  | +
           |      | SIFT/LIFT  | +    | +++++
           |      | BRIEF      | ++++ | +++
           |      | ORB        | ++++ | ++++
           |      | BRISK      | +++  | +++
           |      | FREAK      | ++++ | ++++
Shi-Tomasi | ++++ | Patch      | ++   | +
           |      | SIFT       | +    | +++++
           |      | BRIEF      | ++++ | +++
           |      | ORB        | ++++ | ++++
           |      | BRISK      | +++  | +++
           |      | FREAK      | ++++ | ++++
FAST       | ++++ | Patch      | ++++ | +
           |      | SIFT/LIFT  | +    | +++++
           |      | BRIEF      | ++++ | +++
           |      | ORB        | ++++ | ++++
           |      | BRISK      | +++  | +++
           |      | FREAK      | ++++ | ++++
SIFT       | +++  | SIFT       | +    | ++++
SURF       | +++  | SURF       | ++   | ++++
SuperPoint | ++   | SuperPoint | +    | +++++

83
Summary (things to remember)
• Similarity metrics: NCC (ZNCC), SSD (ZSSD), SAD (ZSAD), Census Transform
• Point feature detection
• Properties and invariance to transformations
• Challenges: rotation, scale, view-point, and illumination changes
• Extraction
• Moravec
• Harris and Shi-Tomasi
• Rotation invariance
• Automatic Scale selection
• Descriptor
• Intensity patches
• Canonical representation: how to make them invariant to transformations: rotation, scale, illumination, and view-
point (affine)
• Better solution: Histogram of oriented gradients: SIFT descriptor
• Matching
• (Z)SSD, SAD, NCC, Hamming distance (the last one only for binary descriptors)
• Ratio of distances to the 1st and 2nd closest descriptor
• Depending on the task, you may want to trade off repeatability and robustness for speed: approximated solutions, combinations
of efficient detectors and descriptors.
• Fast corner detector: FAST;
• Keypoint descriptors faster than SIFT: SURF, BRIEF, ORB, BRISK

84
Readings
• Ch. 7.1 of Szeliski book, 2nd Edition
• Chapter 4 of Autonomous Mobile Robots book: link
• Ch. 13.3 of Peter Corke book

85
Understanding Check
Are you able to answer:
• How does automatic scale selection work?
• What are the good and the bad properties that a function for automatic scale selection should have or not
have?
• How can we implement scale invariant detection efficiently? (show that we can do this by resampling the
image vs rescaling the kernel).
• What is a feature descriptor? (patch of intensity value vs histogram of oriented gradients). How do we
match descriptors?
• How is the keypoint detection done in SIFT and how does this differ from Harris?
• How does SIFT achieve orientation invariance?
• How is the SIFT descriptor built?
• What is the repeatability of the SIFT detector after a rescaling of 2? And for a 50 degrees viewpoint change?
• Illustrate the 1st-to-2nd closest ratio test of SIFT matching: what’s the intuitive reasoning behind it? Where does the 0.8 factor come from?
• How does the FAST detector work? What are its pros and cons compared with Harris?
86
