Lecture 06
Point Feature Detection and Matching – Part 2
Davide Scaramuzza
http://rpg.ifi.uzh.ch
Lab Exercise 4 – Today
Implement SIFT blob detection and matching
Main questions
• What features are repeatable and distinctive?
• How to describe a feature?
• How to establish correspondences, i.e., compute matches?
Feature Matching
Given a feature point in one image, how do we find its corresponding point in the other image?
• Brute-force matching: compare each feature descriptor of Image 1 against the descriptor of each
feature in Image 2 and assign as correspondence the feature with the closest descriptor (e.g.,
minimum SSD). If each image contains N features, we need to perform N² comparisons.
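A minimal NumPy sketch of brute-force SSD matching (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def match_brute_force(desc1, desc2):
    """For each descriptor in desc1 (N1 x D), return the index of the
    closest descriptor in desc2 (N2 x D) under SSD (sum of squared
    differences)."""
    # Pairwise SSD via broadcasting: dists[i, j] = ||desc1[i] - desc2[j]||^2
    diff = desc1[:, None, :].astype(float) - desc2[None, :, :].astype(float)
    dists = np.sum(diff ** 2, axis=2)      # N1 x N2 comparisons
    return np.argmin(dists, axis=1)        # index of closest descriptor in Image 2
```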
Recall: Patch and Census Descriptors
• Patch descriptor
(i.e., the patch of intensity values; integer values)
• Census descriptor
(binary string built by comparing each pixel of the patch against the center pixel; binary values)
HOG Descriptor (Histogram of Oriented Gradients)
• The patch is divided into a grid of cells, and for each cell a histogram of gradient directions is compiled.
• The HOG descriptor is the concatenation of these histograms (used in SIFT)
• Unlike the patch and Census descriptors, HOG has float values.
Example of a gradient histogram with 8 orientation bins over [0, 2π).
Each vote is weighted by the gradient magnitude.
[Figure: one orientation histogram per cell, each over [0, 2π); the HOG descriptor concatenates them into a single 1D vector]
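To make the construction concrete, here is a minimal NumPy sketch of a HOG computation over a grid of cells (a simplified illustration; the function name and the absence of block normalization are my assumptions, not from the lecture):

```python
import numpy as np

def hog_descriptor(patch, n_cells=4, n_bins=8):
    """Divide a square patch into n_cells x n_cells cells, compile one
    n_bins-bin gradient-orientation histogram per cell (votes weighted
    by gradient magnitude), and concatenate into a 1D float vector."""
    gy, gx = np.gradient(patch.astype(float))      # image gradients
    mag = np.sqrt(gx**2 + gy**2)
    ang = np.arctan2(gy, gx) % (2 * np.pi)         # orientations in [0, 2*pi)
    cell = patch.shape[0] // n_cells
    hists = []
    for i in range(n_cells):
        for j in range(n_cells):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            h, _ = np.histogram(ang[sl], bins=n_bins, range=(0, 2 * np.pi),
                                weights=mag[sl])   # magnitude-weighted votes
            hists.append(h)
    return np.concatenate(hists)                   # length n_cells^2 * n_bins
```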
Feature Descriptor Invariance
Are feature descriptors invariant (robust) to geometric and photometric changes?
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors
Scale changes
How can we match image patches corresponding to the same feature but belonging to
images taken at different scales? Possible solution: rescale the patch
[Figure: corresponding patches in Image 1 and Image 2 at different scales, progressively rescaled until they match]
Scale changes
• Scale search is time consuming (it needs to be done individually for every patch
in one image)
• Complexity is N²S, assuming N features per image and S rescalings per
feature
• Solution: automatic scale selection, i.e., automatically assign each feature its
own “scale” (i.e., size)
Automatic Scale Selection
• Idea: Design a function on the image patch, which is scale invariant (i.e., it has the same
value for corresponding patches, even if they are at different scales)
[Figure: the function f plotted against scale σ for a patch in Image 1 and against σ′ for the corresponding patch in Image 2 (images related by scale = 1/2); f takes the same value at corresponding scales]
Automatic Scale Selection: Example
[Figure, shown progressively over several slides: f evaluated for growing patch sizes s₁σ in Image 1 and s₂σ′ in Image 2; the extrema of f select the corresponding scales σ and σ′ of the same feature in both images]
Automatic Scale Selection
• A “good” function for scale detection should have a single & sharp peak
[Figure: three candidate functions f plotted against patch size: a flat response (bad), one with several comparable peaks (good or bad?), and one with a single sharp peak (very good!)]
Automatic Scale Selection
• The ideal function for determining the scale is one that highlights sharp discontinuities
• Solution: convolve image with a kernel that highlights edges
f = Kernel ∗ Image
• It has been shown that the Laplacian of Gaussian (LoG) kernel is optimal under certain
assumptions [Lindeberg’94]:

LoG(x, y, σ) = ∇²G_σ(x, y) = ∂²G_σ(x, y)/∂x² + ∂²G_σ(x, y)/∂y²
Lindeberg, “Scale-space theory: A basic tool for analysing structures at different scales”, Journal of Applied Statistics, 1994. PDF.
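A hedged sketch of scale selection with the LoG, assuming SciPy's gaussian_laplace (the function name and the keypoint-wise evaluation are illustrative): evaluate the scale-normalized LoG response at a keypoint over a range of σ values and keep the strongest one.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def select_scale(image, x, y, sigmas):
    """Evaluate the scale-normalized LoG response at keypoint (x, y)
    for a range of sigmas; return the sigma with the strongest response."""
    responses = []
    for s in sigmas:
        # sigma^2 normalization keeps responses comparable across scales
        log = (s ** 2) * gaussian_laplace(image.astype(float), sigma=s)
        responses.append(abs(log[y, x]))
    return sigmas[int(np.argmax(responses))]
```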
Automatic Scale Selection
The correct scale(s) is (are) found as local extrema of the filter response across consecutive smoothed patches, i.e., across the scale σ of the LoG.
[Figure: filter response plotted against scale σ; local extrema mark the selected scales]
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors
How to achieve invariance to Rotation
Derotation:
• Determine patch orientation
e.g., eigenvectors of M matrix of Harris or
dominant gradient direction (see next slide)
• Derotate patch through “patch warping”
This puts the patches into a canonical orientation
How to determine the patch orientation?
1. First, multiply the patch by a Gaussian kernel to make the shape circular rather than square
2. Then, compute gradient vectors at each pixel
3. Build a histogram of gradient orientations, weighted by the gradient magnitudes. This histogram is a particular case of the HOG
descriptor (a grid of 1×1 cells)
4. Extract all local maxima in the histogram: each local maximum above a threshold is a candidate dominant orientation
5. Construct a different keypoint descriptor for each dominant orientation (see the sketch below)
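A minimal sketch of steps 1-4 above (the names and simplifications are mine; in particular it thresholds histogram bins rather than detecting true local maxima; Lowe's SIFT uses 36 bins and a 0.8 peak ratio):

```python
import numpy as np

def dominant_orientations(patch, n_bins=36, peak_ratio=0.8):
    """Gaussian-weight the patch, build a magnitude-weighted orientation
    histogram, and return the orientations of all bins within peak_ratio
    of the strongest bin (a simplification of true local-maxima search)."""
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = h / 2.0                                  # step 1: circular weighting
    gauss = np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (2 * sigma ** 2))
    gy, gx = np.gradient(patch.astype(float))        # step 2: gradient vectors
    mag = np.sqrt(gx**2 + gy**2) * gauss
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, edges = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                               weights=mag)          # step 3: weighted histogram
    peaks = np.where(hist >= peak_ratio * hist.max())[0]   # step 4: candidate peaks
    return [(edges[i] + edges[i + 1]) / 2 for i in peaks]  # bin-center angles
```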
Outline
• How to achieve descriptor invariance to:
• Scale
• Rotation
• Viewpoint
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors
How to achieve invariance to small viewpoint changes?
Affine warping provides invariance to small viewpoint changes
• The second moment matrix M of the Harris detector can be used to identify the two directions of fastest
and slowest change of the SSD around the feature
• From these two directions, an elliptic patch is extracted
• The region inside the ellipse is normalized to a canonical circular patch
Recap:
How to achieve Scale, Rotation, and Affine-invariant patch matching
1. Scale assignment: compute the scale using the LoG operator. If there are multiple local extrema, assign multiple scales
2. Multiply the patch by a Gaussian kernel to make the shape circular rather than square
3. Rotation assignment: use Harris or gradient histogram to find dominant orientation. If multiple local extrema, assign
multiple orientations
4. Affine invariance: use Harris eigenvectors to extract affine transformation parameters
5. Warp the patch into a canonical patch
How to warp a patch?
• Start with an “empty” canonical patch (all pixels set to 0)
• For each pixel (𝑥, 𝑦) in the empty patch, apply the warping function 𝑾(𝒙, 𝒚)
to compute the corresponding position in the source image. It will be in
floating point and will fall between the image pixels.
• Interpolate the intensity values from the closest pixels in the source image
using one of:
• Nearest neighbor interpolation
• Bilinear interpolation
• Bicubic interpolation
Example: Similarity Transformation (rotation, translation, rescaling)
• Warping function W: rotation (θ) plus rescaling (s) and translation (a, b), mapping each pixel (x, y) of the canonical patch to the point (x′, y′) in the source image:

x′ = s (x cos θ − y sin θ) + a
y′ = s (x sin θ + y cos θ) + b
Nearest Neighbor vs Bilinear vs Bicubic Interpolation
Bilinear Interpolation
• It is an extension of linear interpolation for interpolating functions of two variables (e.g., 𝑥 and 𝑦) on a
rectilinear 2D grid.
• The key idea is to perform linear interpolation first in one direction, and then again in the other direction.
• Although each step is linear in the sampled values and in the position, the interpolation as a whole is not
linear but rather quadratic in the sample location.
I(x, y) = I(0,0)(1 − x)(1 − y) + I(0,1)(1 − x)y + I(1,0)x(1 − y) + I(1,1)xy

(This formula won’t be asked at the exam ☺)

In this geometric visualization, the value at the black spot is the sum of the value at each
colored spot multiplied by the area of the rectangle of the same color.
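Putting the warping procedure and the bilinear formula together, a minimal sketch (function names and the similarity-warp usage example are illustrative, not from the lecture):

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinearly interpolate img at the floating-point location (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    return (img[y0,     x0]     * (1 - dx) * (1 - dy) +
            img[y0,     x0 + 1] * dx       * (1 - dy) +
            img[y0 + 1, x0]     * (1 - dx) * dy +
            img[y0 + 1, x0 + 1] * dx       * dy)

def warp_patch(img, W, size):
    """Inverse warping: for each pixel (x, y) of an initially empty
    size x size canonical patch, map it through W into the source image
    and interpolate there."""
    patch = np.zeros((size, size))
    for y in range(size):
        for x in range(size):
            xs, ys = W(x, y)                    # position in the source image
            if 0 <= xs < img.shape[1] - 1 and 0 <= ys < img.shape[0] - 1:
                patch[y, x] = bilinear_sample(img, xs, ys)
    return patch

# Example W: the similarity transformation of the previous slide
s, theta, a, b = 1.5, np.pi / 6, 10.0, 20.0
W = lambda x, y: (s * (x * np.cos(theta) - y * np.sin(theta)) + a,
                  s * (x * np.sin(theta) + y * np.cos(theta)) + b)
```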
Disadvantage of Patch Descriptors
• Disadvantage of patch descriptors:
• If the warp is not estimated accurately, very small errors in rotation, scale, and
viewpoint will affect the matching score significantly
• Computationally expensive (need to unwarp every patch)
Outline
• Automatic Scale Selection
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors
SIFT Descriptor
• Scale Invariant Feature Transform
• Proposed by David Lowe (first presented in 1999; journal version published in 2004)
Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF.
SIFT Descriptor
Descriptor computation:
• Consider a 𝟏𝟔 × 𝟏𝟔 pixel patch
• Multiply the patch by a Gaussian filter, compute dominant orientation, and de-rotate patch
• Compute HOG descriptor
• Divide patch into 4×4 cells
• Use 8-bin histograms (i.e., 8 directions)
• Concatenate all histograms into a single 1D vector
• Resulting SIFT descriptor: 4×4×8 = 128 values
• Descriptor matching: SSD (i.e., Euclidean distance)
• Why 4×4 cells and why 8 bins? See later
• To make the descriptor robust to illumination changes, it is normalized to unit length:

v̄ = v / √(Σᵢ vᵢ²)

• An additive intensity change does not affect the gradients, and a multiplicative change rescales all gradient magnitudes by the same factor, which the normalization cancels. We can conclude that the SIFT descriptor is invariant to affine illumination changes.
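A hedged sketch of the descriptor stage (it omits Lowe's trilinear interpolation between bins and cells and the clamping of histogram entries at 0.2; the function name is illustrative):

```python
import numpy as np

def sift_descriptor(patch16):
    """From a 16x16, de-rotated, Gaussian-weighted patch: a 4x4 grid of
    cells, one 8-bin orientation histogram per cell, concatenated and
    normalized to unit length (4*4*8 = 128 values)."""
    gy, gx = np.gradient(patch16.astype(float))
    mag = np.sqrt(gx**2 + gy**2)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    desc = []
    for i in range(4):
        for j in range(4):
            sl = (slice(4 * i, 4 * i + 4), slice(4 * j, 4 * j + 4))
            h, _ = np.histogram(ang[sl], bins=8, range=(0, 2 * np.pi),
                                weights=mag[sl])
            desc.append(h)
    v = np.concatenate(desc)                  # 128-element 1D vector
    return v / (np.linalg.norm(v) + 1e-12)    # unit norm: illumination invariance
```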
SIFT Matching Robustness
• Can handle severe viewpoint changes (up to 50 degree out-of-plane rotation)
• Can handle even severe non-affine changes in illumination (from low-light to bright scenes)
• Computationally expensive: 10 frames per second (fps) on an i7 processor
• Original SIFT binary files: http://people.cs.ubc.ca/~lowe/keypoints
• OpenCV C/C++ implementation: https://docs.opencv.org/master/da/df5/tutorial_py_sift_intro.html
SIFT Detector
• SIFT uses the Difference of Gaussian (DoG) kernel instead of the Laplacian of Gaussian (LoG) because it is
computationally cheaper
LoG(x, y, σ) ≈ DoG(x, y, σ) = G(x, y, kσ) − G(x, y, σ)

• The proof that the LoG can be approximated by a Difference of Gaussians comes from the heat equation: ∂G_σ/∂σ = σ ∇²G_σ
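A minimal sketch of one octave of DoG images, assuming SciPy's gaussian_filter (σ = 1.6 and s = 3 follow Lowe's paper, which uses s + 3 blurred images per octave):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_images(image, sigma=1.6, s=3):
    """One octave: s + 3 Gaussian-blurred images with sigmas sigma * k^i,
    k = 2^(1/s), and their s + 2 pairwise differences (the DoG images)."""
    k = 2.0 ** (1.0 / s)
    blurred = [gaussian_filter(image.astype(float), sigma * k**i)
               for i in range(s + 3)]
    return [blurred[i + 1] - blurred[i] for i in range(s + 2)]
```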
SIFT Detector (location + scale)
SIFT keypoints: local extrema in both space and scale of the DoG images
• Each pixel is compared to 26 neighbors (below in green): its 8 neighbors in the current image + 9 neighbors
in the adjacent upper scale + 9 neighbors in the adjacent lower scale
• If the pixel is a maximum or minimum (i.e., an extremum) with respect to all of its 26 neighbors, then it is
selected as a SIFT feature
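A sketch of the 26-neighbor test (it assumes 1 ≤ s ≤ len(dogs) − 2 and an interior pixel; ties count as extrema in this simplified check):

```python
import numpy as np

def is_scale_space_extremum(dogs, s, y, x):
    """True if pixel (y, x) of DoG level s is an extremum among its 26
    neighbors: 8 in the same image plus 9 in each adjacent scale.
    `dogs` is a list of same-size DoG images."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dogs[s - 1:s + 2]])
    center = dogs[s][y, x]
    return center == cube.max() or center == cube.min()
```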
DoG Images example
Magnitudes of successive DoG images, with s = 4 and σ = 1.6 (higher octaves shown at the input resolution for convenience):
• G(k⁵σ) − G(k⁴σ) (second octave)
• G(k⁶σ) − G(k⁵σ) (second octave)
• G(k⁷σ) − G(k⁶σ) (second octave)
• G(k⁸σ) − G(k⁷σ) (second octave)
• G(k⁹σ) − G(k⁸σ) (third octave)
Local extrema of DoG images across Scale and Space
[Figure: adjacent smoothed images G(kσ) and G(σ); local extrema of their difference are found across both scale and space]
SIFT: Recap
• SIFT: Scale Invariant Feature Transform
• An approach to detect and describe regions of interest in an image.
• SIFT detector = DoG detector
• SIFT features are invariant to 2D rotation, and reasonably invariant to
rescaling, viewpoint changes (up to 50 degrees), and illumination
• It runs in real time but is expensive (10 Hz on an i7 laptop)
• The expensive steps are the scale detection and descriptor extraction
Original SIFT Demo by David Lowe
Download the original SIFT binaries and Matlab function from:
http://people.cs.ubc.ca/~lowe/keypoints
What’s the output of SIFT?
• Descriptor: 4x4x8 = 128-element 1D vector
• Location (pixel coordinates of the center of the patch): 2D vector
• Scale (i.e., size) of the patch: 1 scalar value (high scale corresponds to high blur in the space-scale pyramid)
• Orientation (i.e., angle of the patch): 1 scalar value
SIFT Repeatability with Viewpoint Changes
Repeatability = (# correspondences detected) / (# correspondences present)
Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF.
SIFT Repeatability with Number of Scales per Octave
Repeatability = (# correspondences detected) / (# correspondences present)
Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF.
Influence of Number of Orientations and Number of Sub-patches
The graph shows that a single orientation histogram (n = 1) is very poor at discriminating.
The results improve with a 4x4 array of histograms with 8 orientations.
Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 2004. PDF.
Application of SIFT to Object recognition
• Can be implemented easily by returning the object with the largest number of
correspondences with the template
• For planar objects, 4-point RANSAC can be used to remove outliers (see Lecture 8).
• For rigid 3D objects, 5-point RANSAC (see Lecture 8).
Application of SIFT to Panorama Stitching
AutoStitch: http://matthewalunbrown.com/autostitch/autostitch.html
M. Brown and D. G. Lowe, “Recognising Panoramas”, International Conference on Computer Vision (ICCV), 2003. PDF.
Main questions
• What features are repeatable and distinctive?
• How to describe a feature?
• How to establish correspondences, i.e., compute matches?
Feature Matching
• Given a feature in I₁, how do we find the best match in I₂?
1. Define a distance function that compares two descriptors: (Z)SSD, (Z)SAD, (Z)NCC, or the Hamming distance for binary
descriptors (e.g., Census, ORB, BRIEF, BRISK, FREAK)
2. Brute-force matching:
1. Compare each feature in I₁ against all the features in I₂ (N² comparisons, where N is the number of
features in each image)
2. Take the one at minimum distance, i.e., the closest descriptor
Feature Matching
• Issue with the closest descriptor: it can occasionally return good scores for false matches
• Better approach: compute the ratio of the distances to the 1st and the 2nd closest descriptor:

d₁ / d₂ < threshold (usually 0.8)

where d₁ and d₂ are the distances to the closest and the second-closest descriptor, respectively.
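A minimal sketch of brute-force matching with the ratio test (names are illustrative):

```python
import numpy as np

def match_ratio_test(desc1, desc2, thresh=0.8):
    """Keep a match only if the closest descriptor is significantly
    closer than the second closest (d1 / d2 < thresh)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        j1, j2 = np.argsort(dists)[:2]              # two closest candidates
        if dists[j1] < thresh * dists[j2]:
            matches.append((i, j1))                 # (feature in I1, match in I2)
    return matches
```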
Distance Ratio: Explanation
• In SIFT, the nearest neighbor is defined as the keypoint with minimum Euclidean distance. However, many
features in Image 1 may not have a correct match in Image 2 because they arise from background
clutter or were not detected in Image 2.
• An effective measure is obtained by comparing the distance of the closest neighbor to that of the second-
closest neighbor. This measure performs well because correct matches need to have the closest neighbor
significantly closer than the closest incorrect match to achieve reliable matching.
• For false matches, there will likely be a number of other false matches within similar distances due to the
high dimensionality of the feature space (this problem is known as curse of dimensionality). We can think
of the second-closest match as providing an estimate of the density of false matches within this portion of
the feature space and at the same time identifying specific instances of feature ambiguity.
SIFT Feature Matching: Distance Ratio
The SIFT paper recommends using a threshold of 0.8. Where does this come from?
In Lowe’s paper, the distributions of the distance ratio for correct and incorrect matches show that a threshold of 0.8 eliminates about 90% of the false matches while discarding less than 5% of the correct ones.
Outline
• Automatic Scale Selection
• The SIFT blob detector and descriptor
• Other corner and blob detectors and descriptors
“FAST” Corner Detector
• FAST: Features from Accelerated Segment Test
• Analyses intensities along a ring of 16 pixels centered on
the pixel of interest 𝒑
• 𝒑 is a FAST corner if a set of N contiguous pixels on the
ring are:
• all brighter than the pixel intensity 𝑰(𝒑) + 𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅,
• or all darker than 𝑰 𝒑 − 𝒕𝒉𝒓𝒆𝒔𝒉𝒐𝒍𝒅
• Common value of N: 12
• A simple classifier is used to check the quality of corners and reject the weak ones
• FAST is the fastest corner detector ever made: it can process 100 million pixels per second (<3 ms per image)
• Issue: it is very sensitive to image noise (high in low light). This is why Harris is still more common despite being a bit slower
• In fact, FAST was initially proposed to find candidate corner regions to be verified with the Harris detector (a sketch of the segment test follows the references below)
Rosten, Drummond, Fusing points and lines for high performance tracking, International Conference on Computer Vision (ICCV), 2005. PDF.
Rosten, Porter, Drummond, “Faster and better: a machine learning approach to corner detection”, IEEE Trans. Pattern Analysis and Machine Intelligence, 2010. PDF.
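A plain-Python sketch of the segment test described above (offsets follow the standard radius-3 Bresenham circle; the fast rejection test and the learned decision tree of the real detector are omitted, and the threshold value is illustrative):

```python
import numpy as np

# The 16 pixel offsets (dx, dy) of a radius-3 Bresenham circle, in circular order
RING = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
        (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def is_fast_corner(img, y, x, t=20, n=12):
    """Segment test: (y, x) is a corner if n contiguous ring pixels are
    all brighter than I(p) + t or all darker than I(p) - t."""
    center = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dx, dy in RING])
    for sign in (+1, -1):                        # test brighter, then darker
        ok = sign * (ring - center) > t
        run, best = 0, 0
        for v in np.concatenate([ok, ok]):       # doubled ring handles wrap-around
            run = run + 1 if v else 0
            best = max(best, run)
        if best >= n:
            return True
    return False
```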
“SURF” Blob Detector & Descriptor
• SURF: Speeded Up Robust Features
• Similar to SIFT but much faster
• Basic idea: approximate the Gaussian and DoG filters using box filters
• Results comparable with SIFT, plus:
• Faster computation
• Generally shorter descriptors
[Figure: the original second-order partial derivatives of a Gaussian, ∂²G(x, y)/∂y² and ∂²G(x, y)/∂x∂y]
Bay, Tuytelaars, Van Gool, “Speeded Up Robust Features”, European Conference on Computer Vision (ECCV), 2006. PDF.
“BRIEF” Descriptor (can be applied to corners or blobs)
• BRIEF: Binary Robust Independent Elementary Features
• The pattern is generated randomly (or learned) only once; then, the same pattern is
used for all patches
• Pros: binary descriptor: allows very fast Hamming-distance matching (count of the
number of bits that differ between the matched descriptors); see the sketch below
• Cons: not scale/rotation invariant
[Figure: pattern of intensity-pair samples, generated randomly]
Calonder, Lepetit, Strecha, Fua, “BRIEF: Binary Robust Independent Elementary Features”, European Conference on Computer Vision (ECCV), 2010. PDF.
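A hedged sketch of the BRIEF idea (pattern size, patch radius, and names are illustrative, not the paper's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
# Sampling pattern: generated randomly once, then reused for all patches
PAIRS = rng.integers(-15, 16, size=(256, 4))     # 256 (dx1, dy1, dx2, dy2) offsets

def brief_descriptor(img, y, x):
    """256 pairwise intensity comparisons around (y, x) -> 256-bit string."""
    return np.array([img[y + dy1, x + dx1] < img[y + dy2, x + dx2]
                     for dx1, dy1, dx2, dy2 in PAIRS], dtype=np.uint8)

def hamming(d1, d2):
    """Hamming distance: the number of bits that differ."""
    return int(np.count_nonzero(d1 != d2))
```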
“ORB” Descriptor (can be applied to corners or blobs)
• ORB: Oriented FAST and Rotated BRIEF
• Keypoint detector originally based on FAST
• Binary descriptor based on BRIEF but adds an
orientation component to make it rotation
invariant
Rublee, Rabaud, Konolige, Bradski, “ORB: An efficient alternative to SIFT or SURF”, International Conference on Computer Vision (ICCV), 2011. PDF.
Leutenegger, Chli, Siegwart, “BRISK: Binary Robust Invariant Scalable Keypoints”, International Conference on Computer Vision (ICCV), 2011. PDF.
“FREAK” Descriptor (can be applied to corners or blobs)
• FREAK: Fast Retina Keypoint
• Rotation and scale invariant
• Binary descriptor
• Sampling pattern similar to BRISK but uses a more pronounced “retinal” (i.e.,
log-polar) sampling pattern inspired by the human retina: higher density of
points near the center
• Pairwise intensity comparisons form binary strings, similar to BRIEF
• Pairs are learned (as in ORB)
• Circles indicate the size of the smoothing kernel
• Coarse-to-fine matching (cascaded approach): first compare the first half of the
bits; if the distance is smaller than a threshold, proceed to compare the next bits, etc.
• Faster to compute and uses less memory than SIFT, SURF, or BRISK
[Figures: the human retina; the FREAK sampling pattern]
Alahi, Ortiz, Vandergheynst, “FREAK: Fast Retina Keypoint”, Conference on Computer Vision and Pattern Recognition (CVPR), 2012. PDF.
“LIFT” Descriptor (can be applied to corners or blobs)
• LIFT: Learned Invariant Feature Transform
• Learning-based descriptor
• Rotation, scale, viewpoint, and illumination invariant
• First, a network predicts the patch orientation, which is used to derotate the patch.
• Then another neural network generates a patch descriptor (128-dimensional) from the derotated patch.
• Illumination invariance is achieved by randomizing illuminations during training.
• The LIFT descriptor beats SIFT in repeatability
[Figures: keypoints with scales and orientations; a CNN predicts the descriptor]
Kwang Moo Yi, Eduard Trulls, Vincent Lepetit, Pascal Fua, “LIFT: Learned Invariant Feature Transform”, European Conference on Computer Vision (ECCV), 2016. PDF.
LIFT vs SIFT
https://youtu.be/hhxAttChmCo
“SuperPoint”: Self-Supervised Interest Point Detection and Description
Detone, Malisiewicz, Rabinovich. SuperPoint: Self-Supervised Interest Point Detection and Description. CVPRW 2018. PDF.
Recap Table
Detector | Localization accuracy of the detector | Descriptors that can be used | Efficiency | Relocalization & loop closing
[Table comparing the detectors and descriptors covered in this lecture; rows shown in the original slide]
Readings
• Ch. 7.1 of Szeliski book, 2nd Edition
• Chapter 4 of Autonomous Mobile Robots book: link
• Ch. 13.3 of Peter Corke book
Understanding Check
Are you able to answer:
• How does automatic scale selection work?
• What are the good and the bad properties that a function for automatic scale selection should have or not
have?
• How can we implement scale invariant detection efficiently? (show that we can do this by resampling the
image vs rescaling the kernel).
• What is a feature descriptor? (patch of intensity value vs histogram of oriented gradients). How do we
match descriptors?
• How is the keypoint detection done in SIFT and how does this differ from Harris?
• How does SIFT achieve orientation invariance?
• How is the SIFT descriptor built?
• What is the repeatability of the SIFT detector after a rescaling of 2? And for a 50 degrees viewpoint change?
• Illustrate the 1st-to-2nd closest ratio test of SIFT matching: what’s the intuitive reasoning behind it? Where does
the 0.8 factor come from?
• How does the FAST detector work? What are its pros and cons compared with Harris?