Image Features and Descriptors

The document discusses image features and descriptors, focusing on feature detection and description, which are crucial for image processing tasks. It covers various algorithms and techniques such as Harris Corner, SIFT, ORB, and RANSAC for robust image matching and object detection. Key characteristics of features, including their invariance to transformations and the importance of preprocessing for accurate extraction, are also highlighted.



Image Features and Descriptors

Well, but reflect; have we not several times acknowledged that names rightly given are the likenesses and images of the things which they name?
-- Socrates


CONTENTS

 Feature detectors
 Feature descriptors
 Harris Corner
 SIFT, ORB, BRIEF features
 Blob detectors (LoG, DoG, DoH, HoG)
 Image matching and object detection
 Haar-like features – Face detection


Feature detectors vs descriptors

 Features (local) – group of key/salient points or information relevant to an image processing task
 Abstract, more general, robust representation of an image
 Family of algorithms that choose a set of interest points based on some criterion
 Cornerness, local maximum/minimum
 Feature descriptor – collection of values to represent the image with the features/interest points
 HoG features
 Transforms an image into a set of attributes


Feature Extraction

 After an image has been segmented into regions or their boundaries, the resulting sets of
segmented pixels usually have to be converted into a form suitable for further computer
processing
 Typically, the step after segmentation is feature extraction, which consists of feature
detection and feature description
 Feature detection refers to finding the features in an image, region, or boundary
 Feature description assigns quantitative attributes to the detected features
 For example, we might detect corners in a region boundary, and describe those corners by their
orientation and location, both of which are quantitative attributes
 Feature processing methods are subdivided into three principal categories, depending on
whether they are applicable to boundaries, regions, or whole images. Some features are
applicable to more than one category
 Feature descriptors should be as insensitive as possible to variations in parameters such as
scale, translation, rotation, illumination, and viewpoint
 descriptors are either insensitive to, or can be normalized to compensate for, variations in
one or more of these parameters

Image feature
 a feature is a distinctive attribute or description of “something” to label or differentiate
 key words here are label and differentiate
 “something” of interest refers either to individual image objects, or even to entire images or sets of images
 features as attributes that are going to help to assign unique labels to objects in an image or, more
generally, are going to be of value in differentiating between entire images or families of images
 two principal aspects of image feature extraction: feature detection, and feature description
 feature extraction – refer to both detecting the features and then describing them
 Extraction process must encompass both
 Example:
 object corners as features

 Detection refers to finding the corners in a region or image

 Description refers to assigning quantitative (or qualitative) attributes to the detected features, such as corner
orientation, and location with respect to other corners

 knowing that there are corners in an image has limited use without additional information that can help us differentiate
between objects in an image, or between images, based on corners and their attributes


Important Characteristics of Features

 Features should be independent of location, rotation, and scale


 Other desirable properties include independence of illumination levels and of changes caused by the viewpoint between the imaging sensor(s) and the scene
 preprocessing should be used to normalize input images before feature extraction
 For example, in situations where changes in illumination are severe enough to cause
difficulties in feature detection, it would make sense to preprocess an image to compensate
for those changes
 Histogram equalization or specification are automatic techniques helpful in this regard
 The idea is to use as much a priori information as possible to preprocess images
in order to improve the chances of accurate feature extraction


Important Characteristics of Features

 A feature descriptor is invariant with respect to a set of transformations if its value remains
unchanged after the application of any transformation from the family
 A feature descriptor is covariant with respect to a set of transformations if applying any transformation from the set to the entity produces the same transformation in the value of the descriptor
 For example, area is an invariant feature descriptor with respect to transformations such as translation and rotation
 If we add the affine transformation scaling to the family, the descriptor area ceases to be invariant with respect to the extended family
 The descriptor is now covariant with respect to the family, because scaling the area of the
region by any factor scales the value of the descriptor by the same factor
 the descriptor direction (of the principal axis of the region) is covariant because rotating the
region by any angle has the same effect on the value of the descriptor
 compensate for changes in direction of a region by computing its actual direction and rotating
the region so that its principal axis points in a predefined direction


Important Characteristics of Features

 local feature - if it applies to a member of a set


 global feature - if it applies to the entire set, where “member” and “set” are determined
by the application
 feature description typically is used as a preprocessing step for higher-level tasks, such
as image registration, object recognition for automated inspection, searching for
patterns (e.g., individual faces and/or fingerprints) in image databases, and autonomous
applications, such as robot and vehicle navigation
 For these applications, numerical features usually are “packaged” in the form of a
feature vector, (i.e., a 1 x n or n x 1 matrix) whose elements are the descriptors
 An RGB image is one of the simplest examples - each pixel of an RGB image can be expressed as a 3-D vector


Corner Detector

 corner - a rapid change of direction in a curve


 highly effective features – distinctive, reasonably invariant to viewpoint
 used routinely for matching image features in applications such as tracking for autonomous
navigation, stereo machine vision algorithms, and image database queries
 detected by running a small window over an image
 detector window is designed to compute intensity changes
 Areas of zero (or small) intensity changes in all directions, which happens when the window is located in a
constant (or nearly constant) region, as in location A
 Areas of changes in one direction but no (or small) changes in the orthogonal direction, which happens when the window spans a boundary between two regions, as in location B
 Areas of significant changes in all directions, a condition that happens when the window contains a corner
(or isolated points), as in location C


Corner Detector


Harris Detector Formulation

Corner Detector

Measure R has large positive values when both eigenvalues are large, indicating the presence of a corner; it has large negative values when one eigenvalue is large and the other small, indicating an edge; and its absolute value is small when both eigenvalues are small, indicating that the image patch under consideration is flat.
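For reference, the Harris response R discussed above is commonly written in terms of the eigenvalues of the second-moment matrix M and the empirical constant k:

R = det(M) - k [trace(M)]^2 = λ1 λ2 - k (λ1 + λ2)^2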


Corner Detector


Interpreting the eigenvalues


Corner Response Function


Corner Detector

 Constant k is determined empirically, and its range of values depends on the implementation
 For example, the MATLAB Image Processing Toolbox uses 0 < k < 0.25.
 You can interpret k as a "sensitivity factor"; the smaller it is, the more likely the detector is to find corners.
 Typically, R is used with a threshold, T. We say that a corner at an image location has been detected only if R > T for a patch at that location.


Harris Corner Detector

 Explores intensity changes within a window


 Edge – intensity values change abruptly in only one direction
 Corners – significant changes in intensity values in all directions
 Invariant to rotation, not to scale
 Sub-pixel accuracy – refined corners
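A minimal sketch of this detector with sub-pixel refinement, assuming scikit-image is available (corner_harris, corner_peaks, and corner_subpix from skimage.feature; the sample image and parameter values are illustrative, not prescribed by the slides):

```python
import numpy as np
from skimage import data
from skimage.feature import corner_harris, corner_peaks, corner_subpix

image = data.checkerboard()                      # sample grayscale image

# Harris response map (k is the empirical sensitivity factor)
response = corner_harris(image, k=0.05, sigma=1)

# Pick local maxima of the response as corner candidates
coords = corner_peaks(response, min_distance=5, threshold_rel=0.02)

# Refine the detected corners to sub-pixel accuracy
coords_subpix = corner_subpix(image, coords, window_size=13)

print(f"{len(coords)} corners detected")
```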


Harris Detector


Example

Image Matching

 Detect interest points in an image


 Match the points across different images of the same object
 Steps
 Compute the points of interest (corner points with the Harris Corner Detector)
 Consider a region (window) around each of the keypoints
 From the region, compute a local feature descriptor, for each key point for each
image and normalize
 Match the local descriptors computed in two images (using Euclidean distance)
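A sketch of the final matching step, assuming the descriptors have already been computed and normalized; desc1 and desc2 are hypothetical (N, d) arrays holding one descriptor per keypoint:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_euclidean(desc1, desc2, max_dist=0.5):
    """Match each descriptor in desc1 to its nearest neighbour in desc2."""
    d = cdist(desc1, desc2, metric='euclidean')      # pairwise distances
    nn = np.argmin(d, axis=1)                        # nearest neighbour in image 2
    keep = d[np.arange(len(desc1)), nn] < max_dist   # reject weak matches
    return np.column_stack([np.nonzero(keep)[0], nn[keep]])
```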


Robust image matching using the RANSAC algorithm and Harris Corner features

 First, we will compute the interest points or the Harris Corners in both the images
 A small space around the points will be considered, and the correspondences in-
between the points will then be computed using a weighted sum of squared
differences. This measure is not very robust, and it's only usable with slight
viewpoint changes
 A set of source and corresponding destination coordinates will be obtained once
the correspondences are found; they are used to estimate the geometric
transformations between both the images
 A simple estimation of the parameters with the coordinates is not enough—many
of the correspondences are likely to be faulty
 The RANdom SAmple Consensus (RANSAC) algorithm is used to robustly estimate
the parameters, first by classifying the points into inliers and outliers, and
then by fitting the model to inliers while ignoring the outliers, in order to find
matches consistent with an affine transformation
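A sketch of the robust estimation step described above, assuming scikit-image (ransac from skimage.measure and AffineTransform from skimage.transform); src and dst are the matched coordinate arrays from the previous step, and the thresholds are illustrative:

```python
from skimage.measure import ransac
from skimage.transform import AffineTransform

def estimate_affine(src, dst):
    """Robustly fit an affine transform to matched (x, y) coordinate pairs."""
    model_robust, inliers = ransac(
        (src, dst), AffineTransform,
        min_samples=3,           # an affine transform needs 3 point pairs
        residual_threshold=2,    # max reprojection error (pixels) for an inlier
        max_trials=1000)
    return model_robust, inliers
```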

RANSAC – RANdom SAmple Consensus

 Approach: we want to avoid the impact of outliers, so let’s look for “inliers”, and
use only those
 Intuition: if an outlier is chosen to compute the current fit, then the resulting line
won’t have much support from rest of the points
 Keep the transformation with the largest number of inliers
 RANSAC loop:
 Randomly select a seed group of points on which to base transformation estimate (e.g.,
a group of matches)
 Compute transformation from seed group
 Find inliers to this transformation
 If the number of inliers is sufficiently large, re-compute least-squares estimate of
transformation on all of the inliers
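The loop above, specialized to the 2-D line-fitting case shown next, can be sketched directly in NumPy (a toy implementation; the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_line(points, n_iters=100, thresh=1.0):
    """Robustly fit y = m*x + c to an (N, 2) array of points."""
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = np.random.choice(len(points), 2, replace=False)  # seed group
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                       # vertical seed pair, skip
        m = (y2 - y1) / (x2 - x1)          # model fitted to the seed group
        c = y1 - m * x1
        resid = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = resid < thresh           # points that support this model
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # re-compute a least-squares estimate on all inliers of the best model
    m, c = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return m, c, best_inliers
```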


Line Fitting Example

Algorithm


After RANSAC

 RANSAC divides data into inliers and outliers and yields estimate computed
from minimal set of inliers
 Improve this initial estimate with estimation over all inliers (e.g. with
standard least-squares minimization)
 But this may change inliers, so alternate fitting with re-classification as
inlier/outlier


Pros and Cons

 Pros:
 General method suited for a wide range of model fitting problems
 Easy to implement and easy to calculate its failure rate
 Cons:
 Only handles a moderate percentage of outliers without cost blowing up
 Many real problems have high rate of outliers (but sometimes selective choice of
random subsets can help)
 A voting strategy, the Hough transform, can handle a high percentage of outliers


Local Features

 Global representations have major limitations


 Instead, describe and match only local regions
 Increased robustness to occlusions, articulation, intra-category variations


General Approach


Scale Covariance
 Goal: independently detect corresponding regions in scaled versions of the
same image
 Need scale selection mechanism for finding characteristic region size that is
covariant with the image transformation


From edges to blobs

 Edge = ripple
 Blob = superposition of two ripples
 Spatial selection: the magnitude of the Laplacian response will achieve a
maximum at the center of the blob, provided the scale of the Laplacian is
“matched” to the scale of the blob


Scale selection

 find the characteristic scale of the blob by convolving it with Laplacians at several scales and looking for the maximum response
 However, Laplacian response decays as scale increases


Scale normalization

 The response of a derivative of Gaussian filter to a perfect step edge decreases as σ increases


Scale normalization

 The response of a derivative of Gaussian filter to a perfect step edge decreases as σ increases
 To keep response the same (scale-invariant), must multiply Gaussian derivative by σ
 Laplacian is the second Gaussian derivative, so it must be multiplied by σ^2


Effect of scale normalization


Blob detection in 2D

 Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D


 Scale normalized
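In the usual notation, the scale-normalized Laplacian of Gaussian referred to above is

∇^2_norm g = σ^2 (∂^2 g/∂x^2 + ∂^2 g/∂y^2)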


Scale Selection

 At what scale does the Laplacian achieve a maximum response to a binary circle of radius r?
 To get maximum response, the zeros of the Laplacian have to be aligned with the circle
 The Laplacian is given by (up to scale): (x^2 + y^2 - 2σ^2) e^-(x^2 + y^2)/(2σ^2)
 Therefore, the maximum response occurs at σ = r/√2


Characteristic scale

 scale that produces peak of Laplacian response in the blob center


 Scale-space blob detector
 Convolve image with scale-normalized Laplacian at several scales
 Find maxima of squared Laplacian response in scale-space


Efficient Implementation

 Approximating the Laplacian with a difference of Gaussians (DoG)
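The approximation referred to above is Lowe's relation between the difference of Gaussians and the scale-normalized Laplacian:

G(x, y, kσ) - G(x, y, σ) ≈ (k - 1) σ^2 ∇^2 G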


BLOB DETECTORS WITH LoG, DoG and DoH
 A blob is defined as either a bright region on a dark background, or a dark region on a bright background
 LoG
 used to find scale invariant regions by searching 3D (location + scale) extrema
of the LoG with the concept of Scale Space
 If the scale of the Laplacian (σ of the LoG filter) gets matched with the scale of
the blob, the magnitude of the Laplacian response attains a maximum at the
center of the blob
 LoG-convolved images are computed with gradually increasing σ and they are
stacked up in a cube
 blobs correspond to the local maximums
 only detects the bright blobs on the dark backgrounds
 It is accurate, but slow (particularly for detecting larger blobs)

DoG

 LoG approach is approximated by the DoG approach


 Faster
 image is smoothed (using Gaussians) with increasing σ values, and the
difference between two consecutive smoothed images is stacked up in a cube
 detects the bright blobs on the dark backgrounds
 faster than LoG but less accurate
 larger blobs detection is still expensive


DoH (Determinant of Hessian)

 DoH approach is the fastest of all these approaches


 detects the blobs by computing maxima of the determinant of the Hessian matrix of the image
 size of blobs does not have any impact on the speed of detection
 Both the bright blobs on the dark background and the dark blobs on the bright
backgrounds are detected
 small blobs are not detected accurately
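A sketch comparing the three detectors with scikit-image (blob_log, blob_dog, and blob_doh from skimage.feature; the sample image and thresholds follow the library's common example and are illustrative):

```python
from math import sqrt
from skimage import data
from skimage.color import rgb2gray
from skimage.feature import blob_log, blob_dog, blob_doh

image = rgb2gray(data.hubble_deep_field())   # bright blobs on a dark background

blobs_log = blob_log(image, max_sigma=30, num_sigma=10, threshold=0.1)
blobs_dog = blob_dog(image, max_sigma=30, threshold=0.1)
blobs_doh = blob_doh(image, max_sigma=30, threshold=0.01)

# Each row is (y, x, sigma); radius is about sigma*sqrt(2) for LoG/DoG, sigma for DoH
blobs_log[:, 2] *= sqrt(2)
blobs_dog[:, 2] *= sqrt(2)
print(len(blobs_log), len(blobs_dog), len(blobs_doh))
```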


Eliminate rotation ambiguity

 To assign a unique orientation to circular image windows:


 Create histogram of local gradient directions in the patch
 Assign canonical orientation at peak of smoothed histogram


Histogram of Oriented Gradients (HoG)

 Popular feature descriptor for object detection


 globally normalize the image
 compute the horizontal and vertical gradient images
 compute the gradient histograms
 normalize across blocks
 flatten into a feature descriptor vector
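A sketch of this pipeline with scikit-image's hog (the parameter choices mirror the library's common example and are illustrative):

```python
from skimage import data
from skimage.feature import hog

image = data.camera()                         # grayscale sample image

fd, hog_image = hog(image,
                    orientations=9,           # bins of the gradient histograms
                    pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2),   # block normalization
                    block_norm='L2-Hys',
                    visualize=True)           # also return a visualization image

print(fd.shape)                               # flattened feature descriptor vector
```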


HoG


From covariant regions to invariant features


Invariance vs. covariance


Scale-Invariant Feature Transform (SIFT)

 Developed by Lowe (2004) – extracting invariant features


 Transform – transforms data into scale-invariant coordinates relative to local
image features
 Most complex feature detection and description approach
 Strongly heuristic
 When images are similar in nature (same scale, similar orientation, etc.), corner detection is suitable for matching whole images
 In the presence of variables such as scale changes, rotation, changes in
illumination, and changes in viewpoint - SIFT


SIFT

 SIFT features - called keypoints


 invariant to image scale and rotation
 robust across a range of affine distortions, changes in 3-D viewpoint, noise,
and changes of illumination
 The input to SIFT is an image
 Its output is an n-dimensional feature vector whose elements are the
invariant feature descriptors


Scale Space

 First stage - find image locations that are invariant to scale change
 achieved by searching for stable features across all possible scales, using a function of scale
known as scale space, which is a multi-scale representation suitable for handling image
structures at different scales in a consistent manner
 objects in unconstrained scenes will appear in different ways, depending on the scale at
which images are captured. Because these scales may not be known beforehand, a
reasonable approach is to work with all relevant scales simultaneously
 Scale space represents an image as a one-parameter family of smoothed images, with the
objective of simulating the loss of detail that would occur as the scale of an image decreases
 The parameter controlling the smoothing is referred to as the scale parameter
 Gaussian kernels are used to implement smoothing, so the scale parameter is the standard
deviation
 only smoothing kernel that meets a set of important constraints, such as linearity and shift-
invariance, is the Gaussian lowpass kernel


Scale Space


Scale Space

 subdivides scale space into octaves
 each octave corresponding to a doubling of σ, just as an octave in music theory corresponds to doubling the frequency of a sound signal
 further subdivides each octave into an integer number, s, of intervals, so that an interval of 1 consists of two images, an interval of 2 consists of three images, and so forth
 the value used in the Gaussian kernel that generates the image corresponding to an octave is k^s σ = 2σ, which means that k = 2^(1/s)
 For example, for s = 2, k = √2, and the input image is successively smoothed using standard deviations of σ, (√2)σ, and (√2)^2 σ, so that the third image (i.e., the octave image for s = 2) in the sequence is filtered using a Gaussian kernel with standard deviation (√2)^2 σ = 2σ


Scale Space

 the number of smoothed images generated in an octave is s + 1

 the smoothed images in scale space are used to compute differences of Gaussian, in order to cover a full octave, implies that
an additional two images past the octave image are required

 a total of s + 3 images - Because the octave image is always the (s + 1)th image in the stack (counting from the bottom)

 it follows that this image is the third image from the top in the expanded sequence of s + 3 images

 Each octave contains five images, indicating that s = 2 was used in this case

 The first image in the second octave is formed by downsampling the original image (by skipping every other row and column), and then smoothing it using a kernel with twice the standard deviation used in the first octave (i.e., σ2 = 2σ1)
 Subsequent images in that octave are smoothed using σ2, with the same sequence of values of k as in the first octave

 The same basic procedure is then repeated for subsequent octaves. That is, the first image of the new octave is formed by:

 (1) downsampling the original image enough times to achieve half the size of the image in the previous octave

 (2) smoothing the downsampled image with a new standard deviation that is twice the standard deviation of the previous octave.

 The rest of the images in the new octave are obtained by smoothing the downsampled image with the new standard deviation
multiplied by the same sequence of values of k as before


Scale Space

 When k = 2, the first image of a new octave can be obtained without having to smooth the downsampled
image
 This is because, for this value of k, the kernel used to smooth the first image of every octave is the same as
the kernel used to smooth the third image from the top of the previous octave
 The first image of a new octave can be obtained directly by downsampling that third image of the previous
octave by 2
 The result will be the same. The third image from the top of any octave is called the octave image because
the standard deviation used to smooth it is twice (k^2 = 2) the value of the standard deviation used to
smooth the first image in the octave
 Because each octave is composed of five images, it follows that we are again using s = 2. We chose σ1 = √2/2 = 0.707 and k = √2 = 1.414 for this example so that the numbers would result in familiar multiples
 The images going up scale space are blurred by using Gaussian kernels with progressively larger standard
deviations, and the first image of the second and subsequent octaves is obtained by downsampling the
octave image from the previous octave by 2
 the images become significantly more blurred (and consequently lose more fine detail) as they go up both in
scale as well as in octave
 The images in the third octave show significantly fewer details, but their gross appearance is unmistakably
that of the same structure


Scale Space


Detecting Local Extrema

 SIFT initially finds the locations of keypoints using the Gaussian filtered
images
 refines the locations and validity of those keypoints using two processing
steps


Finding the initial key points

 Keypoint locations in scale space are found initially by SIFT by detecting


extrema in the difference of Gaussians (DoG) of two adjacent scale-space
images in an octave, convolved with the input image that corresponds to that
octave
 SIFT looks for extrema in D(x, y, σ), whereas the Marr-Hildreth detector would look for the zero crossings of this function
 A total of s + 2 difference functions, D(x, y, σ), are formed in each octave from all adjacent pairs of Gaussian-filtered images in that octave
 These difference functions can be viewed as images, and one sample of such
an image is shown for each of the three octaves
 the level of detail in these images decreases the further up we go in scale
space


Finding the initial keypoints


Improving the accuracy of keypoint locations
 The Hessian and gradient of D are approximated using differences of neighboring
points
 The resulting 3 × 3 system of linear equations is easily solved computationally
 If the offset x^ is greater than 0.5 in any of its three dimensions, the extremum
lies closer to another sample point, in which case the sample point is changed and
the interpolation is performed about that point instead
 The final offset x^ is added to the location of its sample point to obtain the
interpolated estimate of the location of the extremum
 The function value at the extremum is used by SIFT for rejecting unstable extrema
with low contrast
 any extremum for which |D| was less than 0.03 was rejected, based on all image values being in the range [0, 1]
 This eliminates keypoints that have low contrast and/or are poorly localized


Eliminating Edge Responses

 using a difference of Gaussians yields edges in an image


 keypoints of interest in SIFT are “corner-like” features, which are
significantly more localized
 intensity transitions caused by edges are eliminated
 quantify the difference between edges and corners - local curvature
 An edge is characterized by high curvature in one direction, and low
curvature in the orthogonal direction
 Curvature at a point in an image can be estimated from the 2 × 2 Hessian
matrix evaluated at that point
 to estimate local curvature of the DoG at any level in scale space, we compute the Hessian matrix of D at that level

Eliminating Edge Responses

 If the determinant is negative, the curvatures have different signs and the keypoint in question cannot be an extremum, so it is discarded
 Let r denote the ratio of the largest to the smallest eigenvalue, so that α = rβ. Then Tr(H)^2/Det(H) = (α + β)^2/(αβ) = (r + 1)^2/r, which depends on the ratio of the eigenvalues rather than their individual values
 The minimum of (r + 1)^2/r occurs when the eigenvalues are equal, and it increases with r
 To check that the ratio of principal curvatures is below some threshold, r, we only need to check Tr(H)^2/Det(H) < (r + 1)^2/r, which is a simple computation
 In the experimental results reported by Lowe [2004], a value of r = 10 was used, meaning that keypoints with ratios of curvature greater than 10 were eliminated
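Lowe's edge-response test can be sketched in a few lines of NumPy; H is the 2 x 2 Hessian of D at the keypoint and r = 10 as in the text:

```python
import numpy as np

def passes_edge_test(H, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r."""
    tr = H[0, 0] + H[1, 1]
    det = H[0, 0] * H[1, 1] - H[0, 1] * H[1, 0]
    if det <= 0:                 # curvatures have different signs: reject
        return False
    return tr ** 2 / det < (r + 1) ** 2 / r
```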


Eliminating Edge Responses


Keypoint Orientation

 keypoints that SIFT considers stable


 know the location of each keypoint in scale space – achieved scale independence
 assign a consistent orientation to each keypoint based on local image properties
 represent a keypoint relative to its orientation and thus achieve invariance to image rotation
 The scale of the keypoint is used to select the Gaussian smoothed image, L, that is closest to
that scale
 In this way, all orientation computations are performed in a scale-invariant manner. Then, for each image sample, L(x, y), at this scale, compute the gradient magnitude, M(x, y), and orientation angle, θ(x, y), using pixel differences


Keypoint Orientation

 A histogram of orientations is formed from the gradient orientations of sample points in a


neighborhood of each keypoint
 The histogram has 36 bins covering the 360° range of orientations on the image plane
 Each sample added to the histogram is weighed by its gradient magnitude, and by a circular
Gaussian function with a standard deviation 1.5 times the scale of the keypoint
 Peaks in the histogram correspond to dominant local directions of local gradients
 The highest peak in the histogram is detected and any other local peak that is within 80% of
the highest peak is used also to create another keypoint with that orientation
 Thus, for the locations with multiple peaks of similar magnitude, there will be multiple
keypoints created at the same location and scale, but with different orientations
 SIFT assigns multiple orientations to only about 15% of the points, but these contribute significantly to image matching
 A parabola is fit to the three histogram values closest to each peak to interpolate the peak
position for better accuracy
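A sketch of the 36-bin weighted orientation histogram described above (pure NumPy; mag and ang are assumed to be the precomputed gradient magnitudes and orientations, in degrees, of the patch around a keypoint, and weight is the circular Gaussian window with standard deviation 1.5 times the keypoint scale):

```python
import numpy as np

def orientation_histogram(mag, ang, weight, n_bins=36):
    """Gradient-orientation histogram weighted by magnitude and a Gaussian window."""
    hist, _ = np.histogram(ang.ravel(),
                           bins=n_bins, range=(0.0, 360.0),
                           weights=(mag * weight).ravel())
    peak = hist.max()
    # dominant orientation(s): the highest peak plus any peak within 80% of it
    dominant = np.flatnonzero(hist >= 0.8 * peak) * (360.0 / n_bins)
    return hist, dominant
```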

Keypoint Orientation



Keypoint Descriptors

 compute a descriptor for a local region around each keypoint that is highly
distinctive, but is at the same time as invariant as possible to changes in
scale, orientation, illumination, and image viewpoint
 The idea is to be able to use these descriptors to identify matches
(similarities) between local regions in two or more images
 The approach used by SIFT to compute descriptors is based on experimental
results suggesting that local image gradients appear to perform a function
similar to what human vision does for matching and recognizing 3-D objects
from different viewpoints


Procedure

 A region of size 16 × 16 pixels is centered on a keypoint, and the gradient


magnitude and direction are computed at each point in the region using pixel
differences
 These are shown as randomly oriented arrows in the upper-left of the figure
 A Gaussian weighting function with standard deviation equal to one-half the
size of the region is then used to assign a weight that multiplies the
magnitude of the gradient at each point
 The Gaussian weighting function is shown as a circle in the figure, but it is
understood that it is a bell-shaped surface whose values (weights) decrease as
a function of distance from the center
 The purpose of this function is to reduce sudden changes in the descriptor
with small changes in the position of the function


Procedure

 Because there is one gradient computation for each point in the region surrounding a keypoint,
there are (16)^2 gradient directions to process for each keypoint
 There are 16 directions in each 4 × 4 subregion. The top-rightmost subregion is shown zoomed to
simplify the explanation of the next step, which consists of quantizing all gradient orientations in
the 4 × 4 subregion into eight possible directions differing by 45°
 Rather than assigning a directional value as a full count to the bin to which it is closest, SIFT
performs interpolation that distributes a histogram entry among all bins proportionally, depending
on the distance from that value to the center of each bin
 This is done by multiplying each entry into a bin by a weight of 1 − d, where d is the shortest
distance from the value to the center of a bin, measured in the units of the histogram spacing, so
that the maximum possible distance is 1
 For example, the center of the first bin is at 45°/2 = 22.5°, the next center is at 22.5° + 45° = 67.5°, and so
on. Suppose that a particular directional value is 22.5°. The distance from that value to the center of the first
histogram bin is 0, so we would assign a full entry (i.e., a count of 1) to that bin in the histogram. The distance
to the next center would be greater than 0, so we would assign a fraction of a full entry, that is 1 * (1 − d), to
that bin, and so forth for all bins. In this way, every bin gets a proportional fraction of a count, thus avoiding
“boundary” effects in which a descriptor changes abruptly as a small change in orientation causes it to be
assigned from one bin to another


Procedure

 eight directions of a histogram as a small cluster of vectors, with the length of each vector being equal to the value of its
corresponding bin
 Sixteen histograms are computed, one for each 4 × 4 subregion of the 16 × 16 region surrounding a keypoint
 A descriptor then consists of a 4 × 4 array, each containing eight directional values
 Descriptor data is organized as a 128-dimensional vector
 In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative
to the keypoint orientation
 In order to reduce the effects of illumination, a feature vector is normalized in two stages
 First, the vector is normalized to unit length by dividing each component by the vector norm. A change in image contrast resulting
from each pixel value being multiplied by a constant will multiply the gradients by the same constant, so the change in contrast will
be cancelled by the first normalization
 A brightness change caused by a constant being added to each pixel will not affect the gradient values because they are computed
from pixel differences. Therefore, the descriptor is invariant to affine changes in illumination
 However, nonlinear illumination changes resulting, for example, from camera saturation, can also occur. These types of changes can
cause large variations in the relative magnitudes of some of the gradients, but they are less likely to affect gradient orientation
 SIFT reduces the influence of large gradient magnitudes by thresholding the values of the normalized feature vector so that all
components are below the experimentally determined value of 0.2
 After thresholding, the feature vector is renormalized to unit length


Summary

1. Construct the scale space. The parameters that need to be specified are σ, s (k is computed from s), and the number of octaves. Suggested values are σ = 1.6, s = 2, and three octaves.
2. Obtain the initial keypoints. Compute the difference of Gaussians, D(x, y, σ), from the smoothed images in scale space. Find the extrema in each D(x, y, σ) image. These are the initial keypoints.
3. Improve the accuracy of the location of the keypoints. Interpolate the values of D(x, y, σ) via a Taylor expansion.
4. Delete unsuitable keypoints. Eliminate keypoints that have low contrast and/or are poorly
localized. This is done by evaluating D from Step 3 at the improved locations. All keypoints
whose values of D are lower than a threshold are deleted. A suggested threshold value is
0.03. Keypoints associated with edges are deleted also. A value of 10 is suggested for r.
5. Compute keypoint orientations. Compute the magnitude and orientation of each keypoint
using the histogram-based procedure.
6. Compute keypoint descriptors. Compute a feature (descriptor) vector for each keypoint. If a
region of size 16 × 16 around each keypoint is used, the result will be a 128-dimensional
feature vector for each keypoint.
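The whole pipeline summarized above is available as a ready-made detector/descriptor; a sketch using the SIFT class from skimage.feature (present in recent scikit-image releases) together with match_descriptors, on an illustrative image pair:

```python
from skimage import data
from skimage.color import rgb2gray
from skimage.feature import SIFT, match_descriptors
from skimage.transform import rotate

img1 = rgb2gray(data.astronaut())
img2 = rotate(img1, 30)                    # a rotated copy to match against

sift1, sift2 = SIFT(), SIFT()
sift1.detect_and_extract(img1)             # runs steps 1-6 of the summary
sift2.detect_and_extract(img2)

matches = match_descriptors(sift1.descriptors, sift2.descriptors,
                            cross_check=True, max_ratio=0.8)
print(len(sift1.keypoints), len(sift2.keypoints), len(matches))
```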

Example

Observations

 Number of matches between the image of a building and a subimage


 Right corner edge of the building
 643 and 54 keypoints
 36 keypoint matches, only 3 were incorrect
 Rotation by 5 counterclockwise, subimage: right corner edge
 547 and 49 keypoints
 26 keypoint matches, only 2 were incorrect
 Reduced to half the size in both spatial directions
 No matches
 brightening the reduced image slightly by manipulating the intensity gamma
 Histogram specification – normalizing the intensity of all images using the characteristics of the image being
queried
 195 and 24 keypoints
 7 keypoint matches, only 1 was incorrect


Observations

 we do not always know a priori when images have been acquired under different
conditions and geometrical arrangements
 more practical test is to compute features for a prototype image and test them
against unknown samples
 Rotated subimage
 10 matches were found, 2 were incorrect
 excellent results, considering the relatively small size of the subimage, and the fact that
it was rotated
 Half-sized subimage
 11 matches were found, 4 were incorrect
 good results, considering the fact that significant detail was lost in the subimage when it
was rotated or reduced in size


Scale-invariant Feature Transform (SIFT)

 alternative representation for image regions


 very useful for matching images
 simple corner detectors work well when the images to be matched are similar in
nature (with respect to scale, orientation, and so on)
 if they have different scales and rotations, SIFT descriptors need to be used to match them
 not only just scale invariant, but it still obtains good results when rotation,
illumination, and viewpoints of the images change as well
 transforms image content into local feature coordinates that are invariant to
translation, rotation, scale, and other imaging parameters
 robust with regard to small variations in illumination (due to gradient and
normalization), pose


Algorithm

 Scale-space extrema detection: Search over multiple scales and image


locations, the location and characteristic scales are given by DoG detector
 Keypoint localization: Select keypoints based on a measure of stability, keep
only the strong interest points by eliminating the low-contrast and edge
keypoints
 Orientation assignment: Compute the best orientation(s) for each keypoint
region, which contributes to the stability of matching
 Keypoint descriptor computation: Use local image gradients at selected
scale and rotation to describe each keypoint region


Matching images with BRIEF binary descriptors
 Binary Robust Independent Elementary Features
 comparatively few bits (short binary descriptor)
 computed using a set of intensity difference tests
 has a low memory footprint
 matching using this descriptor turns out to be very efficient with the Hamming
distance metric
 desired scale-invariance can be obtained by detecting features at different
scales, although it does not provide rotation invariance


BRIEF

 convert image patches into a binary feature vector so that together they can
represent an object
 each keypoint is described by a feature vector which is a 128–512 bit string
 images need to be smoothed before they can be meaningfully differentiated when looking for edges
 The (x, y) pair, also called a random pair, is located inside the patch. In total, n tests (random pairs) have to be selected to create the binary feature vector, and these n tests are chosen using one of five approaches (sampling geometries)


BRIEF

 Define test  on patch p of size S x S as


 Choose a set of n (x, y)-location pairs which uniquely defines a set of binary
tests (n = 128, 256, 512)
 Good compromise between speed, storage efficiency, recognition rate
 BRIEF-k – k = n/8 (number of bytes to store the descriptor)
 BRIEF descriptor – n-dimensional bit string
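A sketch of extracting and matching BRIEF descriptors with scikit-image (BRIEF, corner_harris, corner_peaks, and match_descriptors from skimage.feature; binary descriptors are matched with the Hamming distance automatically, and the second image is a stand-in for a transformed view):

```python
from skimage import data
from skimage.feature import (BRIEF, corner_harris, corner_peaks,
                             match_descriptors)

img1 = data.camera()
img2 = data.camera()                       # in practice, a transformed view of img1

kp1 = corner_peaks(corner_harris(img1), min_distance=5)
kp2 = corner_peaks(corner_harris(img2), min_distance=5)

extractor = BRIEF(descriptor_size=256, patch_size=49)   # n = 256 binary tests
extractor.extract(img1, kp1)
desc1, kp1 = extractor.descriptors, kp1[extractor.mask]
extractor.extract(img2, kp2)
desc2, kp2 = extractor.descriptors, kp2[extractor.mask]

matches = match_descriptors(desc1, desc2, cross_check=True)
print(len(matches))
```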


Sampling Strategies


Sampling Strategies

 Uniform (G I): Both x and y pixels in the random pair are drawn from a uniform distribution, or spread of S/2, around the keypoint. The pair (test) can lie close to the patch border
 Gaussian (G II): Both x and y pixels in the random pair are drawn from a Gaussian distribution, or spread of 0.04 x S², around the keypoint
 Gaussian (G III): The first pixel (x) in the random pair is drawn from a Gaussian distribution centered around the keypoint with a standard deviation or spread of 0.04 x S². The second pixel (y) in the random pair is drawn from a Gaussian distribution centered around the first pixel (x) with a standard deviation or spread of 0.01 x S². This forces the test (pair) to be more local. Test (pair) locations outside the patch are clamped to the edge of the patch.
 Coarse Polar Grid (G IV): Both x and y pixels in the random pair are sampled from discrete locations of a coarse polar grid, introducing a spatial quantization.
 Coarse Polar Grid (G V): The first pixel (x) in the random pair is at (0, 0) and the second pixel (y) is drawn from discrete locations of a coarse polar grid.


Matching with ORB (Oriented FAST and Rotated BRIEF) feature detector and binary descriptor
 An oriented FAST (Features from Accelerated Segment Test) detection method and the rotated BRIEF descriptors are used by this algorithm
 However, FAST features do not have an orientation component and multiscale
features. So ORB algorithm uses a multiscale image pyramid
 As compared to BRIEF, ORB is more scale and rotation invariant, yet it too applies the Hamming distance metric for matching, which is very efficient
 Hence, this method is preferred over BRIEF when considering real-time
applications
 Describe feature points by a binary string
 Detected by FAST feature detection
 Described using an improved BRIEF feature descriptor
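A sketch of ORB detection and matching with scikit-image (ORB and match_descriptors from skimage.feature; the keypoint budget and rotation angle are illustrative):

```python
from skimage import data
from skimage.color import rgb2gray
from skimage.feature import ORB, match_descriptors
from skimage.transform import rotate

img1 = rgb2gray(data.astronaut())
img2 = rotate(img1, 15)                    # rotated view; ORB should still match

orb1, orb2 = ORB(n_keypoints=200), ORB(n_keypoints=200)
orb1.detect_and_extract(img1)              # oriented FAST + rotated BRIEF
orb2.detect_and_extract(img2)

# Binary descriptors, so the Hamming distance is used for matching
matches = match_descriptors(orb1.descriptors, orb2.descriptors, cross_check=True)
print(len(matches))
```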


ORB

 FAST (Feature from Accelerated Segment Test)


 When the circular neighborhood around a pixel A has enough consecutive pixels whose gray values differ sufficiently from that of A, then A is recognized as a FAST corner
 |Ix – IA| > t, where Ix is the gray value of each of the n consecutive pixels in the neighborhood and IA is the gray value of the candidate point A
 Rotation invariance
 Moments of a circular neighborhood, center of gravity – angle formed


ORB Algorithm


Image Matching


Haar-like Features

 very useful image features used in object detection


 introduced in the first real-time face detector by Viola and Jones
 Using integral images, Haar-like features of any size (scale) can be efficiently
computed in constant time
 computation speed is the key advantage of a Haar-like feature over most other
features
 features are just like the convolution kernels (rectangle filters)
 Each feature corresponds to a single value computed by subtracting a sum of
pixels under a white rectangle from a sum of pixels under a black rectangle
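A sketch of computing Haar-like feature values from an integral image with scikit-image (haar_like_feature from skimage.feature and integral_image from skimage.transform; the window size and feature type are illustrative):

```python
import numpy as np
from skimage.feature import haar_like_feature
from skimage.transform import integral_image

window = np.random.rand(24, 24)            # a toy detection window
ii = integral_image(window)

# 'type-2-x': two adjacent rectangles side by side (difference of their sums)
features = haar_like_feature(ii, 0, 0, ii.shape[1], ii.shape[0],
                             feature_type='type-2-x')
print(features.shape)                      # one value per possible rectangle pair
```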


Haar-like Features

 considers adjacent rectangular regions at a specific location in a detection


window
 sums up the pixel intensities in each region and calculates the difference
between these sums
 This difference is then used to categorize subsections of an image
 For example, with a human face, it is a common observation that among all faces
the region of the eyes is darker than the region of the cheeks
 a common Haar feature for face detection is a set of two adjacent rectangles that
lie above the eye and the cheek region
 position of these rectangles is defined relative to a detection window that acts like
a bounding box to the target object (the face in this case)


Haar-like Features

 Detection phase
 a window of the target size is moved over the input image
 for each subsection of the image the Haar-like feature is calculated
 difference is then compared to a learned threshold that separates non-objects from
objects
 This is only a weak learner or classifier (its detection quality is slightly better than
random guessing)
 a large number of Haar-like features are necessary to describe an object with sufficient
accuracy
 organized in something called a classifier cascade to form a strong learner or classifier
 key advantage: calculation speed
 Due to the use of integral images, a Haar-like feature of any size can be calculated in
constant time (approximately 60 microprocessor instructions for a 2-rectangle feature)

Integral Images

 Summed-area tables
 defined as two-dimensional lookup tables in the form of a matrix with the
same size of the original image
 Each element of the integral image contains the sum of all pixels located on
the up-left region of the original image (in relation to the element's position)
 This allows to compute sum of rectangular areas in the image, at any position
or scale, using only four lookups
 Each Haar-like feature may need more than four lookups, depending on how
it was defined
 2-rectangle features need six lookups
 3-rectangle features need eight lookups
 4-rectangle features need nine lookups
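A sketch of the constant-time rectangle sum using scikit-image's integral_image and integrate (from skimage.transform; the coordinates are illustrative and inclusive):

```python
import numpy as np
from skimage.transform import integral_image, integrate

img = np.arange(100, dtype=float).reshape(10, 10)
ii = integral_image(img)                   # summed-area table, same size as img

# Sum of the rectangle with top-left corner (2, 3) and bottom-right corner (5, 7),
# obtained from only a handful of lookups into the integral image
rect_sum = integrate(ii, (2, 3), (5, 7))[0]
assert np.isclose(rect_sum, img[2:6, 3:8].sum())
```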

Integral Images


Face Detection


Haar-like Features

 region of the eyes is often darker than the region of the nose and cheeks
 eyes are darker than the bridge of the nose


Face Detection with Haar-like Features

 Each Haar-like feature is only a weak classifier, and hence a large number of Haar-like features are
required to detect a face with good accuracy
 A huge number of Haar-like features are computed for all possible sizes and locations of each Haar-like
kernel using the integral images
 an AdaBoost ensemble classifier is used to select important features from the huge number of features
and combine them into a strong classifier model during the training phase
 The model learned is then used to classify a face region with the selected features.
 Most of the regions in an image are non-face regions in general. So, it is first checked whether a window is a face region; if it is not, it is discarded in a single shot, and a different region where a face is likely to be found is inspected instead. This ensures that more time is dedicated to checking possible face regions
 In order to implement this idea, the concept of cascade of classifiers is introduced. Instead of applying
all the huge number of features on a window, the features are grouped into different stages of
classifiers and applied one-by-one
 The first few stages contain very few features. If a window fails at the first stage it is discarded, and the
remaining features on it are not considered. If it passes, the second stage of features are applied, and so
on and so forth. A face region corresponds to the window that passes all the stages
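A sketch of this cascaded detector using OpenCV's pretrained frontal-face Haar cascade (cv2.CascadeClassifier with the XML file shipped in cv2.data.haarcascades; the input file name and parameters are illustrative):

```python
import cv2

img = cv2.imread('group_photo.jpg')        # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

# Slide a detection window over the image at multiple scales;
# each candidate window must pass every stage of the cascade
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                 # draw the detected face regions
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```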


Stages


Adaptive Boosting Classifier (AdaBoost)

 combines multiple classifiers to increase the accuracy of classifiers


 an iterative ensemble method
 builds a strong, high-accuracy classifier by combining multiple poorly performing (weak) classifiers
 sets the weights of classifiers and of the training samples in each iteration so as to ensure accurate predictions of unusual observations
 Any machine learning algorithm can be used as the base classifier if it accepts weights on the training set
 AdaBoost should meet two conditions:
 The classifier should be trained iteratively on variously weighted training examples
 In each iteration, it tries to provide an excellent fit for these examples by minimizing the training error


Algorithm

1. Initially, AdaBoost selects a training subset randomly
2. It iteratively trains the AdaBoost machine learning model by selecting the training set based on the accuracy of the previous training round
3. It assigns a higher weight to wrongly classified observations so that in the next iteration these observations get a higher probability of being classified correctly
4. It also assigns a weight to the trained classifier in each iteration according to the accuracy of the classifier: the more accurate the classifier, the higher its weight
5. This process iterates until the complete training data fits without any error, or until the specified maximum number of estimators is reached
6. To classify, perform a "vote" across all of the learning algorithms built
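A sketch of this procedure with scikit-learn's AdaBoostClassifier (decision stumps are the default weak learners; the dataset and parameter values are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 weak learners, re-weighted iteration by iteration as described above
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```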


AdaBoost

 Pros
 AdaBoost is easy to implement.
 It iteratively corrects the mistakes of the weak classifier and improves accuracy by
combining weak learners.
 You can use many base classifiers with AdaBoost.
 AdaBoost is not prone to overfitting. This has been observed in experimental results, but there is no concrete theoretical reason available.
 Cons
 AdaBoost is sensitive to noise data.
 It is highly affected by outliers because it tries to fit each point perfectly.
 AdaBoost is slower compared to XGBoost.

AdaBoost


Summary

 feature detection and extraction techniques to compute different types of feature descriptors from an image
 local feature detectors and descriptors for an image, along with their desirable properties
 Harris Corner Detectors to detect corner interest points of an image and use them to match two images
 blob detection using LoG/DoG/DoH filters
 HOG, SIFT, ORB, BRIEF binary detectors/descriptors and match images with these features
 Haar-like features and face detection with the Viola-Jones algorithm
