
COMPUTER VISION (BAI515A)

MODULE-1
What is Computer Vision?
Humans effortlessly perceive the 3D world around them, for example recognizing the shapes of objects and people's expressions. Computer vision is the field in which researchers develop mathematical techniques that let computers recover and interpret the same kind of information from images.
Challenges in Computer Vision:
Despite significant advances, making a computer understand an image the way a human does remains difficult. Vision is a complex "inverse problem": we try to recover unknowns (the scene, its lighting, and the objects in it) from information that is insufficient to fully specify the solution.
Computer vision uses physics-based models, probabilistic models, and machine learning to
interpret images.
Applications of Computer Vision:
Optical Character Recognition (OCR) and ANPR: Reading handwritten postal codes
and recognizing license plates automatically.
Machine Inspection: Checking the quality of parts using stereo vision for aircraft wings or
auto body parts, or detecting defects in steel castings with X-ray vision.
Retail: Using object recognition for automated checkout lanes and fully automated stores.
Warehouse Logistics: Enabling autonomous package delivery, pallet-carrying "drives," and
robotic parts picking.
Medical Imaging: Registering pre-operative and intra-operative imagery, studying brain
morphology over time.
Self-Driving Vehicles: Driving point-to-point between cities and autonomous flight.
3D Model Building (Photogrammetry): Automatically constructing 3D models from
aerial and drone photographs.
Match Move: Merging computer-generated imagery (CGI) with live action footage, widely used in Hollywood movies.


Motion Capture (Mocap): Capturing actors for computer animation using retro-reflective
markers and vision-based techniques.
Surveillance: Monitoring for intruders, analyzing highway traffic, and watching over
swimming pools for potential drownings.
Fingerprint Recognition and Biometrics: Used for automatic access authentication and
forensic applications.
Consumer-Level Applications: Including photo stitching for panoramas, exposure
bracketing for better photos under challenging lighting conditions, morphing images, 3D
modeling from snapshots, video match move and stabilization, and more.

BRIEF HISTORY

1970s:

 Computer vision started in the 1970s as part of a broader effort to make robots
intelligent.
 The early pioneers believed solving the "visual input" problem (teaching computers to
understand images) would be easy compared to higher-level thinking.
 Initial attempts involved extracting edges from images and inferring 3D structure from
2D lines.

1980s:
 More advanced mathematical techniques were developed for quantitative image and
scene analysis.
 Techniques like image pyramids and wavelets were introduced for image blending and
correspondence search. Stereo vision for 3D shape perception became popular, along
with various edge and contour detection methods.

1990s:
 Efforts to solve the structure from motion problem increased, leading to techniques
like projective reconstructions and factorization.
 Physics-based vision, using detailed color and intensity measurements, gained
attention.
 Advances in optical flow methods, stereo correspondence, and 3D range data
processing continued.

2000s:
 The decade saw a closer integration of computer vision with computer graphics.
 Techniques like image stitching, light-field rendering, and 3D modeling gained
prominence. Computational photography emerged, involving HDR imaging, exposure
bracketing, and tone mapping.

2010s:
 The use of large labeled datasets and powerful GPUs revolutionized machine learning
in computer vision.
 Deep neural networks, especially convolutional neural networks (CNNs), became
dominant for image recognition and segmentation.
 Computational photography techniques like image stitching, light-field capture, and
HDR merged with mainstream photography.
 Object recognition shifted towards feature-based techniques combined with learning
approaches.

PHOTOMETRIC IMAGE FORMATION


Photometric image formation in computer vision focuses on how light interacts with
surfaces and how this interaction is captured and processed to form images.
Understanding this process is crucial for various applications, including object
recognition, 3D reconstruction, and scene understanding.

Figure 1. A simplified model of photometric image formation. Light is emitted by one or more light sources and is then reflected from an object's surface. A portion of this light is directed towards the camera. This simplified model ignores multiple reflections, which often occur in real-world scenes.

Lighting
 Light Sources:
Images require light. Scenes are illuminated by one or more light sources.
Light sources can be categorized as point or area light sources.
Point light sources originate from a single location and have an intensity and a color.
Area light sources, like fluorescent fixtures, emit light over a finite area.
 Point Light Sources:
Originate at a specific location (e.g., light bulb or the Sun).
Have intensity and a color spectrum (distribution over wavelengths).
Intensity falls off with the square of the distance between the source and the illuminated object (see the short sketch following this list).
 Area Light Sources:
Can be modeled as a finite rectangular area emitting light equally in all directions.
More complex distributions may use a four-dimensional light field or an environment map.
An environment map assigns a color value to each incident light direction, assuming all light sources are at infinity.
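To make the inverse-square falloff above concrete, here is a minimal Python sketch (the function name and sample values are illustrative, not from the notes):

```python
import numpy as np

def point_light_irradiance(light_pos, light_rgb, surface_pos):
    """Light arriving at surface_pos from a point source: the source's
    color/intensity scaled by the 1/d^2 falloff with distance d."""
    d = np.linalg.norm(np.asarray(light_pos) - np.asarray(surface_pos))
    return np.asarray(light_rgb) / (d * d)

# doubling the distance reduces the received light by a factor of four
print(point_light_irradiance([0, 0, 2], [1.0, 1.0, 1.0], [0, 0, 0]))  # 0.25
print(point_light_irradiance([0, 0, 4], [1.0, 1.0, 1.0], [0, 0, 0]))  # 0.0625
```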
Reflectance and Shading
Reflectance: When light hits an object's surface, it is scattered and reflected.
Bidirectional Reflectance Distribution Function (BRDF) is a general model describing this
interaction.
Shading Models: Different models (e.g., diffuse, specular, Phong) are used to simulate
how light interacts with surfaces.
These models help understand how light contributes to the color and intensity of a pixel.

Global Illumination: Models are used to compute the global illumination of a scene, considering complex interactions between light and surfaces.

Figure 2. (a) Light scatters when it hits a surface. (b) The bidirectional reflectance distribution function (BRDF) fr(θi, φi, θr, φr) is parameterized by the angles that the incident, v̂i, and reflected, v̂r, light ray directions make with the local surface coordinate frame (d̂x, d̂y, n̂).

The Bidirectional Reflectance Distribution Function (BRDF)


The most general model of light scattering is the bidirectional reflectance distribution function (BRDF). Relative to some local coordinate frame on the surface, the BRDF is a four-dimensional function that describes how much of each wavelength arriving at an incident direction v̂i is emitted in a reflected direction v̂r (Figure 2b). The function can be written in terms of the angles of the incident and reflected directions relative to the surface frame as
fr(θi, φi, θr, φr; λ).
The BRDF is reciprocal, i.e., because of the physics of light transport, you can interchange the roles of v̂i and v̂r and still get the same answer.
Most surfaces are isotropic, i.e., there are no preferred directions on the surface as far as light transport is concerned.


For an isotropic material, we can simplify the BRDF to
fr(θi, θr, |φr − φi|; λ) or fr(v̂i, v̂r, n̂; λ),
since the quantities θi, θr, and φr − φi can be computed from the directions v̂i, v̂r, and n̂.

To calculate the amount of light exiting a surface point p in a direction v̂r under a given lighting condition, we integrate the product of the incoming light Li(v̂i; λ) with the BRDF. Taking into account the foreshortening factor cos⁺θi, we obtain

Lr(v̂r; λ) = ∫ Li(v̂i; λ) fr(v̂i, v̂r, n̂; λ) cos⁺θi dv̂i,

where cos⁺θi = max(0, cos θi).

If the light sources are discrete (a finite number of point light sources), we can replace the integral with a summation,

Lr(v̂r; λ) = Σi Li(λ) fr(v̂i, v̂r, n̂; λ) cos⁺θi.
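The discrete summation translates directly into code. Below is a minimal numpy sketch (function names and the Lambertian albedo value are our own, purely illustrative) that evaluates Lr for a set of point lights, with the BRDF supplied as a callable:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade_point(normal, view_dir, lights, brdf):
    """Evaluate Lr = sum_i L_i * f_r(v_i, v_r, n) * cos+ theta_i
    for a finite set of point lights (lights = [(direction, rgb), ...])."""
    n, v_r = normalize(normal), normalize(view_dir)
    radiance = np.zeros(3)
    for light_dir, light_rgb in lights:
        v_i = normalize(np.asarray(light_dir, dtype=float))
        cos_plus = max(0.0, float(np.dot(v_i, n)))      # foreshortening, cos+
        radiance += np.asarray(light_rgb) * brdf(v_i, v_r, n) * cos_plus
    return radiance

# Example: a constant (Lambertian) BRDF with a made-up albedo
lambertian = lambda v_i, v_r, n: np.array([0.7, 0.5, 0.3]) / np.pi
lights = [([0.0, 0.0, 1.0], [1.0, 1.0, 1.0])]
print(shade_point(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]),
                  lights, lambertian))
```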

Diffuse reflection
The diffuse component (also known as Lambertian or matte reflection) scatters light
uniformly in all directions and is the phenomenon we most normally associate with
shading, e.g., the smooth (non-shiny) variation of intensity with surface normal that is seen
when observing a statue. Diffuse reflection also often imparts a strong body color to the
light since it is caused by selective absorption and re-emission of light inside the object’s
material.
While light is scattered uniformly in all directions, i.e., the BRDF is constant,

fd(v̂i, v̂r, n̂; λ) = fd(λ),
the amount of light depends on the angle between the incident light direction and the surface normal θi. This is because the surface area exposed to a given amount of light
becomes larger at oblique angles, becoming completely self-shadowed as the outgoing
surface normal points away from the light (Figure 3a). The shading equation for diffuse reflection can thus be written as
Ld(v̂r; λ) = Σi Li(λ) fd(λ) cos⁺θi = Σi Li(λ) fd(λ) [v̂i · n̂]⁺,
where [·]⁺ denotes clamping negative values to zero.

Specular reflection
The second major component of a typical BRDF is specular (gloss or highlight) reflection,
which depends strongly on the direction of the outgoing light. Consider light reflecting off a
mirrored surface (Figure 3b). Incident light rays are reflected in a direction that is rotated by
180◦ around the surface normal n̂ .
We can compute the specular reflection direction ŝi as
ŝi = v∥ − v⊥ = (2 n̂ n̂ᵀ − I) vi.
The amount of light reflected in a given direction v̂r thus depends on the angle θs = cos⁻¹(v̂r · ŝi) between the view direction v̂r and the specular direction ŝi. For example, the Phong (1975) model uses a power of the cosine of the angle,
fs(θs; λ) = ks(λ) cos^ke θs,
while the Torrance and Sparrow (1967) micro-facet model uses a Gaussian,
fs(θs; λ) = ks(λ) exp(−cs² θs²).

Larger exponents ke (or inverse Gaussian widths cs) correspond to more specular surfaces
with distinct highlights, while smaller exponents better model materials with softer gloss.


Figure 3: (a) The diminution of returned light caused by foreshortening depends on v̂i · n̂, the cosine of the angle between the incident light direction v̂i and the surface normal n̂. (b) Mirror (specular) reflection: the incident light ray direction v̂i is reflected onto the specular direction ŝi around the surface normal n̂.
Phong shading
Phong (1975) combined the diffuse and specular components of reflection with
another term, which he called the ambient illumination. This term accounts for the
fact that objects are generally illuminated not only by point light sources but also by a
general diffuse illumination corresponding to inter-reflection (e.g., the walls in a
room) or distant sources, such as the blue sky. In the Phong model, the ambient term
does not depend on surface orientation, but depends on the color of both the ambient
illumination La(λ) and the object ka(λ),
fa(λ) = ka(λ) La(λ).
Combining the ambient, diffuse, and specular terms, the full Phong shading model is
Lr(v̂r; λ) = ka(λ) La(λ) + kd(λ) Σi Li(λ) [v̂i · n̂]⁺ + ks(λ) Σi Li(λ) (v̂r · ŝi)^ke.

Figure 4 shows a typical set of Phong shading model components as a function of the angle away from the surface normal (in a plane containing both the lighting direction and the viewer).

Figure 4: Cross-section through a Phong shading model BRDF for a fixed incident illumination direction: (a) component values as a function of angle away from surface normal; (b) polar plot. The value of the Phong exponent ke is indicated by the "Exp" labels and the light source is at an angle of 30° away from the normal.
Typically, the ambient and diffuse reflection color distributions ka(λ) and kd(λ) are the
same, since they are both due to sub-surface scattering (body reflection) inside the surface
material (Shafer 1985). The specular reflection distribution ks(λ) is often uniform (white),
since it is caused by interface reflections that do not change the light color.
The ambient illumination La(λ) often has a different color cast from the direct light sources
Li(λ), e.g., it may be blue for a sunny outdoor scene or yellow for an interior lit with

KNSIT, DEPT. OF AIML 10


COMPUTER VISION (BAI515A)

candles or incandescent lights. The diffuse component of the Phong model (or of any
shading model) depends on the angle of the incoming light source v̂i, while the specular component depends on the relative angle between the viewer v̂r and the specular reflection direction ŝi (which itself depends on the incoming light direction v̂i and the surface normal n̂).
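Putting the ambient, diffuse, and specular terms together, the Phong model described above can be sketched in a few lines of numpy (the function and its parameter values are illustrative, not taken from the notes; the specular direction is computed as ŝi = (2 n̂ n̂ᵀ − I) v̂i):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong_shade(n, v_r, lights, k_a, L_a, k_d, k_s, k_e):
    """Ambient + diffuse + specular Phong terms, summed over point lights.
    lights is a list of (direction_towards_light, rgb_intensity) pairs."""
    n, v_r = normalize(n), normalize(v_r)
    color = k_a * L_a                                   # ambient term
    for v_i, L_i in lights:
        v_i = normalize(np.asarray(v_i, dtype=float))
        cos_i = max(0.0, float(np.dot(v_i, n)))         # [v_i . n]+
        s_i = 2.0 * np.dot(n, v_i) * n - v_i            # (2 n n^T - I) v_i
        cos_s = max(0.0, float(np.dot(v_r, s_i)))
        color = color + k_d * np.asarray(L_i) * cos_i \
                      + k_s * np.asarray(L_i) * cos_s ** k_e
    return color

# one white light 30 degrees off the normal, viewer looking along the normal
light_dir = [np.sin(np.radians(30)), 0.0, np.cos(np.radians(30))]
print(phong_shade(n=np.array([0.0, 0.0, 1.0]), v_r=np.array([0.0, 0.0, 1.0]),
                  lights=[(light_dir, [1.0, 1.0, 1.0])],
                  k_a=np.array([0.1, 0.1, 0.1]), L_a=np.array([0.2, 0.2, 0.3]),
                  k_d=np.array([0.6, 0.4, 0.3]), k_s=np.array([0.3, 0.3, 0.3]),
                  k_e=20.0))
```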
Di-chromatic reflection model
The Torrance and Sparrow (1967) model of reflection also forms the basis of
Shafer’s (1985) di-chromatic reflection model, which states that the apparent color
of a uniform material lit from a single source depends on the sum of two terms,
L = Li + Lb = ci(λ) mi(v̂r, v̂i, n̂) + cb(λ) mb(v̂r, v̂i, n̂),
i.e., the radiance of the light reflected at the interface, Li, and the radiance reflected at the surface body, Lb. Each of these, in turn, is a simple product between a relative power spectrum c(λ), which depends only on wavelength, and a magnitude m(v̂r, v̂i, n̂), which depends only on geometry.
Global illumination (ray tracing and radiosity)
The simple shading model presented thus far assumes that light rays leave the light
sources, bounce off surfaces visible to the camera, thereby changing in intensity or
color, and arrive at the camera. In reality, light sources can be shadowed by
occluders and rays can bounce multiple times around a scene while making their
trip from a light source to the camera.
Two methods have traditionally been used to model such effects. If the scene is
mostly specular (the classic example being scenes made of glass objects and mirrored or highly polished balls), the preferred approach is ray tracing or path
tracing, which follows individual rays from the camera across multiple bounces
towards the light sources (or vice versa). If the scene is composed mostly of uniform-albedo surfaces and simple-geometry illuminators, radiosity (global illumination) techniques are preferred.
Radiosity works by associating lightness values with rectangular surface areas in
the scene (including area light sources). The amount of light interchanged between
any two (mutually visible) areas in the scene can be captured as a form factor,
which depends on their relative orientation and surface reflectance properties, as
well as the 1/r² fall-off as light is distributed over a larger effective sphere the
further away it is.
Optics
Once the light from a scene reaches the camera, it must still pass through the lens
before reaching the sensor (analog film or digital silicon). For many applications, it
suffices to treat the lens as an ideal pinhole that simply projects all rays through a

common center of projection.


Figure 5 shows a diagram of the most basic lens model, i.e., the thin lens composed of a single piece of glass with very low, equal curvature on both sides.
According to the lens law (which can be derived using simple geometric arguments on light ray refraction), the relationship between the distance to an object zo and the distance behind the lens at which a focused image is formed zi can be expressed as
1/zo + 1/zi = 1/f,
where f is called the focal length of the lens. If we let zo → ∞, i.e., we adjust the lens (move the image plane) so that objects at infinity are in focus, we get zi = f, which is why we can think of a lens of focal length f as being equivalent (to a first approximation) to a pinhole a distance f from the focal plane.
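As a small worked example of the lens law (the helper below is our own, purely illustrative), we can solve 1/zo + 1/zi = 1/f for the image distance zi:

```python
def focus_distance(z_o: float, f: float) -> float:
    """Thin-lens law 1/z_o + 1/z_i = 1/f, solved for the image distance z_i."""
    return 1.0 / (1.0 / f - 1.0 / z_o)

# an object 2 m away imaged by a 50 mm lens focuses about 51.3 mm behind it;
# as z_o grows towards infinity, z_i approaches the focal length f
print(focus_distance(z_o=2000.0, f=50.0))   # ~51.28 (all distances in mm)
print(focus_distance(z_o=1e9, f=50.0))      # ~50.0
```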
THE DIGITAL CAMERA
In the world of digital imaging, understanding how light captured by a camera sensor is
converted into digital values is crucial. Here's a simplified explanation of the process:
Light Sensing Technology:
Modern digital cameras use either charge-coupled devices (CCD) or complementary metal
oxide on silicon (CMOS) sensors.
CCD Sensors:
Photons accumulate in each pixel during exposure.
Charges transfer through a "bucket brigade" to sense amplifiers. Anti-blooming technology
prevents overflow.
CMOS Sensors:
Photons directly affect a pixel's conductivity.
Gating controls exposure, and local amplification occurs. CMOS is widely used due to its
efficiency.


Figure 3. Image sensing pipeline, showing the various sources of noise as well as typical
digital post-processing steps.
Key Factors Affecting Sensor Performance:
Shutter Speed: Determines exposure time. Controls the amount of light reaching the
sensor. Affects motion blur in dynamic scenes.
Sampling Pitch: Physical spacing between adjacent sensor cells. Smaller pitch provides
higher resolution but may reduce sensitivity.
Fill Factor: Active sensing area size relative to the available area. Higher fill factor
captures more light and reduces aliasing.
Chip Size: Larger chips are more light-sensitive but more expensive. Digital SLRs aim for
larger chip sizes, resembling traditional film frames.
Analog Gain: Boosts sensed signal before conversion. User-controlled through ISO
setting. Higher gain improves low-light performance but may amplify noise.
Sensor Noise: Various types of noise added during sensing. Includes fixed pattern noise, dark current noise, shot noise, etc.


Noise level varies with incoming light, exposure time, and sensor gain.
ADC Resolution: Analog-to-digital conversion step. Quoted bits may exceed usable bits.
Calibration required to estimate actual usable bits.
Digital Post-Processing: After conversion, cameras undergo digital signal processing
(DSP) operations, including CFA demosaicing, white point setting, and gamma correction
to enhance images before compression and storage.
IMAGE PROCESSING
Point operators
The simplest kinds of image processing transforms are point operators, where each
output pixel’s value depends on only the corresponding input pixel value (plus, potentially,
some globally collected information or parameters). Examples of such operators include
brightness and contrast adjustments (Figure 6) as well as color correction and
transformations. In the image processing literature, such operations are also known as point
processes.


Figure 6: Some local image processing operations: (a) original image along with its three color (per-channel) histograms; (b) brightness increased (additive offset, b = 16); (c) contrast increased (multiplicative gain, a = 1.1); (d) gamma (partially) linearized (γ = 1.2); (e) full histogram equalization; (f) partial histogram equalization.
Pixel transforms
A general image processing operator is a function that takes one or more input
images and produces an output image. In the continuous domain, this can be
denoted as
g(x) = h(f (x)) or g(x) = h(f0(x), . . . , fn(x)),
where x is in the D-dimensional domain of the functions (usually D = 2 for images)
and the functions f and g operate over some range, which can either be scalar or
vector-valued, e.g., for color images or 2D motion. For discrete (sampled) images,
the domain consists of a finite number of pixel locations, x = (i, j), and we can write
g(i, j) = h(f (i, j))

Figure 7: Visualizing image data: (a) original image; (b) cropped portion and
scanline plot using an image inspection tool; (c) grid of numbers; (d) surface plot.
For figures (c)–(d), the image was first converted to grayscale.
Figure 7 shows how an image can be represented either by its color (appearance), as
a grid of numbers, or as a two-dimensional function (surface plot).
Two commonly used point processes are multiplication and addition with a
constant,
g(x) = af(x) + b.

The parameters a > 0 and b are often called the gain and bias parameters; sometimes
these parameters are said to control contrast and brightness. The bias and gain
parameters can also be spatially varying,
g(x) = a(x)f (x) + b(x)
e.g., when simulating the graded density filter used by photographers to selectively
darken the sky or when modeling vignetting in an optical system.
Multiplicative gain (both global and spatially varying) is a linear operation, since it
obeys the superposition principle,
h(f0 + f1) = h(f0) + h(f1).
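The gain/bias operator and its spatially varying form are one-liners in numpy. The sketch below uses made-up parameter values and a random stand-in image (it is illustrative, not part of the original notes):

```python
import numpy as np

def adjust(f, a=1.1, b=16):
    """Point operator g(x) = a * f(x) + b on an 8-bit image,
    clipped back to the valid [0, 255] range."""
    g = a * f.astype(np.float32) + b
    return np.clip(g, 0, 255).astype(np.uint8)

f = (np.random.rand(4, 4) * 255).astype(np.uint8)    # stand-in image
g = adjust(f, a=1.1, b=16)                           # contrast and brightness boost

# spatially varying gain a(x), e.g., a crude vignetting model
h, w = f.shape
yy, xx = np.mgrid[0:h, 0:w]
r2 = ((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (w * w + h * h)
a_x = 1.0 - 0.5 * r2                                 # darker towards the corners
g_vignette = np.clip(a_x * f, 0, 255).astype(np.uint8)
```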
Another commonly used dyadic (two-input) operator is the linear blend operator,
g(x) = (1 − α)f0(x) + αf1(x).
By varying α from 0 → 1, this operator can be used to perform a temporal cross-dissolve
between two images or videos, as seen in slide shows and film production, or as a component
of image morphing algorithms.
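A minimal sketch of the linear blend (cross-dissolve) operator; the two constant images here are just stand-ins:

```python
import numpy as np

def cross_dissolve(f0, f1, alpha):
    """Linear blend g(x) = (1 - alpha) * f0(x) + alpha * f1(x)."""
    return (1.0 - alpha) * f0 + alpha * f1

f0 = np.zeros((4, 4))   # stand-in "first" image
f1 = np.ones((4, 4))    # stand-in "second" image

# stepping alpha from 0 to 1 produces the frames of a temporal cross-dissolve
frames = [cross_dissolve(f0, f1, a) for a in np.linspace(0.0, 1.0, 10)]
```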
One highly used non-linear transform that is often applied to images before further
processing is gamma correction, which is used to remove the non-linear mapping
between input radiance and quantized pixel values. To invert the gamma mapping
applied by the sensor, we can use
g(x) = [f(x)]^(1/γ),
where a gamma value of γ ≈ 2.2 is a reasonable fit for most digital cameras.
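A minimal sketch of the gamma correction described above, assuming a float image scaled to [0, 1] (the sample values are made up):

```python
import numpy as np

def gamma_correct(f, gamma=2.2):
    """Invert the sensor's gamma mapping: g(x) = f(x) ** (1 / gamma)."""
    return np.power(np.clip(f, 0.0, 1.0), 1.0 / gamma)

f = np.linspace(0.0, 1.0, 5)      # tiny stand-in "image"
print(gamma_correct(f))           # mid-tones are brightened
```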
Color Transforms
Color transformations in point operators are essential techniques in image processing that
manipulate the intensity values of pixels in an image based on a defined mathematical
function. These transformations are applied independently to each pixel, making them efficient for various applications, such as contrast enhancement, color correction, and
image segmentation.
COMPOSITING AND MATTING
In many photo editing and visual effects applications, it is often desirable to cut a
foreground object out of one scene and put it on top of a different background
(Figure 8). The process of extracting the object from the original image is often
called matting, while the process of inserting it into another image (without visible
artifacts) is called compositing.

Figure 8 Image matting and compositing (a) source image; (b) extracted foreground
object F; (c) alpha matte α shown in grayscale; (d) new composite C.
The intermediate representation used for the foreground object between these two
stages is called an alpha-matted color image (Figure 8b–c). In addition to the three
color RGB channels, an alpha-matted image contains a fourth alpha channel α (or
A) that describes the relative amount of opacity or fractional coverage at each pixel
(Figures 8c and 8b). The opacity is the opposite of the transparency. Pixels within
the object are fully opaque (α = 1), while pixels fully outside the object are
transparent (α = 0). Pixels on the boundary of the object vary smoothly between
these two extremes, which hides the perceptually visible jaggies that occur if only
binary opacities are used.


Figure 9: Compositing equation C = (1 − α)B + αF. The images are taken from a close-up of the region of the hair in the upper right part of the lion in Figure 8.

To composite a new (or foreground) image on top of an old (background) image, the over operator is used,
C = (1 − α)B + αF.
This operator attenuates the influence of the background image B by a factor (1 − α)
and then adds in the color (and opacity) values corresponding to the foreground
layer F, as shown in Figure 9.
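The over operator is easy to express with numpy broadcasting. The following sketch (the tiny test images are our own) composites a foreground F over a background B using an alpha matte:

```python
import numpy as np

def over(F, B, alpha):
    """Over operator C = (1 - alpha) * B + alpha * F for float RGB images
    in [0, 1]; a 2D alpha matte is broadcast over the color channels."""
    if alpha.ndim == F.ndim - 1:
        alpha = alpha[..., None]
    return (1.0 - alpha) * B + alpha * F

F = np.ones((2, 2, 3))            # white foreground
B = np.zeros((2, 2, 3))           # black background
alpha = np.array([[1.0, 0.5],
                  [0.25, 0.0]])   # opaque, half, quarter, fully transparent
print(over(F, B, alpha))
```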
HISTOGRAM EQUALIZATION
Histogram equalization is a widely used technique in image processing to enhance
the contrast of an image. It aims to improve the visual quality by redistributing the
intensity values across the entire range, making features more distinguishable.
To perform histogram equalization, we need to find an intensity mapping function f(I) such that the resulting histogram is flat. The trick to finding such a mapping is the same one that people use to generate random samples from a probability density function, which is to first compute the cumulative distribution function,
c(I) = (1/N) Σ_{i=0..I} h(i) = c(I − 1) + (1/N) h(I),
where h(I) is the histogram of intensity values and N is the number of pixels in the image (or students in the class, in the grading analogy). For any given grade or intensity, we can look up its corresponding percentile c(I) and determine the final value that pixel should take. When working with eight-bit pixel values, the I and c axes are rescaled to [0, 255].
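A compact numpy sketch of histogram equalization following the recipe above (illustrative only; assumes an 8-bit grayscale image):

```python
import numpy as np

def equalize(img):
    """Histogram equalization via the cumulative distribution function."""
    hist = np.bincount(img.ravel(), minlength=256)      # h(I)
    cdf = np.cumsum(hist) / img.size                    # c(I) in [0, 1]
    lut = np.round(255.0 * cdf).astype(np.uint8)        # rescale to [0, 255]
    return lut[img]                                     # map each pixel through c

img = (np.random.rand(64, 64) ** 2 * 255).astype(np.uint8)  # skewed intensities
print(np.histogram(equalize(img), bins=8)[0])               # roughly flat counts
```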
Linear Filtering
Linear filtering is a fundamental technique in image processing used for a variety of
applications, including noise reduction, edge detection, and image enhancement. It
involves the application of a linear filter (or kernel) to an image, which modifies the
pixel values based on their neighborhood.
The most commonly used type of neighborhood operator is a linear filter, in which
an output pixel's value is determined as a weighted sum of input pixel values,
g(i, j) = Σk,l f(i + k, j + l) h(k, l).
The entries in the weight kernel or mask h(k, l) are often called the filter
coefficients. The above correlation operator can be more compactly notated as
g = f ⊗ h.
A common variant on this formula is
g(i, j) = Σk,l f(i − k, j − l) h(k, l),
where the sign of the offsets in f has been reversed. This is called the convolution
operator,
g = f ∗ h,
and h is then called the impulse response function. The reason for this name is that the kernel function, h, convolved with an impulse signal, δ(i, j) (an image that is 0
everywhere except at the origin) reproduces itself, h ∗ δ = h, whereas correlation
produces the reflected signal.
Convolution can also be interpreted as the superposition (addition) of shifted impulse response functions h(i − k, j − l), each multiplied by the input pixel value f(k, l).
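The difference between correlation and convolution (the kernel flip) is easy to see on an impulse image. A small sketch, assuming SciPy is available for its correlate2d and convolve2d routines:

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

# an asymmetric kernel makes the flip visible
f = np.zeros((5, 5)); f[2, 2] = 1.0          # impulse image delta(i, j)
h = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

print(convolve2d(f, h, mode='same'))          # reproduces h (h * delta = h)
print(correlate2d(f, h, mode='same'))         # reproduces the reflected (flipped) h
```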
Both correlation and convolution are linear shift-invariant (LSI) operators, which obey
both the superposition principle,
h ◦ (f0 + f1) = h ◦ f0 + h ◦ f1,
and the shift invariance principle,
g(i, j) = f(i + k, j + l)  ⇔  (h ◦ g)(i, j) = (h ◦ f)(i + k, j + l),
which means that shifting a signal commutes with applying the operator (◦ stands for the
LSI operator).

Figure 10: One-dimensional signal convolution as a sparse matrix-vector multiply, g = Hf.

Occasionally, a shift-variant version of correlation or convolution may be used, e.g.,
g(i, j) = Σk,l f(i − k, j − l) h(k, l; i, j),
where h(k, l; i, j) is the convolution kernel at pixel (i, j). For example, such a spatially varying kernel can be used to model blur in an image due to variable depth-dependent defocus.

KNSIT, DEPT. OF AIML 22


COMPUTER VISION (BAI515A)

Correlation and convolution can both be written as a matrix-vector multiply, if we first


convert the two-dimensional images f (i, j) and g(i, j) into raster-ordered vectors f and g,
g = Hf
where the (sparse) H matrix contains the convolution kernels. Figure 10 shows how a one-
dimensional convolution can be represented in matrix-vector form.
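To make Figure 10 concrete, here is a small sketch (our own construction, for a 1D zero-padded, same-size convolution) that builds the banded matrix H and checks it against numpy's convolve:

```python
import numpy as np

def conv_matrix(h, n):
    """Banded matrix H such that H @ f applies the (already flipped) odd-length
    kernel h to a length-n signal f, with zero padding at the borders."""
    k = len(h)
    H = np.zeros((n, n))
    for i in range(n):
        for j, w in enumerate(h):
            col = i + j - k // 2          # kernel window centered on sample i
            if 0 <= col < n:
                H[i, col] = w
    return H

f = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
h = np.array([1.0, 2.0, 3.0])             # asymmetric, so the flip matters
H = conv_matrix(h[::-1], len(f))          # flip the kernel: convolution, not correlation
print(H @ f)                              # [ 4. 10. 16. 22. 22.]
print(np.convolve(f, h, mode='same'))     # matches the matrix-vector product
```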
Padding (border effects)
The astute reader will notice that the matrix multiply shown in Figure 10 suffers from
boundary effects, i.e., the results of filtering the image in this form will lead to a darkening
of the corner pixels. This is because the original image is effectively being padded with 0
values wherever the convolution kernel extends beyond the original image boundaries.
To compensate for this, a number of alternative padding or extension modes have been developed (Figure 11; a short code sketch follows the list):
Zero: set all pixels outside the source image to 0 (a good choice for alpha-matted cutout
images);
Constant (border color): set all pixels outside the source image to a specified border
value;
Clamp (replicate or clamp to edge): repeat edge pixels indefinitely;
(Cyclic) wrap (repeat or tile): loop “around” the image in a “toroidal” configuration;
Mirror: reflect pixels across the image edge;
Extend: extend the signal by subtracting the mirrored version of the signal from the edge
pixel value.
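One convenient way to experiment with these extension modes is numpy's pad function. The mapping below between the modes listed above and numpy's mode names is approximate and purely illustrative (numpy has no direct counterpart of the last, "extend" mode):

```python
import numpy as np

f = np.arange(9, dtype=float).reshape(3, 3)   # tiny stand-in image

zero   = np.pad(f, 2, mode='constant', constant_values=0)   # zero padding
border = np.pad(f, 2, mode='constant', constant_values=9)   # constant border color
clamp  = np.pad(f, 2, mode='edge')                          # clamp / replicate edge
wrap   = np.pad(f, 2, mode='wrap')                          # cyclic wrap / tile
mirror = np.pad(f, 2, mode='reflect')                       # mirror across the edge
```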


Figure 11: Border padding (top row) and the results of blurring the padded image (bottom
row). The normalized zero image is the result of dividing (normalizing) the blurred zero-
padded RGBA image by its corresponding soft alpha value.
