OpenCV

The document provides an overview of various OpenCV functions for image and video processing, including reading and displaying images, writing images to files, and manipulating video streams. It explains key functions such as imread, imshow, VideoCapture, and VideoWriter, along with details on image resizing, rotation, translation, and drawing shapes. Additionally, it covers color spaces like RGB, LAB, YCrCb, and HSV, and discusses how images are represented as 3D NumPy arrays in OpenCV.

Uploaded by

joshismita174
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

imread(filename, flags)

It takes two arguments:


1. The first argument is the image name, given as a relative or absolute path to the file.
2. The second argument is an optional flag that lets you specify how the image should be
represented. OpenCV offers several options for this flag, but those that are most
common include:
• cv2.IMREAD_UNCHANGED or -1
• cv2.IMREAD_GRAYSCALE or 0
• cv2.IMREAD_COLOR or 1 (default)
OpenCV reads color images in BGR format.
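As a minimal sketch (the filename test_image.png is only a placeholder), reading the same file with each flag might look like this:

import cv2

img_color = cv2.imread('test_image.png', cv2.IMREAD_COLOR)          # 3-channel BGR (default)
img_gray = cv2.imread('test_image.png', cv2.IMREAD_GRAYSCALE)       # single channel
img_unchanged = cv2.imread('test_image.png', cv2.IMREAD_UNCHANGED)  # keeps an alpha channel if present
if img_color is None:
    print('Could not read the image')  # imread returns None on failure instead of raising an error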
imshow(window_name, image)
This function also takes two arguments:
1. The first argument is the window name that will be displayed on the window.
2. The second argument is the image that you want to display.
To display multiple images at once, specify a new window name for every image you want
to display.
The imshow() function is designed to be used along with
the waitKey() and destroyAllWindows() / destroyWindow() functions.
The waitKey() function is a keyboard-binding function.
• It takes a single argument, which is the time (in milliseconds), for which the window
will be displayed.
• If the user presses any key within this time period, the program continues.
• If 0 is passed, the program waits indefinitely for a keystroke.
• You can also set the function to detect specific keystrokes like the Q key or the ESC
key on the keyboard, thereby telling more explicitly which key shall trigger which
behavior.
The function destroyAllWindows() destroys all the windows we created.
Using destroyAllWindows() also clears the window or image from the main memory of the
system.
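A minimal sketch combining imshow(), waitKey() and destroyAllWindows() (img is an image loaded earlier with imread):

cv2.imshow('Display window', img)   # window name, then the image to show
k = cv2.waitKey(0)                  # wait indefinitely for a keystroke
if k == ord('q'):                   # react to a specific key, e.g. Q
    print('Q was pressed')
cv2.destroyAllWindows()             # close all windows and free their memory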
imwrite(filename, image).
This function allows you to write (save) images after processing them, such as after applying
filters or transformations.
1. The first argument is the filename, which must include the filename extension (for
example .png, .jpg etc). OpenCV uses this filename extension to specify the format of
the file.
2. The second argument is the image you want to save. The function returns True if the
image is saved successfully.
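For example (the output filename is illustrative; the extension selects the format):

success = cv2.imwrite('result.png', img)  # the .png extension selects the PNG encoder
print('Saved:', success)                  # True if the image was written successfully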
VideoCapture(path, apiPreference)
The first argument is the filename/path to the video file. The second is an optional argument,
indicating an API preference.
The isOpened() method returns a boolean that indicates whether or not the video stream is valid; if it returns False, the file or camera could not be opened. Assuming the video file was opened
successfully, we can use the get() method to retrieve important metadata associated with the
video stream. Note that this method does not apply to web cameras. This function is used to
access various attributes, such as frame dimensions, frame rate, or codec, of video streams or
video files.
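A small sketch of opening a video file and querying its metadata with get() (the path is a placeholder):

import cv2

vid_capture = cv2.VideoCapture('input_video.mp4')
if not vid_capture.isOpened():
    print('Error opening the video file')
else:
    fps = vid_capture.get(cv2.CAP_PROP_FPS)                  # frame rate
    frame_count = vid_capture.get(cv2.CAP_PROP_FRAME_COUNT)  # total number of frames
    width = vid_capture.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = vid_capture.get(cv2.CAP_PROP_FRAME_HEIGHT)
    print(fps, frame_count, width, height)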
The vid_capture.read() method returns a tuple, where the first element is a boolean and the
next element is the actual video frame. When the first element is True, it indicates the video
stream contains a frame to read.
while vid_capture.isOpened():
    # vid_capture.read() returns a tuple: the first element is a bool,
    # the second is the actual frame
    ret, frame = vid_capture.read()
    if ret == True:
        cv2.imshow('Frame', frame)
        k = cv2.waitKey(20)
        # 113 is the ASCII code for the q key
        if k == 113:
            break
    else:
        break

vid_capture.release()
cv2.destroyAllWindows()
• If your system has a built-in webcam, then the device index for the camera will be ‘0’.
• If you have more than one camera connected to your system, then the device index
associated with each additional camera is incremented (e.g. 1, 2, etc).
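As a sketch, capturing from a webcam only changes the argument passed to VideoCapture:

webcam = cv2.VideoCapture(0)   # device index 0 selects the built-in webcam
ret, frame = webcam.read()     # frames are then read exactly as with a video file
webcam.release()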
VideoWriter(filename, apiPreference, fourcc, fps, frameSize[, isColor])
• filename: pathname for the output video file
• apiPreference: API backends identifier
• fourcc: 4-character code of codec, used to compress the frames (fourcc)
• fps: Frame rate of the created video stream
• frameSize: Size of the video frames
• isColor: If not zero, the encoder will expect and encode color frames. Else it will work
with grayscale frames (the flag is currently supported on Windows only).
A special convenience function is used to retrieve the four-character codec that is passed to the video writer object:
• VideoWriter_fourcc('M', 'J', 'P', 'G') in Python.
• VideoWriter::fourcc('M', 'J', 'P', 'G') in C++.
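A hedged sketch of writing frames read from a capture object back to disk with MJPG encoding (the filenames and frame rate are illustrative):

vid_capture = cv2.VideoCapture('input_video.mp4')   # re-open the source video
frame_width = int(vid_capture.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(vid_capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_size = (frame_width, frame_height)
fps = 20

output = cv2.VideoWriter('output_video.avi', cv2.VideoWriter_fourcc('M', 'J', 'P', 'G'), fps, frame_size)

while vid_capture.isOpened():
    ret, frame = vid_capture.read()
    if not ret:
        break
    output.write(frame)   # write each frame to the output file

vid_capture.release()
output.release()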
image.shape in Python returns three values: height, width and number of channels.
In the C++ API, the equivalent calls are:
• image.size().width returns the width
• image.size().height returns the height
resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
• src: The required input image, i.e. the image array loaded earlier (for example with cv2.imread('test_image.png')), not the filename itself.
• dsize: It is the desired size of the output image, it can be a new height and width.
• fx: Scale factor along the horizontal axis.
• fy: Scale factor along the vertical axis.
• interpolation: It gives us the option of different methods of resizing the image.
Different interpolation methods are used for different resizing purposes.
• INTER_AREA: INTER_AREA uses pixel area relation for resampling. This is best
suited for reducing the size of an image (shrinking). When used for zooming into the
image, it uses the INTER_NEAREST method.
• INTER_CUBIC: This uses bicubic interpolation for resizing the image. While
resizing and interpolating new pixels, this method acts on the 4×4 neighboring pixels
of the image. It then takes the weighted average of the 16 pixels to create the new
interpolated pixel.
• INTER_LINEAR: This method is somewhat similar to
the INTER_CUBIC interpolation. But unlike INTER_CUBIC, this uses 2×2
neighboring pixels to get the weighted average for the interpolated pixel.
• INTER_NEAREST: The INTER_NEAREST method uses the nearest neighbor
concept for interpolation. This is one of the simplest methods, using only one
neighboring pixel from the image for interpolation.
down_width = 300
down_height = 200
down_points = (down_width, down_height)
resize_down = cv2.resize(image, down_points, interpolation= cv2.INTER_LINEAR)
Scaling Factor or Scale Factor is usually a number that scales or multiplies some quantity, in our case the width and height of the image. Using the same factor for both axes keeps the aspect ratio intact, so the image does not appear distorted while you are upscaling or downscaling it.
scale_up_x = 1.2
scale_up_y = 1.2
# Scaling Down the image 0.6 times specifying a single scale factor.
scale_down = 0.6

scaled_f_down = cv2.resize(image, None, fx=scale_down, fy=scale_down, interpolation=cv2.INTER_LINEAR)
scaled_f_up = cv2.resize(image, None, fx=scale_up_x, fy=scale_up_y, interpolation=cv2.INTER_LINEAR)
• We define new scaling factors along the horizontal and vertical axis.
• Defining the scaling factors removes the need to have new points for width and height.
Hence, we keep dsize as None.

vertical = np.concatenate((res_inter_nearest, res_inter_linear, res_inter_area), axis=0)


• np.concatenate: This function from NumPy is used to join multiple arrays (or images
in this case) along a specified axis.
• Arguments:
o (res_inter_nearest, res_inter_linear, res_inter_area): These are the three images
being concatenated.
o axis=0: Specifies that the concatenation should happen along the vertical axis
(row-wise).
▪ This stacks the images on top of each other.
• Result: The variable vertical contains the vertically stacked combination of the three
images.
Images res_inter_nearest, res_inter_linear, res_inter_area: These are presumably resized
versions of the same image, but each was resized using a different interpolation method:
1. Nearest: Nearest-neighbor interpolation.
2. Linear: Bilinear interpolation.
3. Area: Pixel area relation interpolation.
cropped = img[start_row:end_row, start_col:end_col]
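Cropping is plain NumPy slicing: the first range selects rows (the y axis), the second selects columns (the x axis). A small sketch with illustrative coordinates:

cropped = img[80:280, 150:330]   # rows 80-279 (height), columns 150-329 (width)
cv2.imshow('Cropped image', cropped)
cv2.waitKey(0)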
getRotationMatrix2D(center, angle, scale)
This function is used to generate a matrix that rotates an image by a specified angle around a
given point (usually the center of the image). This matrix can then be used with warpAffine()
to apply the rotation.
• center: the center of rotation for the input image
• angle: the angle of rotation in degrees
• scale: an isotropic scale factor that scales the image up or down according to the value
provided
If the angle is positive, the image gets rotated in the counter-clockwise direction. If you want
to rotate the image clockwise by the same amount, then the angle needs to be negative.
Rotation is a three-step operation:
1. First, you need to get the center of rotation. This typically is the center of the image
you are trying to rotate.
2. Next, create the 2D-rotation matrix. OpenCV provides
the getRotationMatrix2D() function that we discussed above.
3. Finally, apply the affine transformation to the image, using the rotation matrix you
created in the previous step. The warpAffine() function in OpenCV does the job.
The warpAffine() function applies an affine transformation to the image. After applying
affine transformation, all the parallel lines in the original image will remain parallel in the
output image as well.
warpAffine(src, M, dsize[, dst[, flags[, borderMode[, borderValue]]]])
• src: the source image
• M: the transformation matrix
• dsize: size of the output image
• dst: the output image
• flags: combination of interpolation methods such as INTER_LINEAR or
INTER_NEAREST
• borderMode: the pixel extrapolation method
• borderValue: the value to be used in case of a constant border, has a default value of 0
import cv2

# Reading the image
image = cv2.imread('image.jpg')

# dividing height and width by 2 to get the center of the image
height, width = image.shape[:2]
# get the center coordinates of the image to create the 2D rotation matrix
center = (width/2, height/2)

# using cv2.getRotationMatrix2D() to get the rotation matrix
rotate_matrix = cv2.getRotationMatrix2D(center=center, angle=45, scale=1)

# rotate the image using cv2.warpAffine
rotated_image = cv2.warpAffine(src=image, M=rotate_matrix, dsize=(width, height))

cv2.imshow('Original image', image)
cv2.imshow('Rotated image', rotated_image)
# wait indefinitely, press any key on keyboard to exit
cv2.waitKey(0)
# save the rotated image to disk
cv2.imwrite('rotated_image.jpg', rotated_image)
Image translation means shifting the image by a specified number of pixels along the x and y axes.
Let the number of pixels by which the image needs to be shifted be tx and ty. Then you can define a translation matrix:

M = [[1, 0, tx],
     [0, 1, ty]]
Now, there are a few points you should keep in mind while shifting the image
by tx and ty values.
• Providing positive values for tx will shift the image to the right, and negative values
will shift the image to the left.
• Similarly, positive values of ty will shift the image down, while negative values will
shift the image up.
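A minimal sketch applying such a translation with warpAffine() (tx = 100 and ty = 50 are illustrative values):

import numpy as np

height, width = image.shape[:2]
tx, ty = 100, 50
translation_matrix = np.float32([[1, 0, tx],
                                 [0, 1, ty]])
translated_image = cv2.warpAffine(src=image, M=translation_matrix, dsize=(width, height))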
line(image, start_point, end_point, color, thickness)
circle(image, center_coordinates, radius, color, thickness)
Filled circle: thickness=-1
rectangle(image, start_point, end_point, color, thickness)
ellipse(image, centerCoordinates, axesLength, angle, startAngle, endAngle, color,
thickness)
Half-ellipse:
• Set the endAngle for the blue ellipse as 180 deg
• Change the orientation of the red ellipse from 90 to 0
• Specify the start and end angles for the red ellipse, as 0 and 180 respectively
• Specify the thickness of the red ellipse to be a negative number
ellipse_center = (415, 190)
# define the axes lengths (half of major axis, half of minor axis)
axis1 = (100, 50)

# blue half-ellipse: drawn from startAngle 180 to endAngle 360
cv2.ellipse(halfEllipse, ellipse_center, axis1, 0, 180, 360, (255, 0, 0), thickness=3)

# red filled half-ellipse: a negative thickness fills the ellipse sector
cv2.ellipse(halfEllipse, ellipse_center, axis1, 0, 0, 180, (0, 0, 255), thickness=-2)
putText(image, text, org, font, fontScale, color)
• As usual, the first argument is the input image.
• The next argument is the actual text string that we want to annotate the image with.
• The third argument specifies the starting location for the top left corner of the text
string.
• The next two arguments specify the font style and scale.
• OpenCV supports several font-face styles from the Hershey font collection, and
an italic font as well. Check out this list:
• FONT_HERSHEY_SIMPLEX = 0,
• FONT_HERSHEY_PLAIN = 1,
• FONT_HERSHEY_DUPLEX = 2,
• FONT_HERSHEY_COMPLEX = 3,
• FONT_HERSHEY_TRIPLEX = 4,
• FONT_HERSHEY_COMPLEX_SMALL = 5,
• FONT_HERSHEY_SCRIPT_SIMPLEX = 6,
• FONT_HERSHEY_SCRIPT_COMPLEX = 7,
• FONT_ITALIC = 16
• The font scale is a floating-point value, used to scale the base size of the font up or
down. Depending on the resolution of your image, select an appropriate font scale.
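A short sketch (text, position, font scale and color are illustrative):

cv2.putText(image, 'Sample text', (50, 350), cv2.FONT_HERSHEY_SIMPLEX,
            1.5, (0, 255, 0), thickness=2, lineType=cv2.LINE_AA)
cv2.imshow('Annotated image', image)
cv2.waitKey(0)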
COLOR SPACES
The RGB Color Space
The RGB colorspace has the following properties
• It is an additive colorspace where colors are obtained by a linear combination of Red,
Green, and Blue values.
• The three channels are correlated by the amount of light hitting the surface.
The LAB Color-Space
The Lab color space has three components.
1. L – Lightness ( Intensity ).
2. a – color component ranging from Green to Magenta.
3. b – color component ranging from Blue to Yellow.
The Lab color space is quite different from the RGB color space. In RGB color space the
color information is separated into three channels but the same three channels also encode
brightness information. On the other hand, in Lab color space, the L channel is independent
of color information and encodes brightness only. The other two channels encode color.
The YCrCb Color-Space
The YCrCb color space is derived from the RGB color space and has the following three components.
1. Y – Luminance or Luma component obtained from RGB after gamma correction.
2. Cr = R – Y ( how far is the red component from Luma ).
3. Cb = B – Y ( how far is the blue component from Luma ).
This color space has the following properties.
• Separates the luminance and chrominance components into different channels.
• Mostly used in compression ( of Cr and Cb components ) for TV Transmission.
• Device dependent.
The HSV Color Space
The HSV color space has the following three components
1. H – Hue ( Dominant Wavelength ).
2. S – Saturation ( Purity / shades of the color ).
3. V – Value ( Intensity ).
Let’s enumerate some of its properties.
• Best thing is that it uses only one channel to describe color (H), making it very intuitive to specify color.
• Device dependent.

Images are stored as 3D NumPy arrays in OpenCV. The dimensions are (height, width, channels): Channel 0 → Blue, Channel 1 → Green, Channel 2 → Red.
Example:

B = np.array([])   # Blue channel array
G = np.array([])   # Green channel array
R = np.array([])   # Red channel array

im = cv2.imread(fi)                      # Load an image
b = im[:,:,0]                            # Extract the Blue channel
b = b.reshape(b.shape[0]*b.shape[1])     # Flatten the 2D array into 1D
g = im[:,:,1]                            # Extract the Green channel
g = g.reshape(g.shape[0]*g.shape[1])     # Flatten the 2D array into 1D
r = im[:,:,2]                            # Extract the Red channel
r = r.reshape(r.shape[0]*r.shape[1])     # Flatten the 2D array into 1D
B = np.append(B, b)                      # Append the flattened Blue channel data
G = np.append(G, g)                      # Append the flattened Green channel data
R = np.append(R, r)                      # Append the flattened Red channel data

nbins = 10
plt.hist2d(B, G, bins=nbins, norm=LogNorm())
plt.xlabel('B')
plt.ylabel('G')
plt.xlim([0, 255])
plt.ylim([0, 255])

In this example:
• Empty arrays (B, G, R) are created to store pixel values from multiple images. These arrays grow as new pixel values are appended from each image.
• im[:,:,0] extracts the Blue channel of the image. Each channel is a 2D array representing pixel intensities for that specific color.
• The reshape call converts a 2D array (image) into a 1D array (vector). Flattening allows all pixel values to be stored in a single array, making it easier to analyze or combine with data from other images.
• np.append adds the flattened pixel values from each channel to their respective global arrays (B, G, R), accumulating pixel data from multiple images.
• The 2D histogram represents the joint distribution of Blue (B) and Green (G) pixel intensities. Each bin covers a range of Blue and Green values and counts how many pixels fall within that range.
• Pixel intensities are often unevenly distributed, with some bins having very high counts. LogNorm applies a logarithmic scale to better visualize both dense and sparse regions.
• plt.xlim([0,255]) and plt.ylim([0,255]) limit the histogram axes to the valid range of pixel intensities (0–255 for 8-bit images).
A convolution kernel is a 2D matrix that is used to filter images. Also known as a
convolution matrix, a convolution kernel is typically a square, MxN matrix, where
both M and N are odd integers (e.g. 3×3, 5×5, 7×7 etc.). See the 3×3 example matrix given below, an identity kernel that leaves the image unchanged:

[[0, 0, 0],
 [0, 1, 0],
 [0, 0, 0]]
Such kernels can be used to perform mathematical operations on each pixel of an image to
achieve a desired effect (like blurring or sharpening an image).

filter2D(src, ddepth, kernel)


• The first argument is the source image
• The second argument is ddepth, which indicates the depth of the resulting image. A
value of -1 indicates that the final image will also have the same depth as the source
image
• The final input argument is the kernel, which we apply to the source image
kernel1 = np.array([[0, 0, 0],
                    [0, 1, 0],
                    [0, 0, 0]])   # identity kernel: output equals the original image
BLURRING IMAGE
kernel2 = np.ones((5, 5), np.float32) / 25
img = cv2.filter2D(src=image, ddepth=-1, kernel=kernel2)

img_blur = cv2.blur(src=image, ksize=(5,5))  # using the built-in blur() function


GaussianBlur(src, ksize, sigmaX[, dst[, sigmaY[, borderType]]])
• The first argument, src, specifies the source image that you want to filter.
• The second argument is ksize, which defines the size of the Gaussian kernel. Here, we
are using a 5×5 kernel.
• The final two arguments are sigmaX and sigmaY, which are both set to 0. These are
the Gaussian kernel standard deviations, in the X (horizontal) and Y (vertical)
direction. The default setting of sigmaY is zero. If you simply set sigmaX to zero,
then the standard deviations are computed from the kernel size (width and height
respectively). You can also explicitly set the size of each argument to positive values
greater than zero.
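For example, keeping the 5×5 kernel described above:

gaussian_blur = cv2.GaussianBlur(src=image, ksize=(5, 5), sigmaX=0, sigmaY=0)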
medianBlur(src, ksize)
This type of filter replaces each pixel's value with the median value of the intensities in its
neighborhood. The size of the neighborhood is defined by a square kernel.
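For example, with a 5×5 neighborhood:

median_blur = cv2.medianBlur(src=image, ksize=5)   # ksize is a single odd integer, not a tuple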
SHARPENING THE IMAGE
kernel3 = np.array([[0, -1, 0],
[-1, 5, -1],
[0, -1, 0]])
sharp_img = cv2.filter2D(src=image, ddepth=-1, kernel=kernel3)
BILATERAL FILTERING
This technique applies the filter selectively to blur similar intensity pixels in a neighborhood.
Sharp edges are preserved, wherever possible.
Bilateral filtering essentially applies a 2D Gaussian (weighted) blur to the image, while also
considering the variation in intensities of neighboring pixels to minimize the blurring near
edges (which we wish to preserve).
bilateralFilter(src, d, sigmaColor, sigmaSpace)
• The first argument of the function is the source image.
• The next argument d, defines the diameter of the pixel neighborhood used for filtering.
• The next two arguments, sigmaColor and sigmaSpace define the standard deviation
of the (1D) color-intensity distribution and (2D) spatial distribution respectively.
• The sigmaSpace parameter defines the spatial extent of the kernel, in both the x
and y directions (just like the Gaussian blur filter previously described).
• The sigmaColor parameter defines the one-dimensional Gaussian distribution,
which specifies the degree to which differences in pixel intensity can be
tolerated.
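A sketch with illustrative parameter values:

bilateral_filter = cv2.bilateralFilter(src=image, d=9, sigmaColor=75, sigmaSpace=75)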
THRESHOLDING
‘255’ is the brightest and ‘0’ the darkest. Recall that grayscale intensities range from pure black (0) to pure white (255).
Thresholding is a technique to separate pixels in an image based on their intensity values. A threshold value (T) is defined, and pixels are classified based on whether their intensity is
above or below this value. Used for tasks like segmentation, binarization, and preprocessing
for object detection.

cv2.threshold(src, thresh, maxval, type)


1. src: Input grayscale image.
2. thresh: Threshold value.
3. maxval: Maximum value assigned to output pixels for certain types.
4. type: Type of thresholding to apply (e.g., cv2.THRESH_BINARY, cv2.THRESH_TRUNC).
Returns:
1. th: The threshold value that was used (useful for some types of thresholding).
2. dst: Thresholded image.
TECHNIQUES
1. Basic Threshold (cv2.THRESH_BINARY):
o Rule: dst(x, y) = maxval if src(x, y) > thresh, else 0
o Converts the image to a binary image with pixel values of 0 or maxval (here 255).
o Example:
th, dst = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY)
2. Inverse Binary Threshold (cv2.THRESH_BINARY_INV):
o Rule: dst(x, y) = 0 if src(x, y) > thresh, else maxval
o Inverts the binary thresholding result:
th, dst = cv2.threshold(src, 127, 255, cv2.THRESH_BINARY_INV)
3. Truncated Threshold (cv2.THRESH_TRUNC):
o Rule: dst(x, y) = thresh if src(x, y) > thresh, else src(x, y)
o Values above the threshold are set to the threshold, while others remain
unchanged:
th, dst = cv2.threshold(src, 127, 255, cv2.THRESH_TRUNC)
4. To Zero Threshold (cv2.THRESH_TOZERO):
o Rule: dst(x, y) = src(x, y) if src(x, y) > thresh, else 0
o Values below the threshold are set to 0, while others remain unchanged:
th, dst = cv2.threshold(src, 127, 255, cv2.THRESH_TOZERO)
5. Inverse To Zero Threshold (cv2.THRESH_TOZERO_INV):
o Rule: dst(x, y) = 0 if src(x, y) > thresh, else src(x, y)
o Values above the threshold are set to 0, while others remain unchanged:
th, dst = cv2.threshold(src, 127, 255, cv2.THRESH_TOZERO_INV)
EDGE DETECTION: SOBEL AND CANNY

X-Direction Kernel:
[[-1, 0, 1],
 [-2, 0, 2],
 [-1, 0, 1]]

Y-Direction Kernel:
[[-1, -2, -1],
 [ 0,  0,  0],
 [ 1,  2,  1]]
• If we use only the Vertical Kernel, the convolution yields a Sobel image, with edges
enhanced in the X-direction
• Using the Horizontal Kernel yields a Sobel image, with edges enhanced in the Y-
direction.
Let G_x and G_y represent the intensity gradients in the x and y directions respectively. If K_x and K_y denote the X and Y kernels defined above:

G_x = K_x * I,   G_y = K_y * I

where * denotes the convolution operator and I represents the input image.
The final approximation of the gradient magnitude G can be computed as

G = sqrt(G_x^2 + G_y^2)
Sobel(src, ddepth, dx, dy)


The parameter ddepth specifies the precision of the output image, while dx and dy specify
the order of the derivative in each direction. For example:
• If dx=1 and dy=0, we compute the 1st derivative Sobel image in the x-direction.
• If both dx=1 and dy=1, we compute the 1st derivative Sobel image in both directions.
Example:
sobelx = cv2.Sobel(src=img_blur, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=5)   # Sobel edge detection on the X axis
sobely = cv2.Sobel(src=img_blur, ddepth=cv2.CV_64F, dx=0, dy=1, ksize=5)   # Sobel edge detection on the Y axis
sobelxy = cv2.Sobel(src=img_blur, ddepth=cv2.CV_64F, dx=1, dy=1, ksize=5)  # combined X and Y Sobel edge detection

CANNY EDGE DETECTION

Canny edge detection is itself a multi-stage algorithm. Add image blurring, a necessary preprocessing step to reduce noise, and it becomes a four-stage process, which includes:
1. Noise Reduction
2. Calculating the Intensity Gradient of the Image
3. Suppression of False Edges
4. Hysteresis Thresholding
Canny(image, threshold1, threshold2)
Eg: edges = cv2.Canny(image=img_blur, threshold1=100, threshold2=200)
ANNOTATING IMAGES USING MOUSE
cv2.setMouseCallback(winname, onMouse, userdata)
Used to capture and handle mouse events in a specific window. It allows you to define a
custom callback function that processes mouse actions such as clicks, movement, and button
presses.
winname:
• Name of the window where the mouse callback will be active.
• This must match the name used when creating the window with cv2.imshow().
onMouse:
• The callback function to handle mouse events.
• This function must accept the following parameters:
def onMouse(event, x, y, flags, userdata):
pass
o event: The type of mouse event (e.g., cv2.EVENT_LBUTTONDOWN for a
left-click).
o x, y: The coordinates of the mouse pointer when the event occurred.
o flags: Indicates if any specific key or mouse button was pressed.
o userdata: Additional user-defined data passed to the callback
Mouse Event Types:
• cv2.EVENT_LBUTTONDOWN: Left mouse button pressed.
• cv2.EVENT_RBUTTONDOWN: Right mouse button pressed.
• cv2.EVENT_MBUTTONDOWN: Middle mouse button pressed.
• cv2.EVENT_MOUSEMOVE: Mouse movement.
• cv2.EVENT_LBUTTONUP: Left mouse button released.
• cv2.EVENT_RBUTTONUP: Right mouse button released.
• cv2.EVENT_LBUTTONDBLCLK: Left mouse button double-click.
Flags (key modifiers):
• cv2.EVENT_FLAG_CTRLKEY: CTRL key pressed.
• cv2.EVENT_FLAG_SHIFTKEY: SHIFT key pressed.
• cv2.EVENT_FLAG_ALTKEY: ALT key pressed.

userdata (optional):
• Any additional data you want to pass to the callback function.
Example:
import cv2
import numpy as np

# Define mouse callback function
def onMouse(event, x, y, flags, userdata):
    if event == cv2.EVENT_LBUTTONDOWN:
        print(f"Left click at ({x}, {y})")
        # Draw a circle where the mouse clicked
        cv2.circle(userdata, (x, y), 10, (255, 0, 0), -1)
        cv2.imshow('Image', userdata)

# Create a blank image
image = np.zeros((400, 400, 3), dtype=np.uint8)

# Show the image
cv2.imshow('Image', image)

# Set the mouse callback
cv2.setMouseCallback('Image', onMouse, userdata=image)

# Wait for key press and close the window
cv2.waitKey(0)
cv2.destroyAllWindows()

Example:
# Import packages
import cv2

# Lists to store the bounding box coordinates
top_left_corner = []
bottom_right_corner = []

# Function which will be called on mouse input
def drawRectangle(action, x, y, flags, *userdata):
    # Referencing global variables
    global top_left_corner, bottom_right_corner
    # Mark the top left corner when left mouse button is pressed
    if action == cv2.EVENT_LBUTTONDOWN:
        top_left_corner = [(x, y)]
    # When left mouse button is released, mark bottom right corner and draw the rectangle
    elif action == cv2.EVENT_LBUTTONUP:
        bottom_right_corner = [(x, y)]
        cv2.rectangle(image, top_left_corner[0], bottom_right_corner[0], (0, 255, 0), 2, 8)
        cv2.imshow("Window", image)

# Read Images
image = cv2.imread("Assets/cards.jpg")
# Make a temporary copy, useful to clear the drawing
temp = image.copy()
# Create a named window
cv2.namedWindow("Window")
# highgui function called when mouse events occur
cv2.setMouseCallback("Window", drawRectangle)

k = 0
# Close the window when key q is pressed
while k != 113:
    # Display the image
    cv2.imshow("Window", image)
    k = cv2.waitKey(0)
    # If c is pressed, clear the window, using the dummy image
    if k == 99:
        image = temp.copy()
        cv2.imshow("Window", image)

cv2.destroyAllWindows()

cv2.createTrackbar( trackbarName, windowName, value, count, onChange)


Used to create a trackbar (slider) in a specified window, which allows interactive adjustments
to a variable. Trackbars are particularly useful for tweaking parameters in real-time while
working with images or videos.
trackbarName:
• The name of the trackbar (slider).
• Appears in the specified window.
windowName:
• The name of the window where the trackbar will be displayed.
• Must match the name of an existing window created with cv2.namedWindow() or shown with cv2.imshow().
value:
• The initial position (value) of the trackbar slider.
count:
• The maximum value of the slider.
• The slider ranges from 0 to count.
onChange:
• A callback function triggered whenever the trackbar's value changes.
• This function must accept a single argument (the current value of the slider).
Key Functions:
• Get Trackbar Value:
o Use cv2.getTrackbarPos(trackbarName, windowName) to retrieve the
current value of the trackbar.
• Set Trackbar Value:
o Use cv2.setTrackbarPos(trackbarName, windowName, position) to set the
trackbar to a specific value programmatically.
Example:
import cv2
import numpy as np

# Callback function for trackbar
def onChange(value):
    # Adjust the brightness of the image
    brightness = value - 100  # Map trackbar (0-200) to brightness (-100 to +100)
    adjusted = cv2.convertScaleAbs(image, alpha=1, beta=brightness)
    cv2.imshow('Image', adjusted)

# Read an image
image = cv2.imread("example.jpg")

# Create a window
cv2.imshow('Image', image)

# Create a trackbar
cv2.createTrackbar('Brightness', 'Image', 100, 200, onChange)

# Wait for key press
cv2.waitKey(0)
cv2.destroyAllWindows()
CONTOURS
Using contour detection, we can detect the borders of objects, and localize them easily in an
image.
OpenCV provides two simple functions:
1. findContours()
2. drawContours()
It also offers two contour-approximation methods:
1. CHAIN_APPROX_SIMPLE
2. CHAIN_APPROX_NONE
1. Read the Image and Convert It to Grayscale
• Why?
o Grayscale images simplify processing because they contain a single intensity
channel rather than three (R, G, B).
o Many image processing algorithms, including contour detection, require
uniform intensity information.
• How?
o Use cv2.imread() to load the image.
o Convert the image to grayscale using cv2.cvtColor(image,
cv2.COLOR_BGR2GRAY).
2. Apply Binary Thresholding
• Why Thresholding?
o Thresholding converts an image into a binary format (black and white).
o White pixels (value = 255) represent the objects of interest, while black pixels
(value = 0) represent the background.
o This simplification makes it easier for contour detection algorithms to find
object boundaries.
• How?
o Use cv2.threshold():
_, binary_image = cv2.threshold(grayscale_image, thresh_value, max_value,
cv2.THRESH_BINARY)
▪ thresh_value: Intensity value that separates black (below threshold) and
white (above threshold).
▪ max_value: The value assigned to pixels above the threshold.
• Why Not R, G, or B Channels?
o R, G, or B channels might not accurately represent object edges due to color
variations.
o A grayscale image provides a more uniform intensity distribution, making it
more suitable for thresholding.
3. Find the Contours
• Why?
o Contours are curves that connect continuous points along a boundary having the
same intensity.
o They are used to detect and analyze the shapes of objects in an image.
• How?
o Use cv2.findContours():
contours, hierarchy = cv2.findContours(binary_image, cv2.RETR_TREE,
cv2.CHAIN_APPROX_SIMPLE)
• image: The binary input image obtained in the previous step.
• mode: This is the contour-retrieval mode. We provided this as RETR_TREE, which
means the algorithm will retrieve all possible contours from the binary image.
• method: This defines the contour-approximation method. CHAIN_APPROX_SIMPLE (used in the call above) compresses contour segments and keeps only their end points, while CHAIN_APPROX_NONE stores ALL contour points and is slightly slower.
4. Draw Contours on the Original RGB Image
• Why?
o Overlapping the contours on the original image provides a visual representation
of detected objects.
• How?
o Use cv2.drawContours():
cv2.drawContours(original_image, contours, -1, (0, 255, 0), 2)
• image: This is the input RGB image on which you want to draw the contour.
• contours: Indicates the contours obtained from the findContours() function.
• contourIdx: The pixel coordinates of the contour points are listed in the obtained
contours. Using this argument, you can specify the index position from this list,
indicating exactly which contour point you want to draw. Providing a negative value
will draw all the contour points.
• color: This indicates the color of the contour points you want to draw. We are drawing
the points in green.
• thickness: This is the thickness of contour points.
Example:
import cv2

# Step 1: Read the image and convert to grayscale
image = cv2.imread("example.jpg")
grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Step 2: Apply binary thresholding
_, binary = cv2.threshold(grayscale, 127, 255, cv2.THRESH_BINARY)

# Step 3: Find contours
contours, _ = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Step 4: Draw contours on the original image
output = image.copy()
cv2.drawContours(output, contours, -1, (0, 255, 0), 2)

# Display the results
cv2.imshow("Original Image", image)
cv2.imshow("Grayscale Image", grayscale)
cv2.imshow("Binary Image", binary)
cv2.imshow("Contours", output)

cv2.waitKey(0)
cv2.destroyAllWindows()

Types of cv2.RETR_* Modes:


1. cv2.RETR_EXTERNAL:
o Description:
▪ Retrieves only the outermost contours.
▪ Ignores any child (nested) contours inside the outer contours.
o Use Case:
▪ When you're only interested in the outer boundaries of objects.
o Example:
▪ Detecting individual shapes in an image
2. cv2.RETR_LIST:
o Description:
▪ Retrieves all contours but does not organize them into a hierarchy (no
parent-child relationship).
▪ All contours are stored in a flat list.
o Use Case:
▪When hierarchy is irrelevant, and you need to process all contours
without nesting information.
o Example:
▪ Counting objects in an image.
3. cv2.RETR_CCOMP:
o Description:
▪ Retrieves all contours and organizes them into a two-level hierarchy:
1. Outer contours of objects (level 0).
2. Holes (child contours) within those objects (level 1).
▪ Alternates levels between external contours and their holes.
o Use Case:
▪ When you need a clear distinction between object boundaries and their
holes.
o Example:
▪ Detecting coins with holes in a top-down view.
4. cv2.RETR_TREE:
o Description:
▪ Retrieves all contours and reconstructs the full hierarchy of nested
contours.
▪ Parent-child and sibling relationships between contours are preserved.
o Use Case:
▪ When you need detailed hierarchical information about nested contours.
o Example:
▪ Detecting objects with multiple nested levels, like an onion with multiple
layers.
5. cv2.RETR_FLOODFILL (Obsolete/Deprecated):
o Description:
▪ Used for the flood-fill algorithm in earlier OpenCV versions.
▪ Not commonly used for modern contour detection.

import cv2

# Load an image in grayscale
image = cv2.imread("Assets/cards.jpg", cv2.IMREAD_GRAYSCALE)
# Apply binary threshold
_, binary = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)

# Find contours using different modes
contours_external, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours_list, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours_tree, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Convert the grayscale image to BGR so the contours can be drawn in color
display = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)

# Draw and visualize contours for each mode
external_img = cv2.drawContours(display.copy(), contours_external, -1, (255, 0, 0), 2)
list_img = cv2.drawContours(display.copy(), contours_list, -1, (0, 255, 0), 2)
tree_img = cv2.drawContours(display.copy(), contours_tree, -1, (0, 0, 255), 2)

cv2.imshow("External Contours", external_img)
cv2.imshow("List Contours", list_img)
cv2.imshow("Tree Contours", tree_img)

cv2.waitKey(0)
cv2.destroyAllWindows()
TEMPORAL MEDIAN FILTERING
Temporal median filtering is a technique used in video processing or sequences of images to
reduce noise or remove unwanted transient objects (like flickering or moving elements). The
"temporal" aspect means it operates across frames over time, rather than within a single
image.
How It Works:
1. Input:
o A sequence of images or video frames Ft, Ft−1, Ft−2, …, Ft−n, where t represents
the current time (frame).
o Each pixel location across the frames forms a time series of intensity values.
2. Processing:
o For each pixel location (x,y), collect intensity values from the corresponding
pixel in the last N frames.
o Calculate the median of these pixel values.
o Replace the pixel intensity at (x,y) in the current frame with this median value.
3. Output:
o A new sequence of frames where noise (e.g., flickering or transient objects) is
significantly reduced, while preserving the overall structure of the video.
Applications:
1. Noise Reduction:
o Removes transient noise like salt-and-pepper noise, especially in videos
captured under poor lighting conditions.
2. Background Subtraction:
o Creates a clean background by filtering out moving objects or temporary
occlusions over time.
3. Object Detection:
o By stabilizing the background, it improves accuracy in detecting objects by
highlighting only consistent foreground changes.
EXAMPLE:
import cv2
import numpy as np

def temporal_median_filter(frames, num_frames):
    """
    Applies temporal median filtering to a sequence of frames.
    Args:
        frames: List of video frames (as numpy arrays).
        num_frames: Number of frames to consider for median calculation.
    Returns:
        Filtered frames.
    """
    frame_stack = []      # List to hold the recent frames
    filtered_frames = []  # List to hold the filtered frames

    for frame in frames:
        # Add current frame to the stack
        frame_stack.append(frame)

        # Ensure the stack contains only the last num_frames frames
        if len(frame_stack) > num_frames:
            frame_stack.pop(0)

        # Calculate median across the stack
        median_frame = np.median(np.array(frame_stack), axis=0).astype(np.uint8)

        # Append the filtered frame to the output list
        filtered_frames.append(median_frame)

    return filtered_frames

# Example Usage:
# Read video
cap = cv2.VideoCapture("input_video.mp4")
frames = []

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Convert to grayscale for simplicity
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    frames.append(gray_frame)

cap.release()

# Apply temporal median filter
filtered_frames = temporal_median_filter(frames, num_frames=5)

# Save filtered frames as a new video
height, width = filtered_frames[0].shape
out = cv2.VideoWriter('filtered_video.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30,
                      (width, height), isColor=False)

for frame in filtered_frames:
    out.write(frame)

out.release()
