Pattern Recognition and Computer Vision
VISION
"To attain global excellence through education, innovation, research, and work ethics in the
field of Computer Science and engineering with the commitment to serve humanity."
MISSION
M1: To lead in the advancement of computer science and engineering through internationally
recognized research and education.
M2: To prepare students for full and ethical participation in a diverse society and encourage
lifelong learning.
M3: To foster development of problem solving and communication skills as an integral
component of the profession.
M4: To impart knowledge and skills, and to cultivate an environment supporting incubation, product development, technology transfer, capacity building and entrepreneurship in the field of computer science and engineering.
M5: To encourage faculty and student networking with alumni, industry, institutions, and other stakeholders for collective engagement.
Rubrics for Lab Assessment:

Criterion | Rating | Mapped POs | Mapped PSOs
R2: Does the proposed design/procedure/algorithm solve the problem? | No / Partially / Completely | PO1, PO2, PO3 | PSO1, PSO2
R5: Individuality of submission? | No / Partially / Completely | PO8, PO12 | PSO1, PSO3
INDEX

S.No | Experiment | Date of Performance | R1 | R2 | R3 | R4 | R5 | Total Marks | Faculty Signature

Assessment criteria (2 marks each):
R1: Is able to identify and define the objective of the given problem?
R2: Does the proposed design/procedure/algorithm solve the problem?
R3: Has understanding of the tool/programming language to implement the proposed solution?
R4: Are the result(s) verified using sufficient test data to support the conclusions?
R5: Individuality of submission?
Program – 1
AIM: Write a Python function that computes the value of the Gaussian distribution N(m, s) at a given vector X, and plot the effect of varying the mean and variance on the normal distribution.
THEORY:
The Gaussian distribution, also known as the normal distribution, is a continuous probability
distribution characterized by its bell-shaped curve. It is defined by two parameters:
1. Mean (μ): The mean represents the center of the distribution and indicates where the peak
of the curve occurs.
2. Variance (σ²): The variance measures the spread of the distribution. A higher variance results in a wider and flatter curve, while a lower variance produces a narrower and taller curve.
The probability density function (PDF) of a Gaussian distribution is given by:

f(x) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))

where:
• x is the variable,
• μ is the mean of the distribution,
• σ² is the variance,
• √(2πσ²) is the normalization factor (the divisor) that ensures the total area under the curve equals 1.
The Gaussian distribution is symmetric around the mean and has a bell-shaped curve. The standard deviation controls the spread of the distribution: the larger the standard deviation, the wider the curve.
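As a quick check of the claim that the total area under the curve equals 1, the PDF can be integrated numerically. A minimal sketch (the grid limits and parameter values are illustrative assumptions):

import numpy as np

x = np.linspace(-50, 50, 200001)   # wide grid so the tails contribute almost nothing
mu, sigma = 0.0, 2.0               # illustrative mean and standard deviation
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
print(np.trapz(pdf, x))            # prints a value very close to 1.0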
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gaussian(x, m, s):
    # Gaussian PDF with mean m and standard deviation s
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)

x = np.linspace(-10, 10, 500)

plt.figure(figsize=(10, 5))
for m in [-3, 0, 3]:
    plt.plot(x, gaussian(x, m, 2), label=f"mean={m}, sd=2")
plt.legend(); plt.title("Varying Mean"); plt.grid(True)
plt.show()

plt.figure(figsize=(10, 5))
for s in [0.5, 1, 2, 4]:
    plt.plot(x, gaussian(x, 0, s), label=f"sd={s}")
plt.legend(); plt.title("Varying Variance (via Standard Deviation)"); plt.grid(True)
plt.show()
OUTPUT:
Program – 2
AIM: Write a Python program to implement the Gradient Descent algorithm for minimizing a cost function and plot its convergence.
THEORY:
Gradient Descent is an iterative optimization algorithm that minimizes a cost function J(θ) by repeatedly stepping in the direction of steepest descent.
The gradient descent algorithm works as follows:
1. Initialization: Start with an initial guess for the parameters θ.
2. Compute the Gradient: Calculate the gradient of the cost function with respect to the
parameters. The gradient is a vector of partial derivatives that indicates the direction of the
steepest ascent.
3. Update the Parameters: Adjust the parameters in the direction opposite to the gradient to
reduce the cost. The update rule is given by:

θ := θ − α ∇θ J(θ)
where:
θ represents the parameters being optimized,
α is the learning rate, a small positive scalar that controls the step size of the updates,
∇θJ(θ) is the gradient of the cost function J(θ) with respect to θ.
4. Iterate: Repeat steps 2 and 3 until the algorithm converges, meaning the changes in the cost
function or parameters become sufficiently small, or a predefined number of iterations is
reached.
5. Convergence Check: The algorithm converges when the cost function reaches a minimum
value or changes very little between iterations.
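To make the update rule concrete, here is a minimal one-dimensional sketch that minimizes J(θ) = θ², whose gradient is 2θ (the starting point and learning rate are illustrative assumptions):

theta = 5.0                         # initial guess
alpha = 0.1                         # learning rate
for _ in range(50):
    grad = 2 * theta                # dJ/dtheta for J(theta) = theta**2
    theta = theta - alpha * grad    # update rule: theta := theta - alpha * gradient
print(theta)                        # very close to 0, the minimizer of J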
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x, y, lr=0.01, epochs=1000):
    m, c, n = 0.0, 0.0, len(x)
    cost_history = []
    for _ in range(epochs):
        error = (m * x + c) - y                # prediction error
        cost_history.append(np.mean(error ** 2))
        m -= lr * (2 / n) * np.dot(error, x)   # gradient w.r.t. slope
        c -= lr * (2 / n) * np.sum(error)      # gradient w.r.t. intercept
    return m, c, cost_history

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
m, c, cost_history = gradient_descent(x, y)
plt.plot(cost_history)
plt.title("Convergence of Cost Function")
plt.xlabel("Iterations")
plt.ylabel("Cost (MSE)")
plt.grid(True)
plt.show()
OUTPUT:
Program – 3
AIM: Write a Python program to fit a straight line to data using Linear Regression trained with Gradient Descent, and plot the cost convergence and the fitted line.
THEORY:
Linear Regression is a fundamental algorithm in supervised learning used to model the relationship between the dependent variable y and one or more independent variables X by fitting a linear equation to the observed data. The equation for Linear Regression is given by:
hθ(x) = θ₀ + θ₁x
where:
hθ(x) is the predicted output,
θ₀ is the intercept (bias),
θ₁ is the coefficient (slope) of the input feature x.
The parameters are learned by minimizing the mean squared error cost function:

J(θ₀, θ₁) = (1/n) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Gradient Descent is an optimization algorithm used to minimize this cost function iteratively by updating the parameters.
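As a sanity check, the same slope and intercept can be obtained in closed form from the normal equations. A minimal sketch using numpy's least-squares solver on the same toy data as the program below:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
A = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
(theta1, theta0), *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta1, theta0)   # gradient descent should converge toward these values (2, 0)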
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x, y, lr=0.01, epochs=1000):
    theta0, theta1, n = 0.0, 0.0, len(x)
    cost_history = []
    for _ in range(epochs):
        error = (theta1 * x + theta0) - y            # h(x) - y
        cost_history.append(np.mean(error ** 2))
        theta1 -= lr * (2 / n) * np.dot(error, x)    # gradient w.r.t. slope
        theta0 -= lr * (2 / n) * np.sum(error)       # gradient w.r.t. intercept
    return theta1, theta0, cost_history

# Sample data (same toy set as Program 2)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
m, c, cost_history = gradient_descent(x, y)

plt.figure(figsize=(10, 4))
plt.plot(cost_history)
plt.title("Cost Function Convergence")
plt.xlabel("Epochs")
plt.ylabel("Mean Squared Error")
plt.grid(True)
plt.show()

plt.figure(figsize=(7, 5))
plt.scatter(x, y, label="Actual Data", color="blue")
plt.plot(x, m * x + c, color="red", label=f"Fitted Line: y = {m:.2f}x + {c:.2f}")
plt.title("Linear Regression Fit using Gradient Descent")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT:
Program – 4
AIM: Comparison of Classification Accuracy of SVM, ELM and CNN for a given dataset.
THEORY:
A Support Vector Machine (SVM) classifies data by finding the maximum-margin boundary between classes, optionally after a kernel mapping into a higher-dimensional space. An Extreme Learning Machine (ELM) is a single-hidden-layer network whose input weights are set randomly and never trained; only the output weights are computed, in closed form, with the Moore-Penrose pseudo-inverse, which makes training extremely fast. A Convolutional Neural Network (CNN) learns hierarchical spatial features through stacked convolution and pooling layers and is trained iteratively by backpropagation. Comparing the three on the same dataset highlights the trade-off between classification accuracy and training time.
CODE:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
digits = load_digits()
X = digits.images
y = digits.target
X_flat = X.reshape((len(X), -1)) # Shape: (1797, 64)
scaler = StandardScaler()
X_flat = scaler.fit_transform(X_flat)
# Train/test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_flat, y, test_size=0.2, random_state=42)

# ---------------- SVM ----------------
start_time = time.time()
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)
svm_acc = accuracy_score(y_test, svm.predict(X_test))
svm_time = time.time() - start_time
print(f"SVM Accuracy: {svm_acc*100:.2f}%")
print(f"SVM Training Time: {svm_time:.2f} seconds")

# ---------------- ELM ----------------
start_time = time.time()
input_dim = X_train.shape[1]
hidden_dim = 100  # fixed hidden layer size
W = np.random.randn(input_dim, hidden_dim)    # random input weights (never trained)
H = np.tanh(X_train @ W)                      # hidden-layer activations
beta = np.linalg.pinv(H) @ y_train            # closed-form output weights (pseudo-inverse)
H_test = np.tanh(X_test @ W)
y_pred_elm = np.clip(np.round(H_test @ beta), 0, 9).astype(int)  # labels treated as regression targets
elm_acc = accuracy_score(y_test, y_pred_elm)
elm_time = time.time() - start_time
print(f"ELM Accuracy: {elm_acc*100:.2f}%")
print(f"ELM Training Time: {elm_time:.2f} seconds")
# ---------------- CNN ----------------
# Reshape the standardized flat features back to 8x8 single-channel images
X_train_cnn = X_train.reshape(-1, 8, 8, 1)
X_test_cnn = X_test.reshape(-1, 8, 8, 1)
y_train_cnn = tf.keras.utils.to_categorical(y_train, 10)
y_test_cnn = tf.keras.utils.to_categorical(y_test, 10)

start_time = time.time()
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(8, 8, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn.fit(X_train_cnn, y_train_cnn, epochs=10, batch_size=32, verbose=0, validation_split=0.1)
cnn_loss, cnn_acc = cnn.evaluate(X_test_cnn, y_test_cnn, verbose=0)
cnn_time = time.time() - start_time
print(f"CNN Accuracy: {cnn_acc*100:.2f}%")
print(f"CNN Training Time: {cnn_time:.2f} seconds")

# Collect results for plotting
models = ['SVM', 'ELM', 'CNN']
accuracies = [svm_acc * 100, elm_acc * 100, cnn_acc * 100]
times = [svm_time, elm_time, cnn_time]
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.bar(models, accuracies, color=['blue', 'orange', 'green'])
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy (%)")
plt.subplot(1, 2, 2)
plt.bar(models, times, color=['blue', 'orange', 'green'])
plt.title("Training Time Comparison")
plt.ylabel("Time (seconds)")
plt.tight_layout()
plt.show()
OUTPUT:
Program – 5
AIM: Implementation of basic image handling and processing operations on an image.
THEORY:
Image processing is a fundamental step in computer vision that deals with manipulating and
analyzing images to extract meaningful information.
Python’s OpenCV (Open Source Computer Vision Library) provides a wide range of tools for
image acquisition, enhancement, and transformation.
Common Operations:
1. Loading an Image — Reading an image file into memory using cv2.imread().
2. Displaying an Image — Viewing the image with cv2.imshow() or using Matplotlib for
visualization.
3. Resizing — Adjusting image dimensions to standardize input for models or improve
display efficiency.
4. Cropping — Selecting a specific region of interest (ROI) from the image.
5. Rotation — Rotating the image around a center point by a given angle.
6. Grayscale Conversion — Converting a color image (RGB) to grayscale for
simplifying analysis.
7. Saving — Writing the processed image back to disk with cv2.imwrite().
CODE:
import cv2
import matplotlib.pyplot as plt
# 1. Load Image
img = cv2.imread('image.jpg')
if img is None:
raise Exception("Image not found. Please check the file path.")
# 2. Resize to fixed dimensions
resized = cv2.resize(img, (300, 300))

# 3. Crop a region of interest
crop = img[50:200, 50:200]

# 4. Rotate by 45 degrees around the image center
(h, w) = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

# 5. Convert to Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 6. Save a processed image back to disk
cv2.imwrite('gray.jpg', gray)

# 7. Display Results
titles = ['Original', 'Resized', 'Cropped', 'Rotated', 'Grayscale']
images = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(resized, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(crop, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(rotated, cv2.COLOR_BGR2RGB),
          gray]
plt.figure(figsize=(10,6))
for i in range(5):
    plt.subplot(2,3,i+1)
    plt.imshow(images[i], cmap='gray')
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 6
AIM: Implementation of basic geometric image transformations (translation, rotation, scaling, and affine transformation) using OpenCV.
THEORY:
Image transformation refers to operations that alter the geometry or orientation of an image
while preserving essential visual information. These operations are widely used in computer
vision for image alignment, data augmentation, and feature extraction.
Common types of transformations include:
1. Translation – Shifting the image in the x and y directions without altering its
orientation or scale.
2. Rotation – Rotating the image around a given center point by a specified angle.
3. Scaling – Changing the size of the image by enlarging or shrinking it using a scaling
factor.
4. Affine Transformation – A linear mapping that can perform a combination of
translation, rotation, scaling, and shearing. It preserves straight lines and parallelism
but not angles or lengths.
These transformations are implemented mathematically using transformation matrices applied
via OpenCV’s warpAffine() function.
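For intuition, warpAffine() maps every pixel through a 2×3 matrix applied to the point in homogeneous form, p' = M · [x, y, 1]ᵀ. A minimal sketch of a pure translation (the shift of 60 and 40 pixels is an illustrative assumption):

import numpy as np

M_translate = np.float32([[1, 0, 60],    # shift 60 px right
                          [0, 1, 40]])   # shift 40 px down
p = np.array([100, 150, 1])              # the point (100, 150) in homogeneous form
print(M_translate @ p)                   # -> [160. 190.], the translated point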
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# 1. Load the image
img = cv2.imread('image.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
rows, cols = img.shape[:2]

# 2. Translation: shift 60 px right and 40 px down
M_translate = np.float32([[1, 0, 60], [0, 1, 40]])
translated = cv2.warpAffine(img, M_translate, (cols, rows))

# 3. Rotation: rotate the image by 45 degrees around its center
M_rotate = cv2.getRotationMatrix2D((cols / 2, rows / 2), 45, 1)
rotated = cv2.warpAffine(img, M_rotate, (cols, rows))
# 4. Scaling: shrink to half size and enlarge to 1.5x
scaled_down = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
scaled_up = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_LINEAR)

# 5. Affine Transformation
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[70, 100], [210, 50], [100, 250]])
M_affine = cv2.getAffineTransform(pts1, pts2)
affine_transformed = cv2.warpAffine(img, M_affine, (cols, rows))
# 6. Saving Transformed Images
cv2.imwrite('translated.jpeg', translated)
cv2.imwrite('rotated.jpeg', rotated)
cv2.imwrite('scaled_down.jpeg', scaled_down)
cv2.imwrite('scaled_up.jpeg', scaled_up)
cv2.imwrite('affine_transformed.jpeg', affine_transformed)
# Display Results
titles = ["Translated", "Rotated", "Scaled Down", "Scaled Up", "Affine Transformed"]
images = [translated, rotated, scaled_down, scaled_up, affine_transformed]
plt.figure(figsize=(12, 8))
for i in range(5):
    plt.subplot(2, 3, i+1)
    plt.imshow(cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB))
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 7
AIM: Implementation of perspective transformation on an image using OpenCV.
THEORY:
Perspective transformation is a geometric operation that changes the apparent perspective of
an image as if viewed from another angle. It is a non-linear transformation that maps points
from one plane to another using a 3×3 homography matrix.
It is widely used in:
• Document or object perspective correction
• Augmented reality (e.g., overlaying images on real-world planes)
• Computer vision tasks like camera calibration and image rectification
The transformation is achieved using OpenCV’s functions:
• cv2.getPerspectiveTransform(src_points, dst_points) — computes the 3×3 matrix.
• cv2.warpPerspective(image, matrix, (width, height)) — applies the transformation.
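What makes the mapping non-linear is the division by the third homogeneous coordinate: a point p = [x, y, 1]ᵀ is multiplied by the homography H and the result is divided by its last entry. A minimal sketch (the matrix entries are illustrative assumptions):

import numpy as np

H = np.array([[1.0,   0.2, 30.0],
              [0.0,   1.1, 10.0],
              [0.001, 0.0,  1.0]])   # illustrative 3x3 homography
p = np.array([100.0, 50.0, 1.0])     # the point (100, 50) in homogeneous form
q = H @ p
print(q[:2] / q[2])                  # perspective divide yields the mapped point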
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the image
img = cv2.imread('image.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
rows, cols = img.shape[:2]

# Four source corners and their destination positions (illustrative coordinates)
pts1 = np.float32([[50, 50], [400, 50], [50, 400], [400, 400]])
pts2 = np.float32([[10, 100], [380, 50], [100, 390], [420, 420]])

# Compute the 3x3 homography and warp the image
M = cv2.getPerspectiveTransform(pts1, pts2)
transformed = cv2.warpPerspective(img, M, (cols, rows))
# Display Results
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("Original Image")
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(transformed, cv2.COLOR_BGR2RGB))
plt.title("Perspective Transformed")
plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 8
AIM: Perform camera calibration from chessboard images using OpenCV and undistort an image.
THEORY:
Camera calibration is a critical process in computer vision and photogrammetry that aims to
determine the intrinsic and extrinsic parameters of a camera. These parameters are essential
for accurate image analysis and 3D reconstruction, enabling the conversion of 2D image
coordinates into 3D world coordinates and vice versa.
It helps in removing lens distortions and obtaining accurate real-world measurements from
images.
• Intrinsic parameters define the camera’s internal characteristics such as focal length,
optical center, and distortion coefficients.
• Extrinsic parameters define the camera’s orientation and position relative to the world
coordinate system.
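Concretely, the intrinsic parameters are usually collected into a 3×3 camera matrix K. Projecting a 3D point expressed in the camera frame amounts to dividing by its depth and applying K. A minimal sketch (the focal lengths and optical center below are illustrative assumptions):

import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
P = np.array([0.1, -0.05, 2.0])        # 3D point in the camera frame
uvw = K @ (P / P[2])                   # divide by depth, then apply intrinsics
print(uvw[:2])                         # pixel coordinates (u, v)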
CODE:
import cv2
import numpy as np
import glob
import matplotlib.pyplot as plt
# Step 1: Defining criteria for corner refinement
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Step 2: known object points
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
# Arrays to store object points and image points from all images
objpoints = []
imgpoints = []
# Step 3: Load all calibration images
images = glob.glob('calib_images/*.jpg')
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Step 4: Find chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (7, 6), None)
    if ret:
        objpoints.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        # Draw and display the corners for visualization
        cv2.drawChessboardCorners(img, (7, 6), corners2, ret)
        cv2.imshow('Detected Corners', img)
        cv2.waitKey(200)
cv2.destroyAllWindows()
# Step 5: Calibrate the camera
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
print("\nCamera Calibration Successful!")
print("Camera Matrix:\n", mtx)
print("Distortion Coefficients:\n", dist)
print("Re-projection Error:", ret)
# Step 6: Undistort one of the images for comparison
img = cv2.imread(images[0])
h, w = img.shape[:2]
new_camera_mtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
undistorted = cv2.undistort(img, mtx, dist, None, new_camera_mtx)
# Step 7: Display original and corrected images
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("Original (Distorted) Image")
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(undistorted, cv2.COLOR_BGR2RGB))
plt.title("Undistorted Image")
plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 9
AIM: Compute the Fundamental Matrix for a stereo image pair and visualize the corresponding epipolar lines.
THEORY:
The Fundamental Matrix (F) defines the intrinsic geometric relationship between two views of
the same 3D scene captured from different viewpoints. For every pair of corresponding image points p and p′ it satisfies the epipolar constraint:

p′ᵀ F p = 0
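Once F has been estimated, the constraint can be checked numerically: for good inlier correspondences the residuals p′ᵀ F p should be close to zero. A minimal sketch, assuming pts1_inliers, pts2_inliers, and F come from the code below:

import numpy as np

def epipolar_residuals(pts1, pts2, F):
    # Append a 1 to make each point homogeneous, then evaluate p2^T F p1 per match
    p1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    p2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.einsum('ij,jk,ik->i', p2, F, p1)

# Example: print(np.abs(epipolar_residuals(pts1_inliers, pts2_inliers, F)).mean())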
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Step 1: Load the stereo image pair (left and right)
img1 = cv2.imread('left.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('right.jpg', cv2.IMREAD_GRAYSCALE)
# Step 2: Detect keypoints and compute descriptors using ORB
orb = cv2.ORB_create(1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
# Step 3: Match features between both images
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda x: x.distance)
# Step 4: Extract matched keypoint coordinates
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
# Step 5: Compute the Fundamental Matrix using RANSAC
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
print("Fundamental Matrix:\n", F)
# Step 6: Select inlier points
pts1_inliers = pts1[mask.ravel() == 1]
pts2_inliers = pts2[mask.ravel() == 1]
# Step 7: Draw epipolar lines for visualization
def draw_epilines(img1, img2, lines, pts1, pts2):
    '''Draws epipolar lines on img1 corresponding to points in img2.'''
    r, c = img1.shape
    img1_color = cv2.cvtColor(img1, cv2.COLOR_GRAY2BGR)
    img2_color = cv2.cvtColor(img2, cv2.COLOR_GRAY2BGR)
    for line, pt1, pt2 in zip(lines, pts1, pts2):
        color = tuple(np.random.randint(0, 255, 3).tolist())
        x0, y0 = map(int, [0, -line[2] / line[1]])
        x1, y1 = map(int, [c, -(line[2] + line[0] * c) / line[1]])
        img1_color = cv2.line(img1_color, (x0, y0), (x1, y1), color, 1)
        img1_color = cv2.circle(img1_color, tuple(map(int, pt1)), 4, color, -1)
        img2_color = cv2.circle(img2_color, tuple(map(int, pt2)), 4, color, -1)
    return img1_color, img2_color

# Compute epilines corresponding to points in the right image and draw on the left image
lines1 = cv2.computeCorrespondEpilines(pts2_inliers.reshape(-1, 1, 2), 2, F)
lines1 = lines1.reshape(-1, 3)
img5, img6 = draw_epilines(img1, img2, lines1, pts1_inliers, pts2_inliers)

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1); plt.imshow(cv2.cvtColor(img5, cv2.COLOR_BGR2RGB)); plt.title("Epipolar Lines on Left Image"); plt.axis('off')
plt.subplot(1, 2, 2); plt.imshow(cv2.cvtColor(img6, cv2.COLOR_BGR2RGB)); plt.title("Matched Points on Right Image"); plt.axis('off')
plt.show()
OUTPUT:
Program – 10
AIM: Implement hand gesture recognition and human pose detection on an image using MediaPipe.
THEORY:
1. Hand Gesture Recognition
Hand gesture recognition involves identifying specific hand postures or movements using
computer vision techniques.
Steps include:
• Data Collection: Images or videos of hands making gestures.
• Preprocessing: Normalization, grayscale conversion, or segmentation to isolate the
hand.
• Feature Extraction: Using landmarks (fingertips, joints) from models like MediaPipe
Hands.
• Classification: Gestures are classified using ML models (e.g., SVM, CNN) or simple
rule-based systems.
• Real-time Recognition: Integration with a webcam feed for live gesture detection.
2. Human Pose Detection
Pose detection estimates the positions of body joints (like shoulders, elbows, knees) to
understand human posture.
• Model Used: Pre-trained models such as MediaPipe Pose, OpenPose, or PoseNet detect
keypoints in real-time.
• Output: A set of landmarks connected to form a skeleton overlay on the person’s
image.
• Applications: Activity recognition, fitness tracking, motion capture, and AR.
CODE:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2)
pose = mp_pose.Pose(static_image_mode=True)

# Load the input image (the file name is illustrative) and convert BGR -> RGB for MediaPipe
img = cv2.imread('person.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Hand landmark detection
results_hands = hands.process(img_rgb)
hand_annotated = img_rgb.copy()
if results_hands.multi_hand_landmarks:
    for hand_landmarks in results_hands.multi_hand_landmarks:
        mp_drawing.draw_landmarks(hand_annotated, hand_landmarks, mp_hands.HAND_CONNECTIONS)

# Pose landmark detection
results_pose = pose.process(img_rgb)
pose_annotated = img_rgb.copy()
if results_pose.pose_landmarks:
    mp_drawing.draw_landmarks(
        pose_annotated, results_pose.pose_landmarks, mp_pose.POSE_CONNECTIONS)
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(hand_annotated)
plt.title("Hand Gesture Detection")
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(pose_annotated)
plt.title("Human Pose Detection")
plt.axis('off')
plt.show()
hands.close()
pose.close()
OUTPUT:
Program – 11
AIM: Implement real-time face detection from a webcam feed using OpenCV.
THEORY:
Face recognition is a technology that can identify or verify a person from a digital image or a
video frame. Python, combined with the OpenCV library, provides a powerful and flexible
toolkit for implementing face recognition systems. This technology is widely used in security
systems, user authentication, and social media applications.
Prerequisites
For face recognition, ensure to have Python installed, along with the following libraries:
• OpenCV: For image processing and computer vision tasks.
• NumPy: For numerical operations on arrays.
• dlib: For face detection and facial feature extraction (optional but recommended for
high accuracy).
• face_recognition: A simple face recognition library built on top of dlib (optional
but highly effective).
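Where identification (not just detection) is required, the face_recognition library compares 128-dimensional face encodings. A minimal sketch, assuming two image files known.jpg and query.jpg, each containing one face:

import face_recognition

known = face_recognition.load_image_file('known.jpg')
query = face_recognition.load_image_file('query.jpg')
known_enc = face_recognition.face_encodings(known)[0]   # 128-d encoding of the first face
query_enc = face_recognition.face_encodings(query)[0]
match = face_recognition.compare_faces([known_enc], query_enc)[0]
print("Same person" if match else "Different person")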
CODE:
import cv2
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) == 27:   # Esc key stops the loop
        break
cap.release()
cv2.destroyAllWindows()
OUTPUT:
Program – 12
AIM: Extract text from an image using Tesseract OCR with OpenCV preprocessing.
THEORY:
Text recognition, also known as Optical Character Recognition (OCR), is the process of
extracting text from images or scanned documents. This technology is widely used in
applications such as document digitization, automated data entry, and license plate recognition.
Python, along with libraries like OpenCV and Tesseract, provides a powerful toolkit for
implementing OCR systems.
Tesseract is an open-source OCR engine that is highly efficient and accurate in recognizing
text from images. Originally developed by HP and later improved by Google, Tesseract
supports a wide range of languages and scripts. It works by analyzing the structure of the text
in an image, segmenting it into words and characters, and then recognizing these characters
based on trained models.
CODE:
import cv2
import pytesseract
import numpy as np
import matplotlib.pyplot as plt
from google.colab import files
# Upload an image containing printed or handwritten text
print("Upload an image for OCR (JPEG/PNG).")
uploaded = files.upload()
image_path = list(uploaded.keys())[0]

# Read the image, convert to grayscale, and binarize with Otsu thresholding
# (a simple, commonly used preprocessing choice before OCR)
img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, processed = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

custom_config = r'--oem 3 --psm 6'   # LSTM OCR engine + assume a uniform block of text
extracted_text = pytesseract.image_to_string(processed, config=custom_config)
print("\nExtracted Text:")
print("---------------------------------------------------")
print(extracted_text)
print("---------------------------------------------------")
OUTPUT: