Pattern Recognition and Computer Vision
VISION
"To attain global excellence through education, innovation, research, and work ethics in the
field of Computer Science and engineering with the commitment to serve humanity."
MISSION
M1: To lead in the advancement of computer science and engineering through internationally
recognized research and education.
M2: To prepare students for full and ethical participation in a diverse society and encourage
lifelong learning.
M3: To foster development of problem solving and communication skills as an integral
component of the profession.
M4: To impart knowledge and skills, and to cultivate an environment supporting incubation, product development, technology transfer, capacity building and entrepreneurship in the field of computer science and engineering.
M5: To encourage faculty and student networking with alumni, industry, institutions, and other stakeholders for collective engagement.
Rubrics for Lab Assessment:

Criterion | Rating | Mapped POs | Mapped PSOs
R2: Does the proposed design/procedure/algorithm solve the problem? | No / Partially / Completely | PO1, PO2, PO3 | PSO1, PSO2
R5: Individuality of submission? | No / Partially / Completely | PO8, PO12 | PSO1, PSO3
INDEX

S.No | Experiment | Date of Performance | R1 | R2 | R3 | R4 | R5 | Total Marks | Faculty Signature

Assessment criteria (2 marks each):
R1: Is able to identify and define the objective of the given problem?
R2: Does the proposed design/procedure/algorithm solve the problem?
R3: Has understanding of the tool/programming language to implement the proposed solution?
R4: Are the result(s) verified using sufficient test data to support the conclusions?
R5: Individuality of submission?
Program – 1
AIM: Write a Python function that computes the value of the Gaussian distribution N(m, s) at a given vector X, and plot the effect of varying the mean and variance on the normal distribution.
THEORY:
The Gaussian distribution, also known as the normal distribution, is a continuous probability
distribution characterized by its bell-shaped curve. It is defined by two parameters:
1. Mean (μ): The mean represents the center of the distribution and indicates where the peak
of the curve occurs.
2. Variance (σ²): The variance measures the spread of the distribution. A higher variance results in a wider and flatter curve, while a lower variance produces a narrower and taller curve.
The probability density function (PDF) of a Gaussian distribution is given by:

f(x) = (1 / √(2πσ²)) · exp(−(x − μ)² / (2σ²))

where:
• x is the variable,
• μ is the mean of the distribution,
• σ² is the variance,
• √(2πσ²) is the normalization factor (the divisor) that ensures the total area under the curve equals 1.
The Gaussian distribution is symmetric around the mean and has a bell-shaped curve. The standard deviation controls the spread of the distribution: the larger the standard deviation, the wider the curve.
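As a quick check of the claim that the total area under the curve equals 1, the PDF can be integrated numerically. A minimal sketch (the grid limits and parameter values are illustrative assumptions):

import numpy as np

x = np.linspace(-50, 50, 200001)   # wide grid so the tails contribute almost nothing
mu, sigma = 0.0, 2.0               # illustrative mean and standard deviation
pdf = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
print(np.trapz(pdf, x))            # prints a value very close to 1.0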
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gaussian(x, m, s):
    # Gaussian PDF with mean m and standard deviation s
    return np.exp(-(x - m) ** 2 / (2 * s ** 2)) / np.sqrt(2 * np.pi * s ** 2)

x = np.linspace(-10, 10, 500)

plt.figure(figsize=(10, 5))
for m in [-3, 0, 3]:
    plt.plot(x, gaussian(x, m, 2), label=f"mean={m}, sd=2")
plt.legend(); plt.title("Varying Mean"); plt.grid(True)
plt.show()

plt.figure(figsize=(10, 5))
for s in [0.5, 1, 2, 4]:
    plt.plot(x, gaussian(x, 0, s), label=f"sd={s}")
plt.legend(); plt.title("Varying Variance (via Standard Deviation)"); plt.grid(True)
plt.show()
OUTPUT:
Program – 2
AIM: Write a Python program to implement the Gradient Descent algorithm for minimizing a cost function and plot its convergence.
THEORY:
Gradient Descent is an iterative optimization algorithm that minimizes a cost function J(θ) by repeatedly stepping in the direction of steepest descent.
The gradient descent algorithm works as follows:
1. Initialization: Start with an initial guess for the parameters θ.
2. Compute the Gradient: Calculate the gradient of the cost function with respect to the
parameters. The gradient is a vector of partial derivatives that indicates the direction of the
steepest ascent.
3. Update the Parameters: Adjust the parameters in the direction opposite to the gradient to
reduce the cost. The update rule is given by:

θ := θ − α ∇θ J(θ)
where:
θ represents the parameters being optimized,
α is the learning rate, a small positive scalar that controls the step size of the updates,
∇θJ(θ) is the gradient of the cost function J(θ) with respect to θ.
4. Iterate: Repeat steps 2 and 3 until the algorithm converges, meaning the changes in the cost
function or parameters become sufficiently small, or a predefined number of iterations is
reached.
5. Convergence Check: The algorithm converges when the cost function reaches a minimum
value or changes very little between iterations.
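To make the update rule concrete, here is a minimal one-dimensional sketch that minimizes J(θ) = θ², whose gradient is 2θ (the starting point and learning rate are illustrative assumptions):

theta = 5.0                         # initial guess
alpha = 0.1                         # learning rate
for _ in range(50):
    grad = 2 * theta                # dJ/dtheta for J(theta) = theta**2
    theta = theta - alpha * grad    # update rule: theta := theta - alpha * gradient
print(theta)                        # very close to 0, the minimizer of J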
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x, y, lr=0.01, epochs=1000):
    m, c, n = 0.0, 0.0, len(x)
    cost_history = []
    for _ in range(epochs):
        error = (m * x + c) - y                # prediction error
        cost_history.append(np.mean(error ** 2))
        m -= lr * (2 / n) * np.dot(error, x)   # gradient w.r.t. slope
        c -= lr * (2 / n) * np.sum(error)      # gradient w.r.t. intercept
    return m, c, cost_history

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
m, c, cost_history = gradient_descent(x, y)
plt.plot(cost_history)
plt.title("Convergence of Cost Function")
plt.xlabel("Iterations")
plt.ylabel("Cost (MSE)")
plt.grid(True)
plt.show()
OUTPUT:
Program – 3
AIM: Write a Python program to fit a straight line to data using Linear Regression trained with Gradient Descent, and plot the cost convergence and the fitted line.
THEORY:
Linear Regression is a fundamental algorithm in supervised learning used to model the relationship between the dependent variable y and one or more independent variables X by fitting a linear equation to the observed data. The equation for Linear Regression is given by:
hθ(x) = θ₀ + θ₁x
where:
hθ(x) is the predicted output,
θ₀ is the intercept (bias),
θ₁ is the coefficient (slope) of the input feature x.
The parameters are learned by minimizing the mean squared error cost function:

J(θ₀, θ₁) = (1/n) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

Gradient Descent is an optimization algorithm used to minimize this cost function iteratively by updating the parameters.
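As a sanity check, the same slope and intercept can be obtained in closed form from the normal equations. A minimal sketch using numpy's least-squares solver on the same toy data as the program below:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
A = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
(theta1, theta0), *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta1, theta0)   # gradient descent should converge toward these values (2, 0)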
CODE:
import numpy as np
import matplotlib.pyplot as plt

def gradient_descent(x, y, lr=0.01, epochs=1000):
    theta0, theta1, n = 0.0, 0.0, len(x)
    cost_history = []
    for _ in range(epochs):
        error = (theta1 * x + theta0) - y            # h(x) - y
        cost_history.append(np.mean(error ** 2))
        theta1 -= lr * (2 / n) * np.dot(error, x)    # gradient w.r.t. slope
        theta0 -= lr * (2 / n) * np.sum(error)       # gradient w.r.t. intercept
    return theta1, theta0, cost_history

# Sample data (same toy set as Program 2)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
m, c, cost_history = gradient_descent(x, y)

plt.figure(figsize=(10, 4))
plt.plot(cost_history)
plt.title("Cost Function Convergence")
plt.xlabel("Epochs")
plt.ylabel("Mean Squared Error")
plt.grid(True)
plt.show()

plt.figure(figsize=(7, 5))
plt.scatter(x, y, label="Actual Data", color="blue")
plt.plot(x, m * x + c, color="red", label=f"Fitted Line: y = {m:.2f}x + {c:.2f}")
plt.title("Linear Regression Fit using Gradient Descent")
plt.xlabel("X")
plt.ylabel("Y")
plt.legend()
plt.grid(True)
plt.show()
OUTPUT:
Program – 4
AIM: Comparison of Classification Accuracy of SVM, ELM and CNN for a given dataset.
THEORY:
A Support Vector Machine (SVM) classifies data by finding the maximum-margin boundary between classes, optionally after a kernel mapping into a higher-dimensional space. An Extreme Learning Machine (ELM) is a single-hidden-layer network whose input weights are set randomly and never trained; only the output weights are computed, in closed form, with the Moore-Penrose pseudo-inverse, which makes training extremely fast. A Convolutional Neural Network (CNN) learns hierarchical spatial features through stacked convolution and pooling layers and is trained iteratively by backpropagation. Comparing the three on the same dataset highlights the trade-off between classification accuracy and training time.
CODE:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
digits = load_digits()
X = digits.images
y = digits.target
X_flat = X.reshape((len(X), -1)) # Shape: (1797, 64)
scaler = StandardScaler()
X_flat = scaler.fit_transform(X_flat)
# Train/test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_flat, y, test_size=0.2, random_state=42)

# ---------------- SVM ----------------
start_time = time.time()
svm = SVC(kernel='rbf')
svm.fit(X_train, y_train)
svm_acc = accuracy_score(y_test, svm.predict(X_test))
svm_time = time.time() - start_time
print(f"SVM Accuracy: {svm_acc*100:.2f}%")
print(f"SVM Training Time: {svm_time:.2f} seconds")

# ---------------- ELM ----------------
start_time = time.time()
input_dim = X_train.shape[1]
hidden_dim = 100  # fixed hidden layer size
W = np.random.randn(input_dim, hidden_dim)    # random input weights (never trained)
H = np.tanh(X_train @ W)                      # hidden-layer activations
beta = np.linalg.pinv(H) @ y_train            # closed-form output weights (pseudo-inverse)
H_test = np.tanh(X_test @ W)
y_pred_elm = np.clip(np.round(H_test @ beta), 0, 9).astype(int)  # labels treated as regression targets
elm_acc = accuracy_score(y_test, y_pred_elm)
elm_time = time.time() - start_time
print(f"ELM Accuracy: {elm_acc*100:.2f}%")
print(f"ELM Training Time: {elm_time:.2f} seconds")
# ---------------- CNN ----------------
# Reshape the standardized flat features back to 8x8 single-channel images
X_train_cnn = X_train.reshape(-1, 8, 8, 1)
X_test_cnn = X_test.reshape(-1, 8, 8, 1)
y_train_cnn = tf.keras.utils.to_categorical(y_train, 10)
y_test_cnn = tf.keras.utils.to_categorical(y_test, 10)

start_time = time.time()
cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(8, 8, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn.fit(X_train_cnn, y_train_cnn, epochs=10, batch_size=32, verbose=0, validation_split=0.1)
cnn_loss, cnn_acc = cnn.evaluate(X_test_cnn, y_test_cnn, verbose=0)
cnn_time = time.time() - start_time
print(f"CNN Accuracy: {cnn_acc*100:.2f}%")
print(f"CNN Training Time: {cnn_time:.2f} seconds")

# Collect results for plotting
models = ['SVM', 'ELM', 'CNN']
accuracies = [svm_acc * 100, elm_acc * 100, cnn_acc * 100]
times = [svm_time, elm_time, cnn_time]
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.bar(models, accuracies, color=['blue', 'orange', 'green'])
plt.title("Model Accuracy Comparison")
plt.ylabel("Accuracy (%)")
plt.subplot(1, 2, 2)
plt.bar(models, times, color=['blue', 'orange', 'green'])
plt.title("Training Time Comparison")
plt.ylabel("Time (seconds)")
plt.tight_layout()
plt.show()
OUTPUT:
Program – 5
AIM: Implementation of basic image handling and processing operations on an image.
THEORY:
Image processing is a fundamental step in computer vision that deals with manipulating and
analyzing images to extract meaningful information.
Python’s OpenCV (Open Source Computer Vision Library) provides a wide range of tools for
image acquisition, enhancement, and transformation.
Common Operations:
1. Loading an Image — Reading an image file into memory using cv2.imread().
2. Displaying an Image — Viewing the image with cv2.imshow() or using Matplotlib for
visualization.
3. Resizing — Adjusting image dimensions to standardize input for models or improve
display efficiency.
4. Cropping — Selecting a specific region of interest (ROI) from the image.
5. Rotation — Rotating the image around a center point by a given angle.
6. Grayscale Conversion — Converting a color image (RGB) to grayscale for
simplifying analysis.
7. Saving — Writing the processed image back to disk with cv2.imwrite().
CODE:
import cv2
import matplotlib.pyplot as plt
# 1. Load Image
img = cv2.imread('image.jpg')
if img is None:
raise Exception("Image not found. Please check the file path.")
# 2. Resize to fixed dimensions
resized = cv2.resize(img, (300, 300))

# 3. Crop a region of interest
crop = img[50:200, 50:200]

# 4. Rotate by 45 degrees around the image center
(h, w) = img.shape[:2]
M = cv2.getRotationMatrix2D((w / 2, h / 2), 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

# 5. Convert to Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 6. Save a processed image back to disk
cv2.imwrite('gray.jpg', gray)

# 7. Display Results
titles = ['Original', 'Resized', 'Cropped', 'Rotated', 'Grayscale']
images = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(resized, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(crop, cv2.COLOR_BGR2RGB),
          cv2.cvtColor(rotated, cv2.COLOR_BGR2RGB),
          gray]
plt.figure(figsize=(10,6))
for i in range(5):
    plt.subplot(2,3,i+1)
    plt.imshow(images[i], cmap='gray')
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 6
AIM: Implementation of basic geometric image transformations (translation, rotation, scaling, and affine transformation) using OpenCV.
THEORY:
Image transformation refers to operations that alter the geometry or orientation of an image
while preserving essential visual information. These operations are widely used in computer
vision for image alignment, data augmentation, and feature extraction.
Common types of transformations include:
1. Translation – Shifting the image in the x and y directions without altering its
orientation or scale.
2. Rotation – Rotating the image around a given center point by a specified angle.
3. Scaling – Changing the size of the image by enlarging or shrinking it using a scaling
factor.
4. Affine Transformation – A linear mapping that can perform a combination of
translation, rotation, scaling, and shearing. It preserves straight lines and parallelism
but not angles or lengths.
These transformations are implemented mathematically using transformation matrices applied
via OpenCV’s warpAffine() function.
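For intuition, warpAffine() maps every pixel through a 2×3 matrix applied to the point in homogeneous form, p' = M · [x, y, 1]ᵀ. A minimal sketch of a pure translation (the shift of 60 and 40 pixels is an illustrative assumption):

import numpy as np

M_translate = np.float32([[1, 0, 60],    # shift 60 px right
                          [0, 1, 40]])   # shift 40 px down
p = np.array([100, 150, 1])              # the point (100, 150) in homogeneous form
print(M_translate @ p)                   # -> [160. 190.], the translated point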
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# 1. Load the image
img = cv2.imread('image.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
rows, cols = img.shape[:2]

# 2. Translation: shift 60 px right and 40 px down
M_translate = np.float32([[1, 0, 60], [0, 1, 40]])
translated = cv2.warpAffine(img, M_translate, (cols, rows))

# 3. Rotation: rotate the image by 45 degrees around its center
M_rotate = cv2.getRotationMatrix2D((cols / 2, rows / 2), 45, 1)
rotated = cv2.warpAffine(img, M_rotate, (cols, rows))
# 4. Scaling: shrink to half size and enlarge to 1.5x
scaled_down = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
scaled_up = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_LINEAR)

# 5. Affine Transformation
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[70, 100], [210, 50], [100, 250]])
M_affine = cv2.getAffineTransform(pts1, pts2)
affine_transformed = cv2.warpAffine(img, M_affine, (cols, rows))
# 6. Saving Transformed Images
cv2.imwrite('translated.jpeg', translated)
cv2.imwrite('rotated.jpeg', rotated)
cv2.imwrite('scaled_down.jpeg', scaled_down)
cv2.imwrite('scaled_up.jpeg', scaled_up)
cv2.imwrite('affine_transformed.jpeg', affine_transformed)
# Display Results
titles = ["Translated", "Rotated", "Scaled Down", "Scaled Up", "Affine Transformed"]
images = [translated, rotated, scaled_down, scaled_up, affine_transformed]
plt.figure(figsize=(12, 8))
for i in range(5):
    plt.subplot(2, 3, i+1)
    plt.imshow(cv2.cvtColor(images[i], cv2.COLOR_BGR2RGB))
    plt.title(titles[i])
    plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 7
AIM: Implementation of perspective transformation on an image using OpenCV.
THEORY:
Perspective transformation is a geometric operation that changes the apparent perspective of
an image as if viewed from another angle. It is a non-linear transformation that maps points
from one plane to another using a 3×3 homography matrix.
It is widely used in:
• Document or object perspective correction
• Augmented reality (e.g., overlaying images on real-world planes)
• Computer vision tasks like camera calibration and image rectification
The transformation is achieved using OpenCV’s functions:
• cv2.getPerspectiveTransform(src_points, dst_points) — computes the 3×3 matrix.
• cv2.warpPerspective(image, matrix, (width, height)) — applies the transformation.
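What makes the mapping non-linear is the division by the third homogeneous coordinate: a point p = [x, y, 1]ᵀ is multiplied by the homography H and the result is divided by its last entry. A minimal sketch (the matrix entries are illustrative assumptions):

import numpy as np

H = np.array([[1.0,   0.2, 30.0],
              [0.0,   1.1, 10.0],
              [0.001, 0.0,  1.0]])   # illustrative 3x3 homography
p = np.array([100.0, 50.0, 1.0])     # the point (100, 50) in homogeneous form
q = H @ p
print(q[:2] / q[2])                  # perspective divide yields the mapped point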
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the image
img = cv2.imread('image.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
rows, cols = img.shape[:2]

# Four source corners and their destination positions (illustrative coordinates)
pts1 = np.float32([[50, 50], [400, 50], [50, 400], [400, 400]])
pts2 = np.float32([[10, 100], [380, 50], [100, 390], [420, 420]])

# Compute the 3x3 homography and warp the image
M = cv2.getPerspectiveTransform(pts1, pts2)
transformed = cv2.warpPerspective(img, M, (cols, rows))
# Display Results
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("Original Image")
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(transformed, cv2.COLOR_BGR2RGB))
plt.title("Perspective Transformed")
plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 8
AIM: Perform camera calibration from chessboard images using OpenCV and undistort an image.
THEORY:
Camera calibration is a critical process in computer vision and photogrammetry that aims to
determine the intrinsic and extrinsic parameters of a camera. These parameters are essential
for accurate image analysis and 3D reconstruction, enabling the conversion of 2D image
coordinates into 3D world coordinates and vice versa.
It helps in removing lens distortions and obtaining accurate real-world measurements from
images.
• Intrinsic parameters define the camera’s internal characteristics such as focal length,
optical center, and distortion coefficients.
• Extrinsic parameters define the camera’s orientation and position relative to the world
coordinate system.
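Concretely, the intrinsic parameters are usually collected into a 3×3 camera matrix K. Projecting a 3D point expressed in the camera frame amounts to dividing by its depth and applying K. A minimal sketch (the focal lengths and optical center below are illustrative assumptions):

import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
P = np.array([0.1, -0.05, 2.0])        # 3D point in the camera frame
uvw = K @ (P / P[2])                   # divide by depth, then apply intrinsics
print(uvw[:2])                         # pixel coordinates (u, v)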
CODE:
import cv2
import numpy as np
import glob
import matplotlib.pyplot as plt
# Step 1: Defining criteria for corner refinement
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Step 2: known object points
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
# Arrays to store object points and image points from all images
objpoints = []
imgpoints = []
# Step 3: Load all calibration images
images = glob.glob('calib_images/*.jpg')
for fname in images:
    img = cv2.imread(fname)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Step 4: Find chessboard corners
    ret, corners = cv2.findChessboardCorners(gray, (7, 6), None)
    if ret:
        objpoints.append(objp)
        corners2 = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        imgpoints.append(corners2)
        # Draw and display the corners for visualization
        cv2.drawChessboardCorners(img, (7, 6), corners2, ret)
        cv2.imshow('Detected Corners', img)
        cv2.waitKey(200)
cv2.destroyAllWindows()
# Step 5: Calibrate the camera
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
print("\nCamera Calibration Successful!")
print("Camera Matrix:\n", mtx)
print("Distortion Coefficients:\n", dist)
print("Re-projection Error:", ret)
# Step 6: Undistort one of the images for comparison
img = cv2.imread(images[0])
h, w = img.shape[:2]
new_camera_mtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
undistorted = cv2.undistort(img, mtx, dist, None, new_camera_mtx)
# Step 7: Display original and corrected images
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.title("Original (Distorted) Image")
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(cv2.cvtColor(undistorted, cv2.COLOR_BGR2RGB))
plt.title("Undistorted Image")
plt.axis('off')
plt.tight_layout()
plt.show()
OUTPUT:
Program – 9
AIM: Compute the Fundamental Matrix for a stereo image pair and visualize the corresponding epipolar lines.
THEORY:
The Fundamental Matrix (F) defines the intrinsic geometric relationship between two views of
the same 3D scene captured from different viewpoints. For every pair of corresponding image points p and p′ it satisfies the epipolar constraint:

p′ᵀ F p = 0
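Once F has been estimated, the constraint can be checked numerically: for good inlier correspondences the residuals p′ᵀ F p should be close to zero. A minimal sketch, assuming pts1_inliers, pts2_inliers, and F come from the code below:

import numpy as np

def epipolar_residuals(pts1, pts2, F):
    # Append a 1 to make each point homogeneous, then evaluate p2^T F p1 per match
    p1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    p2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.einsum('ij,jk,ik->i', p2, F, p1)

# Example: print(np.abs(epipolar_residuals(pts1_inliers, pts2_inliers, F)).mean())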
CODE:
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Step 1: Load the stereo image pair (left and right)
img1 = cv2.imread('left.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('right.jpg', cv2.IMREAD_GRAYSCALE)
# Step 2: Detect keypoints and compute descriptors using ORB
orb = cv2.ORB_create(1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
# Step 3: Match features between both images
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda x: x.distance)
# Step 4: Extract matched keypoint coordinates
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
# Step 5: Compute the Fundamental Matrix using RANSAC
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
print("Fundamental Matrix:\n", F)
# Step 6: Select inlier points
pts1_inliers = pts1[mask.ravel() == 1]
pts2_inliers = pts2[mask.ravel() == 1]
# Step 7: Draw epipolar lines for visualization
def draw_epilines(img1, img2, lines, pts1, pts2):
    '''Draws epipolar lines on img1 corresponding to points in img2.'''
    r, c = img1.shape
    img1_color = cv2.cvtColor(img1, cv2.COLOR_GRAY2BGR)
    img2_color = cv2.cvtColor(img2, cv2.COLOR_GRAY2BGR)
    for line, pt1, pt2 in zip(lines, pts1, pts2):
        color = tuple(np.random.randint(0, 255, 3).tolist())
        x0, y0 = map(int, [0, -line[2] / line[1]])
        x1, y1 = map(int, [c, -(line[2] + line[0] * c) / line[1]])
        img1_color = cv2.line(img1_color, (x0, y0), (x1, y1), color, 1)
        img1_color = cv2.circle(img1_color, tuple(map(int, pt1)), 4, color, -1)
        img2_color = cv2.circle(img2_color, tuple(map(int, pt2)), 4, color, -1)
    return img1_color, img2_color

# Compute epilines corresponding to points in the right image and draw on the left image
lines1 = cv2.computeCorrespondEpilines(pts2_inliers.reshape(-1, 1, 2), 2, F)
lines1 = lines1.reshape(-1, 3)
img5, img6 = draw_epilines(img1, img2, lines1, pts1_inliers, pts2_inliers)

plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1); plt.imshow(cv2.cvtColor(img5, cv2.COLOR_BGR2RGB)); plt.title("Epipolar Lines on Left Image"); plt.axis('off')
plt.subplot(1, 2, 2); plt.imshow(cv2.cvtColor(img6, cv2.COLOR_BGR2RGB)); plt.title("Matched Points on Right Image"); plt.axis('off')
plt.show()
OUTPUT:
Program – 10
AIM: Implement hand gesture recognition and human pose detection on an image using MediaPipe.
THEORY:
1. Hand Gesture Recognition
Hand gesture recognition involves identifying specific hand postures or movements using
computer vision techniques.
Steps include:
• Data Collection: Images or videos of hands making gestures.
• Preprocessing: Normalization, grayscale conversion, or segmentation to isolate the
hand.
• Feature Extraction: Using landmarks (fingertips, joints) from models like MediaPipe
Hands.
• Classification: Gestures are classified using ML models (e.g., SVM, CNN) or simple
rule-based systems.
• Real-time Recognition: Integration with a webcam feed for live gesture detection.
2. Human Pose Detection
Pose detection estimates the positions of body joints (like shoulders, elbows, knees) to
understand human posture.
• Model Used: Pre-trained models such as MediaPipe Pose, OpenPose, or PoseNet detect
keypoints in real-time.
• Output: A set of landmarks connected to form a skeleton overlay on the person’s
image.
• Applications: Activity recognition, fitness tracking, motion capture, and AR.
CODE:
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
mp_hands = mp.solutions.hands
mp_pose = mp.solutions.pose
mp_drawing = mp.solutions.drawing_utils
hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2)
pose = mp_pose.Pose(static_image_mode=True)

# Load the input image (the file name is illustrative) and convert BGR -> RGB for MediaPipe
img = cv2.imread('person.jpg')
if img is None:
    raise Exception("Image not found. Please check the file path.")
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Hand landmark detection
results_hands = hands.process(img_rgb)
hand_annotated = img_rgb.copy()
if results_hands.multi_hand_landmarks:
    for hand_landmarks in results_hands.multi_hand_landmarks:
        mp_drawing.draw_landmarks(hand_annotated, hand_landmarks, mp_hands.HAND_CONNECTIONS)

# Pose landmark detection
results_pose = pose.process(img_rgb)
pose_annotated = img_rgb.copy()
if results_pose.pose_landmarks:
    mp_drawing.draw_landmarks(
        pose_annotated, results_pose.pose_landmarks, mp_pose.POSE_CONNECTIONS)
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.imshow(hand_annotated)
plt.title("Hand Gesture Detection")
plt.axis('off')
plt.subplot(1,2,2)
plt.imshow(pose_annotated)
plt.title("Human Pose Detection")
plt.axis('off')
plt.show()
hands.close()
pose.close()
OUTPUT:
Program – 11
AIM: Implement real-time face detection from a webcam feed using OpenCV.
THEORY:
Face recognition is a technology that can identify or verify a person from a digital image or a
video frame. Python, combined with the OpenCV library, provides a powerful and flexible
toolkit for implementing face recognition systems. This technology is widely used in security
systems, user authentication, and social media applications.
Prerequisites
For face recognition, ensure to have Python installed, along with the following libraries:
• OpenCV: For image processing and computer vision tasks.
• NumPy: For numerical operations on arrays.
• dlib: For face detection and facial feature extraction (optional but recommended for
high accuracy).
• face_recognition: A simple face recognition library built on top of dlib (optional
but highly effective).
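Where identification (not just detection) is required, the face_recognition library compares 128-dimensional face encodings. A minimal sketch, assuming two image files known.jpg and query.jpg, each containing one face:

import face_recognition

known = face_recognition.load_image_file('known.jpg')
query = face_recognition.load_image_file('query.jpg')
known_enc = face_recognition.face_encodings(known)[0]   # 128-d encoding of the first face
query_enc = face_recognition.face_encodings(query)[0]
match = face_recognition.compare_faces([known_enc], query_enc)[0]
print("Same person" if match else "Different person")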
CODE:
import cv2
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow('Face Detection', frame)
    if cv2.waitKey(1) == 27:   # Esc key stops the loop
        break
cap.release()
cv2.destroyAllWindows()
OUTPUT:
Program – 12
AIM: Extract text from an image using Tesseract OCR with OpenCV preprocessing.
THEORY:
Text recognition, also known as Optical Character Recognition (OCR), is the process of
extracting text from images or scanned documents. This technology is widely used in
applications such as document digitization, automated data entry, and license plate recognition.
Python, along with libraries like OpenCV and Tesseract, provides a powerful toolkit for
implementing OCR systems.
Tesseract is an open-source OCR engine that is highly efficient and accurate in recognizing
text from images. Originally developed by HP and later improved by Google, Tesseract
supports a wide range of languages and scripts. It works by analyzing the structure of the text
in an image, segmenting it into words and characters, and then recognizing these characters
based on trained models.
CODE:
import cv2
import pytesseract
import numpy as np
import matplotlib.pyplot as plt
from google.colab import files
# Upload an image containing printed or handwritten text
print("Upload an image for OCR (JPEG/PNG).")
uploaded = files.upload()
image_path = list(uploaded.keys())[0]

# Read the image, convert to grayscale, and binarize with Otsu thresholding
# (a simple, commonly used preprocessing choice before OCR)
img = cv2.imread(image_path)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, processed = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

custom_config = r'--oem 3 --psm 6'   # LSTM OCR engine + assume a uniform block of text
extracted_text = pytesseract.image_to_string(processed, config=custom_config)
print("\nExtracted Text:")
print("---------------------------------------------------")
print(extracted_text)
print("---------------------------------------------------")
OUTPUT: