
Computer Vision: Field of AI That Enables Computers To Derive Meaningful Information From

Computer vision is a field of AI that enables machines to interpret visual data, contrasting with human vision which relies on extensive context. Applications include autonomous vehicles, translation of road signs, and facial recognition. Convolutional Neural Networks (CNNs) are key to computer vision tasks, utilizing layers for feature detection and reducing computation costs while managing image data effectively.

Uploaded by

u1904031
Copyright
© All Rights Reserved
Computer Vision

Field of AI that enables computers to derive meaningful information from images, videos, or other visual inputs.
Human Vision vs. Computer Vision

• Human vision:
– Has lifetimes of context to learn how to tell objects apart, how far away they are, whether they are
moving, and whether something is wrong in an image.

• Computer vision:
– Trains machines to perform these functions. A trained system can analyze thousands of products in less than a
minute.
Computer Vision Applications

• Tesla: Autonomous cars for hands-free driving
• Google Translate: Convert road signs in one language to another language
• Photo scan: Optical Character Recognition (OCR) or QR Code Reader
• Facebook: Face recognition for automatic tagging
• Boston Dynamics: Designing intelligent robots
Basic Components of an Image Processing Task
What is an Image?
Color Channel

[Figure: three example images with 1 channel, 1 channel, and 3 channels]
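A minimal sketch of how the channel counts above show up in data, assuming an image is stored as a nested list of pixel intensities from 0 to 255. The 2×2 images and the helper `shape` are hypothetical and only illustrate the layout.

```python
# Hypothetical 2x2 images; values are pixel intensities in the range 0-255.
gray = [[0, 128],
        [255, 64]]                       # 1 channel: height x width

rgb = [[[255, 0, 0], [0, 255, 0]],
       [[0, 0, 255], [255, 255, 255]]]   # 3 channels: height x width x (R, G, B)


def shape(image):
    """Return the dimensions of a nested-list image."""
    dims = []
    while isinstance(image, list):
        dims.append(len(image))
        image = image[0]
    return tuple(dims)


print(shape(gray))  # (2, 2)
print(shape(rgb))   # (2, 2, 3)
```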


How does computer vision work?

• How to solve computer vision tasks?


– Computer vision needs lots of data.
– It runs analyses of the data over and over until it discerns distinctions and ultimately recognizes
images.

• Machine learning based models can enable a computer to teach itself about the
context of visual data.
– One popular ML algorithm used for computer vision tasks is the convolutional neural network (CNN).
Convolutional Neural Network

• Convolutional Neural Networks (CNNs) are distinguished from other neural networks
by their superior performance with image or visual input signals.
• A CNN consists of three main types of layers:
1) Convolutional layer
2) Pooling layer
3) Fully-connected layer
Why CNN
• A fully connected ANN has a high computation cost on image inputs
• CNNs reduce overfitting
• CNNs successfully capture the spatial and temporal dependencies in an
image
• The number of trainable parameters depends on the filter size rather than the image size
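A rough sketch of the last point, counting weights and biases for one fully connected layer and one convolutional layer. The image sizes, the 100 units, and the 100 filters of size 3×3 are assumed values chosen for illustration, and both helper functions are hypothetical.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer on a flattened image."""
    return height * width * channels * units + units


def conv_params(filter_size, channels, num_filters):
    """Weights plus biases of a convolutional layer; image size plays no role."""
    return (filter_size * filter_size * channels + 1) * num_filters


for side in (32, 256):
    print(side, dense_params(side, side, 3, 100), conv_params(3, 3, 100))
# 32 307300 2800
# 256 19660900 2800
```

The fully connected count grows with the number of pixels, while the convolutional count stays fixed at 2800.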
Convolutional Layer
• Convolutional layers are the core building block of a CNN.
• They focus on detecting features such as edges in an image
• It requires a few components, which are input data, a filter, and a feature map.
Convolution
• Convolution: Expresses how one shape is modified by another
• Below, a matrix is convolved with a filter to obtain an output matrix
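The operation can be sketched in plain Python as follows. The 4×4 input and 2×2 filter are assumed values, not the ones from the slide figure, and the function assumes square inputs, no padding, and a stride of 1. As is usual in deep learning, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (square inputs, no padding, stride 1)."""
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    output = [[0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            # Multiply the f x f patch element-wise by the kernel and sum.
            output[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(f)
                for b in range(f)
            )
    return output


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]

result = convolve2d(image, kernel)
print(result)  # [[-4, -4, 2], [-4, -4, 4], [7, 7, 6]]
```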

Input Image
Convolutional Layer (Intuition)

How can we detect these edges from an image?


Convolutional Layer (Intuition)
• To illustrate this, we use a simplified picture
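One possible simplified picture, sketched under assumptions: a 6×6 image whose left half is bright (10) and right half is dark (0), convolved with a common 3×3 vertical edge filter. The pixel values and the filter are illustrative choices and may differ from the slide figure.

```python
def convolve2d(image, kernel):
    """Convolution of square inputs with no padding and stride 1."""
    n, f = len(image), len(kernel)
    size = n - f + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Assumed picture: bright left half, dark right half, so one vertical edge.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Assumed vertical edge filter: positive left column, negative right column.
vertical_edge = [[1, 0, -1] for _ in range(3)]

edges = convolve2d(image, vertical_edge)
for row in edges:
    print(row)  # each row is [0, 30, 30, 0]
```

The large values in the middle columns mark where the brightness changes, which is the vertical edge.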
Convolutional Layer
• We have seen that convolving an input with a filter results in an output smaller than the input.

• In general:
– Input: n × n
– Filter size: f × f
– Output: (n - f + 1) × (n - f + 1)

• Disadvantages:
– Every time we apply a convolution operation, the size of the image shrinks
– Pixels in the corners of the image are used far fewer times during convolution
than the central pixels
– Hence, the corners contribute little to the output, which can lead to information loss
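Both disadvantages can be checked with a short sketch. It assumes a square n × n input and an f × f filter with no padding and stride 1; the 6×6 and 3×3 sizes are example values and the helper names are hypothetical.

```python
def output_size(n, f):
    """Output side length for an n x n input and f x f filter (no padding)."""
    return n - f + 1


def usage_counts(n, f):
    """Count how many filter positions cover each pixel of the input."""
    counts = [[0] * n for _ in range(n)]
    for i in range(n - f + 1):
        for j in range(n - f + 1):
            for a in range(f):
                for b in range(f):
                    counts[i + a][j + b] += 1
    return counts


print(output_size(6, 3))                   # 4
print(output_size(output_size(6, 3), 3))   # 2, the image keeps shrinking

counts = usage_counts(6, 3)
print(counts[0][0], counts[2][2])          # 1 9
```

A corner pixel takes part in a single filter position, while a central pixel takes part in nine.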
Convolutional Layer (Padding)
• We can pad the image with an additional border (add pixels around the border)
• In general:
– Input: n × n
– Padding: p
– Filter size: f × f
– Output: (n + 2p - f + 1) × (n + 2p - f + 1)
Convolutional Layer (Padding)
• Hence, we have two choices for padding

• Valid Padding:
– It means no padding.
– If we are using valid padding, the output will be (n - f + 1) × (n - f + 1)

• Same Padding:
– Here, we apply padding so that the output size is the same as the input size
– Need to set p = (f - 1) / 2, which is a whole number when f is odd
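A small sketch of zero padding and of the two padding choices, assuming square inputs, a stride of 1, and an odd filter size for same padding. The function names are hypothetical.

```python
def zero_pad(image, p):
    """Surround a square image with a border of p zero-valued pixels."""
    width = len(image) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def output_size(n, f, p=0):
    """Output side length for input n, filter f, padding p (stride 1)."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output the same size as the input (odd f)."""
    return (f - 1) // 2


print(zero_pad([[1, 2], [3, 4]], 1))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
print(output_size(6, 3))                   # valid padding: 4
print(output_size(6, 3, same_padding(3)))  # same padding: 6
```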
Convolutional Layer (Stride)
• Stride is how far the filter moves at each step along one direction.
• If we select a stride of 2, the filter moves two pixels at a time, both in the horizontal and
vertical directions.
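A sketch of a strided convolution. With stride s the output side length becomes floor((n + 2p - f) / s) + 1, a standard result that the slides do not state explicitly. The 7×7 input and the filter that picks only the top-left pixel of each patch are assumed values, chosen so the two-pixel jumps are easy to see.

```python
def output_size(n, f, p=0, s=1):
    """Output side length with padding p and stride s."""
    return (n + 2 * p - f) // s + 1


def convolve2d_strided(image, kernel, stride):
    """Convolution of square inputs with no padding and the given stride."""
    n, f = len(image), len(kernel)
    size = (n - f) // stride + 1
    return [
        [
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(f) for b in range(f))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Assumed 7x7 input where each value encodes its position as 10 * row + column.
image = [[10 * r + c for c in range(7)] for r in range(7)]

# Assumed filter that keeps only the top-left pixel of each 3x3 patch.
top_left = [[1, 0, 0],
            [0, 0, 0],
            [0, 0, 0]]

print(output_size(7, 3, s=2))                   # 3
print(convolve2d_strided(image, top_left, 2))
# [[0, 2, 4], [20, 22, 24], [40, 42, 44]]
```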
Convolution Over Multiple Channel
• A color image contains three channels
• We can apply a filter to the image that also has a channel dimension
• The last dimension of the filter should be the same as the number of input channels
• We can use multiple filters to capture multiple features
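A sketch of convolution over a volume, assuming the image is stored as height × width × channels and each filter as f × f × channels. The 4×4×3 image and the two 2×2×3 filters are hypothetical. Each filter produces one feature map, so the number of output maps equals the number of filters.

```python
def convolve_volume(image, filt):
    """Convolve an n x n x c image with an f x f x c filter (stride 1)."""
    n, f, c = len(image), len(filt), len(filt[0][0])
    assert len(image[0][0]) == c, "filter depth must match image channels"
    size = n - f + 1
    return [
        [
            sum(image[i + a][j + b][k] * filt[a][b][k]
                for a in range(f) for b in range(f) for k in range(c))
            for j in range(size)
        ]
        for i in range(size)
    ]


def conv_layer(image, filters):
    """Apply several filters; returns one feature map per filter."""
    return [convolve_volume(image, filt) for filt in filters]


# Assumed 4x4 RGB image in which every pixel is (1, 2, 3).
image = [[[1, 2, 3] for _ in range(4)] for _ in range(4)]

all_channels = [[[1, 1, 1] for _ in range(2)] for _ in range(2)]
red_only = [[[1, 0, 0] for _ in range(2)] for _ in range(2)]

maps = conv_layer(image, [all_channels, red_only])
print(len(maps), len(maps[0]), len(maps[0][0]))  # 2 3 3
print(maps[0][0][0], maps[1][0][0])              # 24 4
```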
Pooling Layer
• Pooling layers are generally used to reduce the size of the inputs and hence speed up
the computation.
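A sketch of one common pooling variant, max pooling with a 2×2 window and stride 2. The slide does not name a pooling type, so this choice and the 4×4 feature map are assumptions.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    n = len(feature_map)
    out = (n - size) // stride + 1
    return [
        [
            max(feature_map[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out)
        ]
        for i in range(out)
    ]


feature_map = [[1, 3, 2, 1],
               [2, 9, 1, 1],
               [1, 3, 2, 3],
               [5, 6, 1, 2]]

pooled = max_pool(feature_map)
print(pooled)  # [[9, 2], [6, 3]]
```

The 4×4 input becomes 2×2, so later layers process a quarter of the values. Pooling itself has no trainable parameters.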
CNN Example
• There is a combination of convolution and pooling layers at the beginning
• A few fully connected layers at the end
• And, finally, a softmax classifier to classify the input into various categories
• There are a lot of hyperparameters in this network which we have to specify as well
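A minimal sketch of the final softmax step, which turns the scores from the last fully connected layer into class probabilities. The three scores are made-up values.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical outputs for three categories
probs = softmax(scores)
predicted = probs.index(max(probs))

print([round(p, 3) for p in probs])
print(predicted)  # 0, the category with the highest score
```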
CNN Example
Classical Networks (LeNet-5)

• Parameters: 60k
• Layers flow: Conv -> Pool -> Conv -> Pool -> FC -> FC -> Output
• Activation functions: Sigmoid/tanh and ReLU
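The parameter count can be roughly reproduced with a sketch. The layer sizes below are assumptions taken from the commonly taught simplified version of LeNet-5 (32×32×1 input, 5×5 filters, 6 and 16 filters, pooling without trainable parameters, then 120, 84, and 10 units). They are not stated on the slide, and the original network differs in some details.

```python
def conv_params(filter_size, in_channels, num_filters):
    """Weights plus one bias per filter."""
    return (filter_size * filter_size * in_channels + 1) * num_filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units


# Assumed layer sizes; pooling layers are omitted because they add nothing here.
layers = [
    ("Conv 5x5, 6 filters", conv_params(5, 1, 6)),
    ("Conv 5x5, 16 filters", conv_params(5, 6, 16)),
    ("FC 400 -> 120", dense_params(400, 120)),
    ("FC 120 -> 84", dense_params(120, 84)),
    ("Output 84 -> 10", dense_params(84, 10)),
]

for name, count in layers:
    print(f"{name}: {count}")

total = sum(count for _, count in layers)
print(total)  # 61706, roughly 60k
```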
Classical Networks (Alexnet)
More Classical Networks
• VGG-16

• ResNet

• Inception

• And many more…


Some Notes
• Building your own model from scratch can be a tedious and cumbersome
process.
• In many cases, we also face issues like lack of data availability.

• Steps to follow:
– Use an open-source implementation
– Transfer Learning: we can take a pre-trained network and transfer it to the new
task we are working on.
– Data Augmentation: Deep learning models perform well when we have a large
amount of data, so we enlarge the training set by transforming existing images.
– E.g., mirroring, random cropping, rotating
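The three augmentations can be sketched on a nested-list image as follows. The 3×3 image is a hypothetical example, the rotation is limited to 90 degrees clockwise for simplicity, and the `rng` parameter is an illustrative way to make the random crop repeatable.

```python
import random


def mirror(image):
    """Flip the image left to right."""
    return [row[::-1] for row in image]


def rotate90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def random_crop(image, size, rng=random):
    """Cut a size x size window from a random position."""
    top = rng.randrange(len(image) - size + 1)
    left = rng.randrange(len(image[0]) - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

print(mirror(image))    # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotate90(image))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

crop = random_crop(image, 2, random.Random(0))
print(len(crop), len(crop[0]))  # 2 2
```

Each transformed copy keeps the label of the original image, so one labeled image yields several training examples.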
Resources
• https://www.youtube.com/playlist?list=PLGP2q2bIgaNzhSv4yMX6yPxwQ0mk4CPwS
