Unit-5 Computer Vision(Ai)
Artificial Intelligence is a technique that enables computers to mimic human intelligence. As humans
we can see things, analyse them and then take the required action on the basis of what we see.
If you answered Yes, then you are absolutely right. “The Computer Vision domain of Artificial
Intelligence enables machines to see through images or visual data, and to process and analyse them on the
basis of algorithms and methods in order to analyse actual phenomena with images.”
APPLICATIONS OF CV:
1. Facial Recognition
• With the advent of smart cities and smart homes, Computer Vision plays a vital role in making
the home smarter.
• Security, the most important application, involves the use of Computer Vision for facial
recognition.
Ex: It also finds application in schools, in attendance systems based on facial recognition of
students.
2. Face Filters
• Modern apps like Instagram and Snapchat have many features based on the use
of computer vision.
• Through the camera, the machine or the algorithm is able to identify the facial dynamics of the
person and apply the selected face filter.
ARTIFICIAL INTELLIGENCE
MISS VEENA (PGT CS)
3. Google’s Search by Image
• Most searches on Google’s search engine use textual data, but it also has an interesting
feature: getting search results through an image.
• This uses Computer Vision: it compares different features of the input image with a database
of images and gives us the search result, while at the same time analysing various features of
the image.
4. Computer Vision in Retail
• The retail field has been one of the fastest growing fields and at the same time is using
Computer Vision to make the user experience more fruitful.
• Retailers can use Computer Vision techniques to track customers’ movements through stores,
analyse navigational routes and detect walking patterns.
• Through security camera image analysis, a Computer Vision algorithm can generate a very
accurate estimate of the items available in the store.
5. Self-Driving Cars
• Most leading car manufacturers in the world are reaping the benefits of investing in artificial
intelligence for developing on-road versions of hands-free technology.
• This involves identifying objects, getting navigational routes and, at the
same time, monitoring the environment.
6. Medical Imaging
• Over the last decades, computer-supported medical imaging applications have been a trustworthy
help for physicians.
• Such an application doesn’t only create and analyse images, but also becomes an assistant and helps doctors
with their interpretation.
• The application is used to read and convert 2D scan images into interactive 3D models that
enable medical professionals to gain a detailed understanding of a patient’s health condition.
7. Google Translate App
• All you need to do to read signs in a foreign language is to point your phone’s camera at the
words and let the Google Translate app tell you what they mean in your preferred language
almost instantly.
• By using optical character recognition to see the image and augmented reality to overlay an
accurate translation, this is a convenient tool that uses Computer Vision.
The various applications of Computer Vision are based on a certain number of tasks which are
performed to get certain information from the input image. This information can be used directly for prediction
or form the base for further analysis. The tasks used in a computer vision application are:
• Classification
The Image Classification problem is the task of assigning an input image one label from a fixed set of
categories. This is one of the core problems in CV that, despite its simplicity, has a large variety of
practical applications.
• Classification + Localisation
This task involves both identifying what object is present in the image and, at
the same time, identifying at what location that object is present in the image. It is used only for single
objects.
• Object Detection
Object detection is the process of finding instances of real-world objects such as faces, bicycles, and
buildings in images or videos. Object detection algorithms typically use extracted features and learning
algorithms to recognize instances of an object category. It is commonly used in applications such as
image retrieval and automated vehicle parking systems.
• Instance Segmentation
Instance Segmentation is the process of detecting instances of the objects, giving them a category and
then giving each pixel a label on the basis of that. A segmentation algorithm takes an image as input
and outputs a collection of regions (or segments).
IMAGES:
We all see a lot of images around us and use them daily, either through our mobile phones or computer
systems. But do we ask ourselves some basic questions while we use them on such a regular basis?
1. Basics of Pixels:
• The more pixels you have, the more closely the image resembles the original.
• Ex: In the image below, one portion has been magnified many times over so that you can see
its individual composition in pixels.
2. Resolution
• The number of pixels in an image is sometimes called the resolution.
• The term conventionally refers to the area covered by the pixels.
• For example, 1080 x 720 pixels is a resolution giving the number of pixels along the width and height of that
picture.
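As a quick check of the arithmetic, the total number of pixels is the width multiplied by the height. The short sketch below uses a hypothetical helper name, `total_pixels`, which is only for illustration.

```python
def total_pixels(width, height):
    """Return the number of pixels in an image of the given resolution."""
    return width * height

# The 1080 x 720 example from the notes.
print(total_pixels(1080, 720))  # 777600
```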
3. Pixel value
• In computer systems, data is stored in the form of ones and zeros, which we call the
binary system.
• Each pixel uses 1 byte of an image, which is equivalent to 8 bits of data.
• Since each bit can have two possible values, 8 bits can take 2^8 = 256
possible values, which start at 0 and end at 255.
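The count of possible pixel values follows directly from the number of bits: n bits give 2^n combinations, numbered from 0 up to 2^n - 1. A minimal sketch of that calculation (the function name `value_range` is an assumption made for this example):

```python
def value_range(bits):
    """Return (number_of_values, smallest_value, largest_value) for an unsigned pixel."""
    count = 2 ** bits          # each extra bit doubles the number of combinations
    return count, 0, count - 1

print(value_range(8))  # (256, 0, 255)
print(value_range(1))  # (2, 0, 1)
```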
4. Grayscale Images
• Grayscale images are images which have a range of shades of gray without apparent color.
• The darkest possible shade is black, which is the total absence of color or a pixel value of 0.
• The lightest possible shade is white, which is the total presence of color or a pixel value of 255.
• A grayscale image is a single plane, a 2D array of pixels, with each pixel 1 byte in size.
• The size of a grayscale image is defined as the Height x Width of that image.
As you can check, the value of every pixel is within the range 0 to 255. Computers store the images we see
in the form of these numbers.
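To make this concrete, here is a small sketch of how a grayscale image could be held in memory as a 2D list of numbers. The pixel values and the helper name `gray_image_info` are made up for illustration; real libraries use more compact array types.

```python
# A tiny grayscale image, 3 rows high and 4 columns wide.
# Each entry is one pixel: 0 is black, 255 is white.
gray = [
    [0,   64,  128, 255],
    [32,  96,  160, 224],
    [255, 128, 64,  0],
]

def gray_image_info(image):
    """Return the height, width and storage size (1 byte per pixel) of a grayscale image."""
    height = len(image)
    width = len(image[0])
    # Every pixel value must fit in one byte.
    if not all(0 <= p <= 255 for row in image for p in row):
        raise ValueError("pixel values must lie between 0 and 255")
    return {"height": height, "width": width, "bytes": height * width}

print(gray_image_info(gray))  # {'height': 3, 'width': 4, 'bytes': 12}
```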
5. RGB Images
• These images are made up of three primary colours Red, Green and Blue.
• All the colours that are present can be made by combining different intensities of red, green
and blue.
• Every RGB image is stored in the form of three different channels called the R channel, G
channel and the B channel.
• Each plane separately has a number of pixels with each pixel value varying from 0 to 255.
• All the three planes when combined together form a colour image.
• This means that in an RGB image, each pixel has a set of three different values which together
give colour to that particular pixel.
• Example: Here, each colour image is stored in the form of three different channels, each
having a different intensity. All three channels combine to form the colour we see.
• If we split the image into three different channels, namely Red (R), Green (G) and Blue (B), the
individual layers will have the following intensity of colours of the individual pixels.
• These individual layers, when stored in memory, look like the image on the extreme right.
• The layers look like grayscale images because each pixel has an intensity value from 0 to 255
and, as studied earlier, 0 is considered black or no presence of colour and 255 means white
or full presence of colour.
• These three individual RGB values, when combined, form the colour of each pixel.
• Therefore, each pixel in the RGB image has three values to form the complete colour.
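The splitting and recombining of channels described above can be sketched in a few lines. The 2 x 2 image and the function names `split_channels` and `merge_channels` are assumptions chosen for this example, not part of any particular library.

```python
# A 2 x 2 RGB image: every pixel is a (red, green, blue) triple, each value 0 to 255.
rgb = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 0)],    # blue pixel, yellow pixel (red + green)
]

def split_channels(image):
    """Split an RGB image into three single-plane layers: R, G and B."""
    r = [[px[0] for px in row] for row in image]
    g = [[px[1] for px in row] for row in image]
    b = [[px[2] for px in row] for row in image]
    return r, g, b

def merge_channels(r, g, b):
    """Combine three planes back into one RGB image."""
    return [
        [(r[i][j], g[i][j], b[i][j]) for j in range(len(r[0]))]
        for i in range(len(r))
    ]

r, g, b = split_channels(rgb)
print(r)  # [[255, 0], [0, 255]]
print(g)  # [[0, 255], [0, 255]]
print(b)  # [[0, 0], [255, 0]]
```

Each layer on its own is just a grid of 0 to 255 values, which is why it displays as a grayscale picture; merging the three layers restores the original colours.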
IMAGE FEATURE
• Features are the specific structures in the image such as points, edges or objects.
• These features help us to perform various tasks and then get the analysis done on the basis of
the application.
• Corners are easy to find as they can be found only at a particular
location in the image, whereas edges, which are spread over a line, look the
same all along.
• As the example shows, corners are always good features to extract from
an image, followed by edges.
INTRODUCTION TO OpenCV:
• Now that we have learnt about image features and their importance in image processing, we will learn
about a tool we can use to extract these features from an image for further processing.
• OpenCV, or Open Source Computer Vision Library, is a tool which helps a computer extract
these features from images.
• It is used for all kinds of image and video processing and analysis.
• It is capable of processing images and videos to identify objects, faces, or even handwriting.
• We will use OpenCV for basic image processing operations such as resizing, cropping
and many more.
• To install the OpenCV library, open the Anaconda Prompt and then write the following command:
• A Convolutional Neural Network (CNN) is a network which can take in an input image, assign importance (learnable weights and biases) to various
aspects/objects in the image and differentiate one from the other.
We give an input image, which is processed through the CNN, which then gives a prediction on the basis
of the labels given in the particular dataset.
The different layers of a Convolutional Neural Network (CNN) are as follows:
1) Convolution Layer
2) Rectified Linear Unit (ReLU) Layer
3) Pooling Layer
1. Convolution Layer
It is the first layer of a CNN.
The objective of the Convolution Operation is to extract features, such
as edges, from the input image.
A CNN need not be limited to only one Convolutional Layer.
Conventionally, the first Convolution Layer is responsible for capturing the Low-Level
features such as edges, colour, gradient orientation, etc.
With added layers, the architecture adapts to the High-Level features as well, giving
us a network which has a wholesome understanding of the images in the dataset.
In the convolution layer, several kernels are used to produce several
features.
• We only focus on the features of the image that can help us in processing the image further.
For example, you might only need to recognize someone’s eyes, nose and mouth to recognize the
person. You might not need to see the whole face.
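The way a kernel extracts a low-level feature such as an edge can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `convolve2d`, the small test image and the choice of a Prewitt-style vertical-edge kernel are all made up for the example, and the kernel is slid over the image without flipping, which is the convention CNNs normally follow.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# A 4 x 5 image: dark on the left, brighter on the right (a vertical edge).
image = [
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
]

# A kernel that responds to a left-to-right increase in brightness.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(convolve2d(image, vertical_edge))  # [[0, 30, 30], [0, 30, 30]]
```

The output is zero where the image is flat and large where the window covers the edge. In a real CNN the kernel values are not fixed by hand like this; they are the learnable weights.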
2. Rectified Linear Unit (ReLU) Layer
• If we see the two graphs side by side, the one on the left is a linear graph.
• This graph, when passed through the ReLU layer, gives the one on the right.
• The ReLU graph is a horizontal straight line at zero for negative inputs and then increases linearly for
positive inputs.
As shown in the above convolved image, there is a smooth grey gradient change from black to white.
After applying the ReLU function, we can see a more abrupt change in colour, which makes the edges
more obvious. This acts as a better feature for the further layers in a CNN, as it enhances the
activation.
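The ReLU function itself is simple: it replaces every negative number with zero and leaves positive numbers unchanged. A minimal sketch, with the function names and the sample feature map chosen only for illustration:

```python
def relu(x):
    """Return x if it is positive, otherwise 0."""
    return max(0, x)

def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]

feature_map = [
    [-3, 0, 5],
    [2, -1, 4],
]

print(relu_map(feature_map))  # [[0, 0, 5], [2, 0, 4]]
```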
3. Pooling Layer
The Pooling layer is responsible for reducing the spatial size of the Convolved Feature
while still retaining the important features.
Type of pooling which can be performed on an image:
Max Pooling: Max Pooling returns the maximum value from the portion of the image
covered by the Kernel.
The pooling layer is an important layer in the CNN as it performs a series of tasks, which
are as follows:
a. Makes the image smaller and more manageable.
b. Makes the image more resistant to small transformations, distortions and
translations in the input image.
A small difference in the input image will create a very similar pooled image.
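Max pooling can be sketched as follows. The window size of 2 x 2, the stride of 2, the function name `max_pool` and the sample values are assumptions made for this example; other sizes are equally valid.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the largest value in each size x size window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - size + 1, stride):
        row = []
        for j in range(0, width - size + 1, stride):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 1, 8],
]

print(max_pool(feature_map))  # [[6, 4], [7, 9]]

# A small change to a value that is not the maximum of its window
# leaves the pooled result unchanged.
nudged = [row[:] for row in feature_map]
nudged[0][0] = 2
print(max_pool(nudged))  # [[6, 4], [7, 9]]
```

The 4 x 4 input shrinks to 2 x 2, and the second call shows why pooling makes the result less sensitive to small differences in the input.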
Final output
Conclusion