Introduction to Computer Vision
Computer vision is a branch of artificial intelligence (AI) that enables
computers to extract meaningful information from images, videos, and other
visual inputs. A computer vision model takes in visual data, then processes and
analyzes it with algorithms in order to interpret the real-world scene that the
images capture.
➢The entire process involves acquiring images, screening and analyzing them,
and identifying and extracting information.
➢This extensive processing helps computers understand visual content
and act on it accordingly.
➢In computer vision, the input to a machine can be photographs, videos, or
images from thermal or infrared sensors, indicators, and other sources.
Applications of Computer Vision
This decade and the next are likely to see significant advances in technology,
which has made computer vision a priority. Some common uses of computer vision
are:
Facial recognition
Facial recognition is most frequently encountered on smartphones. It is a
technology that identifies and verifies a person from visual input by comparing
it against predefined data. Such mechanisms are often used for security and
safety purposes.
For example, face unlock on devices and traffic cameras both use facial
recognition.
Facial filters
Modern social media apps like Snapchat and Instagram use technology that
extracts facial landmarks and processes them using AI to produce the desired
effect.
Google Lens
To search by image, Google Lens uses computer vision to capture and analyze
features of the input image, compares them against a database of images, and
then returns the search results.
Automotive
Industrial machinery now uses computer vision. Automated cars are equipped with
sensors and software that can detect movement through 360 degrees, determine
the car's location, detect objects, and establish the depth or dimensions of
the surroundings.
For example, companies like Tesla are developing self-driving cars.
Medical Imaging
For the last few decades, computer vision applications in medical imaging have
been a reliable aid for physicians. They create and analyze images and help
doctors with their interpretation.
One such application reads 2D scan images and converts them into interactive 3D
models.
Computer Vision Tasks
Computer vision applications are built on a set of tasks performed on the data
or input provided by the user, so that the system can process and analyze the
situation and predict the outcome.
Single object
Image Classification
Image classification is the task of identifying an object in the input image
and assigning it a label from a set of predefined categories.
Classification + Localization
As the name suggests, the task identifies the object and locates it in the input
image.
Multiple object
Object detection
Object detection extracts features from the input and uses learned
models to recognize and locate instances of an object category.
Instance segmentation
Instance segmentation assigns each pixel of the image to an individual object
instance, so separate objects of the same category are told apart. It
is used for tasks such as counting the number of objects.
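The counting use case can be sketched as follows. This is only an illustration: it assumes the segmentation result is already available as a grid of integer instance IDs, with 0 for background. The `instance_mask` data and the function names are hypothetical and not part of any particular library.

```python
# Hypothetical instance mask: 0 is background, and each positive
# integer marks the pixels that belong to one object instance.
instance_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [3, 3, 0, 0, 2],
]


def count_instances(mask, background=0):
    """Count the distinct object instances present in the mask."""
    labels = {label for row in mask for label in row}
    labels.discard(background)
    return len(labels)


def pixels_per_instance(mask, background=0):
    """Map each instance ID to the number of pixels it covers."""
    sizes = {}
    for row in mask:
        for label in row:
            if label != background:
                sizes[label] = sizes.get(label, 0) + 1
    return sizes


print(count_instances(instance_mask))      # 3
print(pixels_per_instance(instance_mask))  # {1: 4, 2: 3, 3: 2}
```

Because every pixel carries an instance ID, counting objects reduces to counting distinct IDs.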
Basics of Images
The word “pixel” means a picture element.
Pixels
• Pixels are the fundamental elements of a digital image.
• They are the smallest units of information that make up a picture.
• They are typically arranged in a 2-dimensional grid.
• In general terms, the more pixels an image has, the more closely it
resembles the original scene.
Resolution
• The number of pixels in an image is called its resolution.
• It is usually expressed as the dimensions of the 2-dimensional pixel grid,
width by height.
• For example, 1080 x 720 pixels is a resolution that gives the number of
pixels along the width and the height of the picture.
• A megapixel is one million pixels.
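As a small worked example of the arithmetic, the total pixel count is the width multiplied by the height, and the megapixel figure is that count divided by one million. The function names below are illustrative only.

```python
def pixel_count(width, height):
    """Total number of pixels in a width x height image."""
    return width * height


def megapixels(width, height):
    """Pixel count expressed in megapixels (1 megapixel = 1,000,000 pixels)."""
    return pixel_count(width, height) / 1_000_000


print(pixel_count(1080, 720))           # 777600
print(round(megapixels(1080, 720), 2))  # 0.78
```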
Pixel value
• A pixel value represents the brightness of the pixel.
• For an 8-bit image, pixel values range from 0 to 255 (2^8 - 1),
• where 0 is black (no colour) and 255 is white.
Why do pixel values have numbers?
Computer systems work only with ones and zeros, that is, in binary. Each bit in
a computer system can be either a zero or a one. Each pixel uses 1 byte (8 bits)
of the image. Because each bit has two possible values, 8 bits can represent
2^8 = 256 different values, which start at 0 and end at 255.
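As a quick check of the arithmetic, n bits can encode 2^n distinct values, so one byte covers 256 values, numbered 0 through 255. The helper name `value_range` is made up for this sketch.

```python
def value_range(bits):
    """Return (number of distinct values, minimum, maximum) for an
    unsigned integer stored in the given number of bits."""
    count = 2 ** bits
    return count, 0, count - 1


print(value_range(8))      # (256, 0, 255)
print(format(0, "08b"))    # 00000000  (black)
print(format(255, "08b"))  # 11111111  (white)
```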
Grayscale Images
• Grayscale images contain a range of shades of gray without apparent
color. The lightest shade is white, the full presence of colour, at 255,
and the darkest shade is black at 0.
• Intermediate shades of gray have equal brightness levels of the three
primary colors, RGB.
• Computers store the images we see in the form of these numbers.
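To make the idea of "images as numbers" concrete, here is a minimal sketch that stores a tiny grayscale image as rows of brightness values. The sample data and the name `gray_to_rgb` are invented for illustration, and real code would normally use an array library instead of nested lists.

```python
# A 3 x 4 grayscale image: each number is one pixel's brightness
# (0 = black, 255 = white).
gray_image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 128, 64, 0],
]


def gray_to_rgb(value):
    """A shade of gray has the same level in the red, green and blue channels."""
    return (value, value, value)


height = len(gray_image)
width = len(gray_image[0])
darkest = min(min(row) for row in gray_image)
brightest = max(max(row) for row in gray_image)

print(width, height)       # 4 3
print(darkest, brightest)  # 0 255
print(gray_to_rgb(128))    # (128, 128, 128)
```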
RGB colors
• All colored images are made up of three primary colors: Red, Green and
Blue. All other colors are formed by mixing these primary colors in
different proportions.
• A computer stores an RGB image in three separate channels, called the R
channel, G channel and B channel.
• Thus, the pixel value of an RGB image depends on its red, green and blue
values.
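The channel layout can be sketched with a tiny example. It assumes each pixel is held as a (red, green, blue) tuple of 0-255 values. The sample image and the `split_channels` helper are hypothetical, and image libraries typically offer their own way to do this.

```python
# A 2 x 2 RGB image: each pixel is a (red, green, blue) tuple.
rgb_image = [
    [(255, 0, 0), (0, 255, 0)],    # red pixel, green pixel
    [(0, 0, 255), (255, 255, 0)],  # blue pixel, yellow pixel (red + green)
]


def split_channels(image):
    """Return the R, G and B channels as three separate 2-D grids."""
    red = [[pixel[0] for pixel in row] for row in image]
    green = [[pixel[1] for pixel in row] for row in image]
    blue = [[pixel[2] for pixel in row] for row in image]
    return red, green, blue


red, green, blue = split_channels(rgb_image)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 0]]
```

The yellow pixel shows how other colors arise from the primaries: it is full red plus full green with no blue.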
Image Features
• A feature is a description of part of an image. Features are specific
structures in the image such as points, edges or objects.
• Other examples of features relate to CV tasks such as motion in image
sequences, or to shapes defined in terms of curves or boundaries between
different image regions.
• Corners are among the easiest features to find and extract in an image.
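As a rough illustration of what extracting a feature can look like, the sketch below marks edge positions where brightness jumps sharply between horizontally adjacent pixels. This is a deliberately simplified stand-in, not a standard edge or corner detector. The function name, the threshold of 100, and the sample image are all assumptions made for the example.

```python
def horizontal_edges(gray_image, threshold=100):
    """Return (row, column) positions where the brightness difference
    between a pixel and its right-hand neighbour reaches the threshold."""
    edges = []
    for r, row in enumerate(gray_image):
        for c in range(len(row) - 1):
            if abs(row[c + 1] - row[c]) >= threshold:
                edges.append((r, c))
    return edges


# Dark region on the left, bright region on the right.
step_image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

print(horizontal_edges(step_image))  # [(0, 1), (1, 1), (2, 1)]
```

The detected positions line up along the boundary between the two regions, which is the kind of structure an edge feature describes.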