
Unit 2: Computer Vision Applications
Contents

◉ Document Image Analysis


◉ Biometrics
◉ Object Recognition
◉ Object Tracking
◉ Medical Image Analysis
◉ Content-based Image Retrieval
◉ Video Data Processing
◉ Multimedia
◉ Virtual Reality and Augmented Reality
◉ Image Formation
◉ Image Representations (Continuous And Discrete)
◉ Image Preprocessing Techniques
Document Image Processing System

◉ Documents containing a combination of text, images, tables, codes, etc., in complex layouts are digitally saved in image format. Analyzing and extracting useful information from these image documents is performed with the help of machine learning. This supervised task is termed Document Image Analysis (DIA).
Architecture of Document Image Processing (DIP) System
◉ A DIP system should be capable of handling documents with varied
layouts, containing text, graphics, line drawings, and half-tones.
Several special-purpose modules are required to process different
types of document components. It is essential to have a global data
representation to facilitate communication between processes. This
also allows independent subsystem development without being
concerned with communication protocol.
◉ As in the figure, the image scanner optically captures text images
to be recognized. Text images are processed with DIP software and
hardware.
◉ The process involves three operations: document analysis
(extracting individual character images), recognizing these images
(based on shape), and contextual processing (either to correct
misclassifications made by the recognition algorithm or to limit
recognition choices).
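A minimal sketch of these three operations in Python, assuming the Tesseract OCR engine together with the pytesseract and opencv-python packages are available; "scanned_page.png" is a hypothetical input file.

import cv2
import pytesseract

# Document analysis: load the scanned page and binarize it so that character
# pixels stand out from the page background.
page = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Recognition: OCR the binarized page (shape-based character recognition).
raw_text = pytesseract.image_to_string(binary)

# Contextual processing: a trivial stand-in that collapses stray whitespace;
# real systems use dictionaries or language models to correct misclassifications.
clean_text = " ".join(raw_text.split())
print(clean_text)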
Document Image Analysis

◉ Document image analysis refers to algorithms and techniques that are applied to images of
documents to obtain a computer-readable description from pixel data. A well-known document image
analysis product is the Optical Character Recognition (OCR) software that recognizes characters in a
scanned document. OCR makes it possible for the user to edit or search the document’s contents.
◉ Text is extracted from the document image in a process known as document image analysis. Reliable
character segmentation and recognition depend on both original document quality and registered
image quality.
◉ Processes that attempt to compensate for poor quality originals or poor quality scanning include
image enhancement, underline removal, and noise removal.
◉ Image enhancement methods emphasize the discrimination between character and non-character regions. Underline removal erases printed guidelines and other lines that may touch characters and interfere with character recognition. Noise removal erases portions of the image that are not part of the characters.
◉ Prior to character recognition, it is necessary to isolate individual characters from the text image (see the segmentation sketch below).
◉ In low-quality or nonuniform text images, even sophisticated algorithms may not correctly extract characters, and thus recognition errors may occur. Recognition of unconstrained handwritten text can be very difficult because characters cannot be reliably isolated, especially when the text is cursive handwriting.
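A minimal character-isolation sketch with OpenCV, assuming a reasonably clean, dark-on-light scan; the file name and the small-area threshold are illustrative only.

import cv2

gray = cv2.imread("line_of_text.png", cv2.IMREAD_GRAYSCALE)

# Noise removal and binarization (characters become white on black).
denoised = cv2.medianBlur(gray, 3)
_, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Isolate candidate characters as external contours (connected components).
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 20]

# Sort the boxes left to right so they follow reading order; each crop is one
# isolated character image ready for the recognition stage.
characters = [binary[y:y + h, x:x + w] for x, y, w, h in sorted(boxes)]

As the slide notes, this simple approach breaks down on cursive or low-quality text, where characters touch and cannot be separated by connected components alone.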
Biometrics

◉ Biometrics deals with the recognition of persons based on physiological characteristics, such as face,
fingerprint, vascular pattern or iris, and behavioural traits, such as gait or speech.
◉ It combines Computer Vision with knowledge of human physiology and behaviour.
◉ Computer vision technology is used to capture fingerprints and other biometric traits to validate a user's identity. Biometrics is the measurement and analysis of the physiological characteristics that make a person unique, such as the face, fingerprints, and iris patterns. It makes use of computer vision along with knowledge of human physiology and behaviour.
The widely used methods of biometric authentication are listed below:
◉ Fingerprints. Applied to compare two examples of friction ridge skin impressions from human fingers, palms, or toes.
◉ Iris recognition. Utilized to identify a person from the patterns of the iris with the help of image processing and neural network concepts.
◉ Face recognition. Uses the face to distinguish one person from another; automatically identifies or verifies a person (see the detection sketch after this list).
◉ Voice. Applied to recognize speech with the help of natural language processing technology.
◉ DNA. Utilized to measure and analyze the individual’s Deoxyribonucleic acid to distinguish people
with some degree of probability.
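A minimal sketch of the detection stage of a face-based biometric, using OpenCV's bundled Haar cascade; "person.jpg" is a hypothetical input.

import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("person.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Detect candidate face regions; each hit is an (x, y, w, h) bounding box.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_marked.jpg", image)

In a full biometric system the cropped face would then be passed to a recognition model (feature embedding plus matching) to identify or verify the person.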
Biometrics (Cont.)
Biometrics (Cont.)

Why is biometrics technology so beneficial?

◉ Companies across a wide range of business verticals are adopting computer vision based biometric technology to secure their secret and confidential data, prevent security breaches and identity theft, and enhance the overall user experience.
Biometrics (Cont.)
Object Recognition

◉ Object recognition is a computer vision technique for identifying objects in images or videos. Object
recognition is a key output of deep learning and machine learning algorithms. When humans look at a
photograph or watch a video, we can readily spot people, objects, scenes, and visual details. The goal
is to teach a computer to do what comes naturally to humans: to gain a level of understanding of what
an image contains.
Object Recognition (Cont.)

◉ It is an applied artificial intelligence approach that repurposes a computer as an object detector so it


can scan an image or video from the real world. It understands the object’s features and interprets its
purpose just like humans do.
◉ It aims at helping a computer see an existing image and break it down into a series of pixels to
recognize a specific pattern or shape.
◉ A successful object recognition algorithm depends on the quality of the data used to train it. More (and more varied) training data generally helps the model learn to classify objects based on known characteristics more accurately.
◉ The probability of accurately identifying an object depends on the image's attributes. In artificial intelligence, the system calculates a confidence score to predict the label or class of an object. However, the algorithmic computation behind object recognition is fairly complex and requires a thorough understanding to achieve good results.
◉ It is a key technology behind driverless cars, enabling them to recognize a stop sign or to distinguish a pedestrian from a lamppost. It is also useful in a variety of applications such as disease identification in bioimaging, industrial inspection, and robotic vision (a minimal recognition sketch follows below).
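A minimal object-recognition sketch using a pretrained CNN from torchvision (assumes torchvision 0.13+ for the weights API); "photo.jpg" is a hypothetical input.

import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()            # resize, crop, normalize

image = Image.open("photo.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)       # shape: (1, 3, 224, 224)

with torch.no_grad():
    scores = model(batch).softmax(dim=1)     # confidence score per class

confidence, index = scores.max(dim=1)
print(weights.meta["categories"][index.item()], float(confidence))

The printed confidence score is exactly the quantity the slide describes: the model's estimate of how likely the predicted class label is correct.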
Object Tracking

◉ Object tracking is a computer vision technique used to follow a particular object or multiple items. Generally, object tracking has applications in videos and real-world interactions, where objects are first detected and then tracked over time to obtain observations.
◉ It is used in applications such as autonomous vehicles, where, apart from classifying and detecting objects such as pedestrians and other vehicles, real-time motion tracking is also required to avoid accidents and follow traffic rules.
◉ The basics of object tracking rely on object detection, but the object, in this case, is viewed from
different angles, and may look completely different in some scenarios.
Object Tracking (Cont.)

◉ Here is a breakdown of how object tracking allows us to select, track, and retain individual classes of
objects in space and over time:
Steps involved in Object Tracking:
◉ Input: The first step is to provide input, such as a video or a
real-time feed from a camera, and preprocess each
frame using OpenCV. Pre-processing is essential so
the model has consistent data to work with.
◉ Object Detection: Next, you choose an object
detection algorithm that classifies and detects the
object by creating a bounding box around it.
◉ Labeling: Next, the object tracking algorithm assigns
a unique identification label for each object that has
been identified. For example, this could be all of the
cars in a video feed of a racing track.
◉ Tracking: The last step is keeping track of the
detected object moving through different frames while
storing its relevant path information.
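A minimal single-object tracking sketch with OpenCV following these four steps; the CSRT tracker requires the opencv-contrib-python package, and "race.mp4" is a hypothetical video.

import cv2

video = cv2.VideoCapture("race.mp4")
ok, frame = video.read()                        # Input: first frame of the feed

box = cv2.selectROI("Select object", frame)     # Detection stand-in: a manually
tracker = cv2.TrackerCSRT_create()              # drawn bounding box
tracker.init(frame, box)                        # Labeling: one tracked object

path = []                                       # Tracking: path across frames
while True:
    ok, frame = video.read()
    if not ok:
        break
    found, (x, y, w, h) = tracker.update(frame)
    if found:
        path.append((int(x + w / 2), int(y + h / 2)))   # store the centre point
video.release()

In practice the manual ROI selection would be replaced by a detection model, and a multi-object tracker would assign a unique ID to every detected object.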
Object Tracking (Cont.)
Medical Image Analysis

◉ Technology today is extremely advanced, and physicians can now call upon a variety of imaging techniques, such as scans and images of the body, to help examine the inside of the body and make an accurate diagnosis.
◉ Medical imaging is the process of producing visible images of the inner structures of the body for scientific and medicinal study and treatment, as well as a visible view of the function of interior tissues. This process supports disorder identification and management. It creates a data bank of the regular structure and function of the organs, which makes it easier to recognize anomalies.
◉ Medical image analysis is the science of solving/analyzing medical problems based on different
imaging modalities and digital image analysis techniques.
Types of medical imagery:
◉ Radiologic technology
◉ Ultrasound technology
◉ CT Scans
◉ MRI Scans
Medical Image Analysis

◉ Computer Vision and Machine Learning are being vigorously employed in medical image analysis to better detect lesions in body scans and images. The main purpose of CV for medical image analysis is to unload medical personnel from routine work with large data volumes (e.g., bacteria counting), to recognize large amounts of diverse data (e.g., multiple lesions), and to recognize single small deviations from the norm.
Benefits of Medical Image Analysis: The benefits of medical image analysis are indeed hard to count and depend on the application and modality, but they can be generalized. Thanks to intelligent medical image analysis, doctors and researchers can:
◉ save time and attention when handling routine tasks and big data volumes,
◉ define precise segments for medical treatment,
◉ detect the slightest abnormalities and lesions in early stages.
Steps of Intelligent medical image analysis can be generalized into 3 milestones similar for all Machine
Learning solutions:
◉ Detection
◉ Recognition
◉ Alert
Medical Image Analysis

◉ The quality of the images is important for detecting the requested objects, and developer companies can pre-process the original images if needed.
◉ To identify abnormalities, the solution segments all the objects present in the medical image. Then the solution detects and classifies the objects using its library (a minimal sketch follows below).
◉ Overlapping or registration of images taken in different modalities (from different diagnostic equipment), or of images taken at different times to capture dynamics, is a very important step in AI-based medical image analysis for the doctor's decision making. The detected and recognized formations can be further quantified in size, form, and structure.
◉ With these steps, intelligent medical image analysis solutions aim to analyze complex and numerous images quickly.
◉ To run through these steps, software for medical image analysis has to be well trained and has to operate with huge libraries.
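A minimal sketch of the detection-and-quantification idea on a single greyscale scan with OpenCV; the file name and area threshold are illustrative, and this is not a validated clinical method.

import cv2

scan = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(scan, (5, 5), 0)

# Detection: segment bright regions from the background with Otsu's threshold.
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Recognition and quantification: label connected regions and measure them.
count, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
for i in range(1, count):                       # label 0 is the background
    area = stats[i, cv2.CC_STAT_AREA]
    if area > 50:                               # ignore tiny noise blobs
        print(f"region {i}: area = {area} px, centre = {centroids[i]}")

A production system would replace the threshold with a trained segmentation model and raise an alert when a detected formation matches known abnormality patterns.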
Content Based Image Retrieval

◉ Content-Based Image Retrieval (CBIR) is a way of retrieving images from a database. In CBIR, a user
specifies a query image and gets the images in the database similar to the query image. To find the
most similar images, CBIR compares the content of the input image to the database images.
◉ The most famous CBIR system is the search-by-image feature of Google Search.
Content Based Image Retrieval

◉ "Content-based" means that the search analyzes the contents of the image rather than
the metadata such as keywords, tags, or descriptions associated with the image. The term "content"
in this context might refer to colors, shapes, textures, or any other information that can be derived
from the image itself.
◉ More specifically, CBIR compares visual features such as shapes, colors, texture, and spatial information, and measures the similarity between the query image and the images in the database with respect to those features.
Difference Between Text-Based Image Retrieval and Content-Based Image Retrieval
◉ In the Text-Based Image Retrieval (TBIR) approach, experts manually annotate images with
geolocation, keywords, tags, labels, or short descriptions. Users can use keywords, annotations, or
descriptions to retrieve similar images that exist in the database:
Content Based Image Retrieval

◉ Basically, when users specify text or a description as the search term, TBIR retrieves the images that were assigned similar textual tags.
◉ The approach is simple and intuitive but has some disadvantages. Firstly, since it takes a considerable amount of time to annotate images manually, it is labor-intensive. Further, the tags may be unreliable due to the dependency on people's perceptions and interpretations, which can vary widely across groups of individuals. To address these problems, CBIR compares the visual contents of images directly. In that way, there is no need for manual labor, nor for subjective and error-prone human perception (a minimal CBIR sketch follows below).
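A minimal CBIR sketch comparing a query image against a small database by colour-histogram similarity; the file names and database list are hypothetical.

import cv2

def colour_histogram(path):
    # HSV colour histogram, normalised so images of different sizes are comparable.
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([image], [0, 1], None, [32, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

query = colour_histogram("query.jpg")
database = ["img1.jpg", "img2.jpg", "img3.jpg"]

# Rank database images by histogram correlation with the query (1.0 = identical).
scores = [(cv2.compareHist(query, colour_histogram(p), cv2.HISTCMP_CORREL), p)
          for p in database]
for score, path in sorted(scores, reverse=True):
    print(f"{path}: similarity = {score:.3f}")

Real CBIR systems combine colour with shape, texture, and learned deep features, but the retrieve-by-similarity structure is the same.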
Video Data Processing

◉ Video processing consists of signal processing that employs statistical analysis and video filters to extract information or perform video manipulation.
◉ Basic video processing techniques include trimming, image resizing, brightness and contrast adjustment, and fade in and fade out, amongst others (a short sketch follows after this list).
◉ More complex video processing techniques, also known as Computer Vision Techniques, are based on
image recognition and statistical analysis to perform tasks such as face recognition, detection of
certain image patterns, and computer-human interaction.
◉ Video files can be converted, compressed, or decompressed using dedicated software tools.
◉ Usually, compression involves a reduction of the bitrate (the number of bits processed per time unit),
which makes it possible to store the video digitally and stream it over the network.
◉ Uncompressed audio or video streams are usually called RAW streams; although different formats and codecs for raw data exist, they are generally too heavy (in bitrate terms) to be stored or streamed over the network in that form.
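A minimal sketch of basic video processing with OpenCV: each frame is resized and its brightness/contrast adjusted before being written to a new file. The file names and codec are illustrative.

import cv2

src = cv2.VideoCapture("input.mp4")
fps = src.get(cv2.CAP_PROP_FPS)
dst = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (640, 360))

while True:
    ok, frame = src.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 360))                    # image resizing
    frame = cv2.convertScaleAbs(frame, alpha=1.2, beta=15)   # contrast / brightness
    dst.write(frame)

src.release()
dst.release()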
Multimedia

◉ Multimedia is a type of medium that allows information to be easily transferred from one location to
another.
◉ Multimedia is the presentation of text, pictures, audio, and video with links and tools that allow the
user to navigate, engage, create, and communicate using a computer.
◉ Multimedia refers to the computer-assisted integration of text, drawings, still and moving images (videos), graphics, audio, animation, and any other media in which any type of information can be expressed, stored, communicated, and processed digitally.
◉ To begin, a computer must be present to coordinate what you see and hear, as well as to interact with.
Second, there must be interconnections between the various pieces of information. Third, you’ll need
navigational tools to get around the web of interconnected data.
◉ Multimedia is being employed in a variety of disciplines, including education, training, and business.
◉ Computer vision technology plays a key role in diverse multimedia applications, including surveillance, environmental monitoring, smart spaces, and so on. This has led to a massive research effort devoted to the challenges of developing computer vision algorithms for managing, processing, analyzing, and interpreting the multimedia data collected. Much recent research therefore aims to consolidate achievements that address these broad challenges in computer vision technologies, with a specific focus on multimedia applications.
Virtual Reality and Augmented Reality

◉ VR immerses a person in a virtual world, simulating a sense of real presence by stimulating the senses. This stimulation is achieved through a source of content and hardware such as headsets, treadmills, gloves, and so on. Computer vision aids virtual reality with robust vision capabilities like SLAM (simultaneous localization and mapping), SfM (structure from motion), user body tracking, and gaze tracking. The more deeply users can immerse themselves in a VR environment -- and block out their physical surroundings -- the more they are able to suspend their disbelief and accept it as real, even if it is fantastical in nature.
Virtual Reality and Augmented Reality

◉ Using cameras and sensors, these functions help VR systems analyze the user’s environment and
detect the headset’s location. So, computer vision and virtual reality work together to make products
more sophisticated and user-responsive.
◉ Augmented reality also is sometimes referred to as a type of virtual reality, although many would
argue that it is a separate but related field. With augmented reality, virtual simulations are overlaid
onto real-world environments in order to enhance or augment those environments. For example, a
furniture retailer might provide an app that enables users to point their phones at a room and visualize
what a new chair or table might look like in that setting.
◉ Computer vision (CV) for augmented reality enables computers to obtain, process, analyze, and understand digital videos and images. By looking at an object's appearance, location, and setting, it identifies what the object is. More simply, this is how Instagram recognizes your friends in photo tags, how you can log in to your bank account with your eyes, and how you can get yourself a flower crown on Snapchat.
Virtual Reality and Augmented Reality

◉ To overlay predefined AR content (be it a deer face or hearts around the head), the system first needs an accurate face scan. In this case, computer vision enables AR image processing, optical tracking, and scene reconstruction, which are vital for any immersive app. Next, a computer vision-based AR system scans your face with its sensors to add real-time visual effects. All of this blends the physical world with AR data.
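A minimal optical-tracking sketch for AR using ArUco fiducial markers (requires opencv-contrib-python; the module-level calls shown are the pre-4.7 OpenCV API, newer releases use cv2.aruco.ArucoDetector instead).

import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
camera = cv2.VideoCapture(0)                    # live camera feed

while True:
    ok, frame = camera.read()
    if not ok:
        break
    corners, ids, _ = cv2.aruco.detectMarkers(frame, dictionary)
    if ids is not None:
        # A real AR app would render virtual content anchored to these corners;
        # here we only draw the detected marker outlines.
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("AR tracking", frame)
    if cv2.waitKey(1) == 27:                    # press Esc to quit
        break
camera.release()
cv2.destroyAllWindows()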
Image Formation

The function f(x,y) may be characterized by two components:


◉ The amount of source illumination incident on the scene being viewed, and
◉ The amount of illumination reflected by the objects in the scene
◉ These are called the illumination and reflectance components and are denoted by i(x,y) and r(x,y)
respectively.
◉ The two functions combine as a product to form f(x,y): f(x,y) = i(x,y) r(x,y), where 0 < i(x,y) < ∞ and 0 < r(x,y) < 1 (0 meaning total absorption and 1 total reflectance).
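A tiny numerical illustration of the illumination-reflectance model with made-up values, assuming NumPy is available.

import numpy as np

illumination = np.full((4, 4), 90.0)                 # i(x, y): incident light
reflectance = np.random.uniform(0.0, 1.0, (4, 4))    # r(x, y): 0 absorbs all, 1 reflects all
f = illumination * reflectance                       # f(x, y) = i(x, y) * r(x, y)
print(f)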
Image Formation (cont.)
Digital Image Representation

◉ An image can be defined as a 2D signal that varies over the spatial coordinates x and y and can be
written mathematically as f(x,y)
Digital Image Representation (Cont.)

◉ In general, the image can be written as a mathematical function f(x,y) as follows:

             | f(0,0)      f(0,1)     ...  f(0,N-1)   |
    f(x,y) = | f(1,0)      f(1,1)     ...  f(1,N-1)   |
             | ...         ...        ...  ...        |
             | f(M-1,0)    f(M-1,1)   ...  f(M-1,N-1) |

◉ The value of the function f(x,y) at every point, indexed by a row and a column, is called the grey value or intensity of the image.

◉ Representing the image by the 2D finite matrix shown above is called sampling (digitizing the spatial coordinates x and y).
◉ Representing each matrix element by one of a finite set of discrete values is called quantization (digitizing the intensity values).
◉ To display the image, it is first converted back to an analog signal, which is scanned onto a display.
Digital Image Representation (Cont.)

Acquisition chain: Sampler → Quantizer → Digital Computer

Display chain: Digital Computer → D-to-A Converter → Display to user


Digital Image Representation (Cont.)

◉ Resolution is an important characteristic of an imaging system.


◉ It is the ability of the imaging system to reproduce the smallest discernible details, i.e., to show the smallest-sized object clearly and differentiate it from neighbouring small objects present in the image.
◉ The number of rows in a digital image is called the vertical resolution, and the number of columns is known as the horizontal resolution.
Image resolution depends on two factors:
◉ Optical Resolution of lens
◉ Spatial Resolution
Spatial Resolution also depends on two parameters:
◉ Number of pixels of the image
◉ Number of bits necessary for adequate intensity resolution, referred to as the bit depth
◉ The number of bits necessary to encode a pixel value is called the bit depth. The number of intensity levels is a power of two: L = 2^k, where k is the bit depth.
◉ So the total number of bits necessary to represent the image is: Number of rows × Number of columns × Bit depth.
Q. What is the storage requirement for a 1024x1024 binary image?
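Worked answer: a binary image needs a bit depth of 1 bit per pixel, so the storage requirement is 1024 × 1024 × 1 = 1,048,576 bits = 131,072 bytes = 128 KB.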
Image Pre-processing Techniques

◉ For a Machine Learning engineer, data pre-processing or data cleansing is a crucial step, and most ML engineers spend a good amount of time on data pre-processing before building the model. Some examples of data pre-processing include outlier detection, missing value treatment, and removal of unwanted or noisy data.
◉ Similarly, image pre-processing is the term for operations on images at the lowest level of abstraction. These operations do not increase the image information content; rather, they decrease it, if entropy is used as the information measure. The aim of pre-processing is an improvement of the image data that suppresses undesired distortions or enhances image features relevant for further processing and analysis tasks. Those features may vary for different applications.
◉ For example, if we are working on a project that automates vehicle identification, then our main focus lies on the vehicle: its colour, the registration plate, etc. We do not focus on the road, the sky, or anything else that is not necessary for this particular application (a minimal pre-processing sketch follows below).
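A minimal pre-processing sketch for the vehicle-identification example above: irrelevant detail is suppressed while the image is brought into a consistent form for later plate detection. The file name and target size are hypothetical.

import cv2

image = cv2.imread("car.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)       # greyscale for plate reading
gray = cv2.resize(gray, (640, 480))                  # consistent input size
gray = cv2.GaussianBlur(gray, (3, 3), 0)             # suppress sensor noise
gray = cv2.equalizeHist(gray)                        # enhance contrast
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("preprocessed.png", binary)

Note that these operations only discard or reorganise information; as the slide says, pre-processing cannot add information that was never captured.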
Thank You
