Unit 2 Computer Vision Applications-1
Vision Applications
Document Image Analysis
◉ Document image analysis refers to algorithms and techniques that are applied to images of
documents to obtain a computer-readable description from pixel data. A well-known document image
analysis product is the Optical Character Recognition (OCR) software that recognizes characters in a
scanned document. OCR makes it possible for the user to edit or search the document’s contents.
◉ Text is extracted from the document image in a process known as document image analysis. Reliable
character segmentation and recognition depend on both original document quality and registered
image quality.
◉ Processes that attempt to compensate for poor quality originals or poor quality scanning include
image enhancement, underline removal, and noise removal.
◉ Image enhancement methods emphasize character versus non-character discrimination. Underline
removal erases printed guidelines and other lines which may touch characters and interfere with
character recognition. Noise removal erases portions of the image that are not part of the characters.
◉ Prior to character recognition it is necessary to isolate individual characters from the text image.
◉ In low-quality or nonuniform text images, even sophisticated algorithms may fail to correctly extract
characters, and recognition errors may occur. Recognition of unconstrained handwritten text can be
very difficult because characters cannot be reliably isolated, especially when the text is cursive
handwriting.
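The enhancement and noise-removal steps described above can be sketched in plain NumPy. This is a simplified illustration of what OCR preprocessing does, not a production pipeline; the function names and the fixed threshold are assumptions for the sketch:

```python
import numpy as np

def median3(img):
    """Noise removal: a 3x3 median filter erases isolated specks
    that are not part of the characters."""
    padded = np.pad(img, 1, mode="edge")
    # Collect the 9 shifted views that form each pixel's 3x3 neighbourhood
    stack = [padded[r:r + img.shape[0], c:c + img.shape[1]]
             for r in range(3) for c in range(3)]
    return np.median(np.stack(stack), axis=0).astype(img.dtype)

def binarize(img, thresh=128):
    """Image enhancement: global thresholding separates dark ink
    (characters) from the lighter page background."""
    return np.where(img < thresh, 0, 255).astype(np.uint8)
```

In practice, libraries such as OpenCV provide tuned versions of these operations (e.g., median blur and adaptive thresholding) that also handle uneven illumination across the scan.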
Biometrics
◉ Biometrics deals with the recognition of persons based on physiological characteristics, such as face,
fingerprint, vascular pattern or iris, and behavioural traits, such as gait or speech.
◉ It combines Computer Vision with knowledge of human physiology and behaviour.
◉ Computer vision technology is used to capture biometric traits such as fingerprints and to validate a
user's identity. Biometrics is the measurement and analysis of the physiological characteristics that
make a person unique, such as the face, fingerprints, and iris patterns.
The widely used methods of biometric authentication are listed below:
◉ Fingerprints. Applied to compare two examples of friction ridge skin impressions from human
fingers, palms, or toes.
◉ Iris recognition. Utilized to identify a person from the iris of the eye with the help of image
processing and neural network techniques.
◉ Face recognition. Uses the face to distinguish one person from another, automatically identifying or
verifying a person.
◉ Voice. Applied to recognize speech with the help of natural language processing technology.
◉ DNA. Utilized to measure and analyze the individual’s Deoxyribonucleic acid to distinguish people
with some degree of probability.
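Most modern face and iris recognition systems reduce a biometric sample to a numeric feature vector (an embedding, typically produced by a trained neural network) and then compare vectors. A minimal sketch of only the comparison step, assuming the embeddings are already computed (the function name and threshold are illustrative assumptions):

```python
import numpy as np

def verify(embedding_a, embedding_b, threshold=0.8):
    """Return True when two biometric embeddings likely belong to the same person."""
    a = np.asarray(embedding_a, dtype=float)
    b = np.asarray(embedding_b, dtype=float)
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated
    similarity = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return bool(similarity >= threshold)
```

The threshold trades off false accepts against false rejects; deployed systems calibrate it on labeled verification data rather than fixing it by hand.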
Object Recognition
◉ Object recognition is a computer vision technique for identifying objects in images or videos. Object
recognition is a key output of deep learning and machine learning algorithms. When humans look at a
photograph or watch a video, we can readily spot people, objects, scenes, and visual details. The goal
is to teach a computer to do what comes naturally to humans: to gain a level of understanding of what
an image contains.
Object Tracking
◉ Object tracking is a computer vision technique used to follow a particular object or multiple objects.
Generally, object tracking has applications in videos and real-world interactions, where objects are
first detected and then tracked across frames to obtain observations.
◉ It is used in applications such as Autonomous vehicles, where apart from object classification and
detection such as pedestrians, other vehicles, etc., tracking of real-time motion is also required to
avoid accidents and follow the traffic rules.
◉ The basics of object tracking rely on object detection, but the object, in this case, is viewed from
different angles, and may look completely different in some scenarios.
Object Tracking (Cont.)
◉ Here is a breakdown of how object tracking allows us to select, track, and retain individual classes of
objects in space and over time:
Steps involved in Object Tracking:
◉ Input: The first step is to give input such as video or a
real-time feed from a camera and preprocess each
frame using OpenCV. Preprocessing is essential so
the model has consistent data with which to work.
◉ Object Detection: Next, you choose an object
detection algorithm that classifies and detects the
object by creating a bounding box around it.
◉ Labeling: Next, the object tracking algorithm assigns
a unique identification label for each object that has
been identified. For example, this could be all of the
cars in a video feed of a racing track.
◉ Tracking: The last step is keeping track of the
detected object moving through different frames while
storing its relevant path information.
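The labeling and tracking steps above can be sketched as a minimal centroid-based tracker: detections (bounding boxes) come from any object detector, and the tracker assigns a stable ID to each object and stores its path. This is a simplified illustration under the assumption that objects move only a short distance between frames, not a production algorithm:

```python
class CentroidTracker:
    """Minimal tracker: keeps a stable ID for each detected object across frames."""

    def __init__(self, max_dist=50.0):
        self.max_dist = max_dist  # how far a centroid may move between frames
        self.next_id = 0
        self.last_seen = {}       # track id -> last known centroid
        self.paths = {}           # track id -> stored path (tracking step)

    def update(self, boxes):
        """boxes: (x, y, w, h) detections for one frame; returns their track IDs."""
        ids = []
        for x, y, w, h in boxes:
            cx, cy = x + w / 2.0, y + h / 2.0
            # Labeling: match the detection to the nearest existing track
            best, best_d = None, self.max_dist
            for tid, (px, py) in self.last_seen.items():
                d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
                if d < best_d and tid not in ids:
                    best, best_d = tid, d
            if best is None:              # unseen object -> assign a fresh ID
                best = self.next_id
                self.next_id += 1
                self.paths[best] = []
            # Tracking: remember where the object is and extend its stored path
            self.last_seen[best] = (cx, cy)
            self.paths[best].append((cx, cy))
            ids.append(best)
        return ids
```

Real trackers (e.g., those shipped with OpenCV) add appearance models and motion prediction so an object keeps its ID even when it briefly disappears or changes viewpoint.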
Medical Image Analysis
◉ Technology today is extremely advanced, and physicians can now call upon a variety of imaging
techniques, such as scans and images of the body, to help examine the inside of the body and
therefore make an accurate diagnosis.
◉ Medical imaging is the process of producing visible images of the inner structures of the body for
scientific and medicinal study and treatment, as well as a visible view of the function of interior
tissues. This process supports the identification and management of disorders. It creates a databank
of the regular structure and function of the organs, making it easier to recognize anomalies.
◉ Medical image analysis is the science of solving/analyzing medical problems based on different
imaging modalities and digital image analysis techniques.
Types of medical imagery:
◉ Radiologic technology
◉ Ultrasound technology
◉ CT Scans
◉ MRI Scans
Medical Image Analysis (Cont.)
◉ Computer Vision and Machine Learning are being vigorously employed in medical image analysis to
better detect lesions in medical images. The main purpose of CV for medical image analysis is to
relieve medical personnel of routine work with large data volumes (e.g., bacteria counting), to recognize
large amounts of diverse data (multiple lesions), and to recognize single small deviations from the norm.
Benefits of Medical Image Analysis: The benefits of medical image analysis are hard to count and
depend on the application and modality, but they can be generalized. With intelligent medical image
analysis, doctors and researchers can:
◉ save time and attention on routine tasks and big data volumes,
◉ define precise segments for medical treatment,
◉ detect the slightest abnormalities and lesions in early stages.
The steps of intelligent medical image analysis can be generalized into three milestones common to all
Machine Learning solutions:
◉ Detection
◉ Recognition
◉ Alert
Medical Image Analysis (Cont.)
◉ The quality of the images is important for detecting the requested objects, and developers can
preprocess the original images if needed.
◉ To identify abnormalities, the solution segments all the objects present in the medical image. Then
the solution detects and classifies the objects using its library.
◉ Overlaying, or registration, of images taken in different modalities (from different diagnostic
equipment), or of images taken at different times to capture dynamics, is a very important step in
AI-based medical image analysis for the doctor's decision making. The detected and recognized
formations can be further quantified in size, form, and structure.
◉ With these steps, intelligent medical image analysis solutions aim to analyze complex and
numerous images quickly.
◉ To run through these steps, software for medical image analysis has to be well trained and operate
with huge libraries.
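The detection, recognition, and alert milestones can be illustrated with a heavily simplified sketch. Real solutions use trained models and curated libraries; here the intensity and area thresholds are arbitrary assumptions, and the "scan" is just a grayscale array:

```python
import numpy as np

def analyze_scan(scan, intensity_thresh=200, min_area=25):
    """Toy detection -> recognition -> alert pipeline on a grayscale scan."""
    # Detection: segment bright regions that may correspond to formations
    mask = scan >= intensity_thresh
    # Recognition/quantification: measure the size of the segmented region
    area = int(mask.sum())
    # Alert: flag the scan for a doctor's review when the region is large enough
    return {"area_px": area, "alert": area >= min_area}
```

Quantifying the segmented region (here only its pixel area; real systems also measure form and structure) is what lets the software rank findings for the doctor instead of merely highlighting them.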
Content Based Image Retrieval
◉ Content-Based Image Retrieval (CBIR) is a way of retrieving images from a database. In CBIR, a user
specifies a query image and gets the images in the database similar to the query image. To find the
most similar images, CBIR compares the content of the input image to the database images.
◉ The most famous CBIR system is the search-by-image feature of Google Search.
Content Based Image Retrieval (Cont.)
◉ "Content-based" means that the search analyzes the contents of the image rather than
the metadata such as keywords, tags, or descriptions associated with the image. The term "content"
in this context might refer to colors, shapes, textures, or any other information that can be derived
from the image itself.
◉ More specifically, CBIR compares visual features such as shapes, colors, texture, and spatial
information, and measures the similarity between the query image and the images in the database
with respect to those features.
Difference Between Text-Based Image Retrieval and Content-Based Image Retrieval
◉ In the Text-Based Image Retrieval (TBIR) approach, experts manually annotate images with
geolocation, keywords, tags, labels, or short descriptions. Users can use keywords, annotations, or
descriptions to retrieve similar images that exist in the database.
Content Based Image Retrieval (Cont.)
◉ Basically, when users specify text or a description as a search term, TBIR retrieves the images that
were assigned similar textual tags.
◉ The approach is simple and intuitive but has some disadvantages. Firstly, since it takes a considerable
amount of time to annotate images manually, it is labor-intensive. Further, the tags may be unreliable
due to their dependence on people's perceptions and interpretations, which can vary widely across
groups of individuals. To address these problems, CBIR compares the visual contents of images
directly. In that way, there is no need for human labor, nor for subjective and error-prone human
perception.
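A toy CBIR pipeline can be sketched with a single visual feature, a normalised intensity histogram, compared by histogram intersection. The feature choice and function names are illustrative assumptions; real systems combine many features (color, shape, texture, spatial layout):

```python
import numpy as np

def intensity_histogram(img, bins=8):
    """The "content" feature: an 8-bin histogram, normalised so that
    images of different sizes are comparable."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    return hist / hist.sum()

def retrieve(query, database, top_k=3):
    """Rank database images by histogram similarity to the query image."""
    q = intensity_histogram(query)
    # Histogram intersection: higher score = more similar content
    scores = [(name, float(np.minimum(q, intensity_histogram(img)).sum()))
              for name, img in database.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]
```

Because the comparison uses only pixel statistics, no manual tags are needed, which is exactly the advantage of CBIR over TBIR described above.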
Video Data Processing
◉ Video processing consists of signal processing that employs statistical analysis and video filters to
extract information or perform video manipulation.
◉ Basic video processing techniques include trimming, image resizing, brightness and contrast
adjustment, fade in and fade out, amongst others.
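Of the basic techniques listed, brightness and contrast adjustment is a simple per-pixel linear transform. A minimal sketch of one frame's adjustment (the alpha/beta parameter names follow a common convention, e.g., in OpenCV; this is an illustration, not a full video pipeline):

```python
import numpy as np

def adjust(frame, alpha=1.0, beta=0):
    """Contrast (alpha) and brightness (beta) adjustment of one video frame."""
    # Each pixel p becomes alpha * p + beta, clipped back to the valid 0-255 range
    return np.clip(frame.astype(float) * alpha + beta, 0, 255).astype(np.uint8)
```

Applying the same transform frame by frame (and ramping alpha toward zero over time) also yields the fade-out effect mentioned above.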
◉ More complex video processing techniques, also known as Computer Vision Techniques, are based on
image recognition and statistical analysis to perform tasks such as face recognition, detection of
certain image patterns, and computer-human interaction.
◉ Video files can be converted, compressed, or decompressed using specialized software tools.
◉ Usually, compression involves a reduction of the bitrate (the number of bits processed per time unit),
which makes it possible to store the video digitally and stream it over the network.
◉ Uncompressed audio or video is usually called a RAW stream; although different formats and codecs
for raw data exist, raw streams are too heavy (in bitrate terms) to be practically stored or streamed
over the network.
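A quick back-of-the-envelope calculation shows why raw streams are impractical:

```python
# Raw (uncompressed) bitrate of 1080p video at 30 fps with 24-bit colour
width, height, bytes_per_pixel, fps = 1920, 1080, 3, 30
raw_bits_per_second = width * height * bytes_per_pixel * 8 * fps
print(raw_bits_per_second / 1e6)  # about 1493 Mbit/s
```

A typical compressed stream of the same resolution runs at only a few Mbit/s, i.e., hundreds of times smaller, which is what makes digital storage and network streaming feasible.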
Multimedia
◉ Multimedia is a type of medium that allows information to be easily transferred from one location to
another.
◉ Multimedia is the presentation of text, pictures, audio, and video with links and tools that allow the
user to navigate, engage, create, and communicate using a computer.
◉ Multimedia refers to the computer-assisted integration of text, drawings, still and moving
images (videos), graphics, audio, animation, and any other media in which any type of information can
be expressed, stored, communicated, and processed digitally.
◉ To begin, a computer must be present to coordinate what you see and hear, as well as to interact with.
Second, there must be interconnections between the various pieces of information. Third, you’ll need
navigational tools to get around the web of interconnected data.
◉ Multimedia is being employed in a variety of disciplines, including education, training, and business.
◉ Computer vision technology plays a key role in diverse multimedia applications, including surveillance,
environmental monitoring, smart spaces, and so on. This has led to a massive research effort devoted
to the challenges of developing computer vision algorithms for managing, processing, analyzing, and
interpreting the collected multimedia data.
Virtual Reality and Augmented Reality
◉ VR immerses a person into a virtual world stimulating their real presence through senses. This
stimulation can be achieved through a source of content and hardware like headsets, treadmills,
gloves and so on. Computer vision aids virtual reality with robust vision capabilities like SLAM
(simultaneous localization and mapping), SfM (structure from motion), user body tracking and gaze
tracking. The more deeply users can immerse themselves in a VR environment -- and block out their
physical surroundings -- the more they are able to suspend their disbelief and accept it as real, even if
it is fantastical in nature.
Virtual Reality and Augmented Reality (Cont.)
◉ Using cameras and sensors, these functions help VR systems analyze the user’s environment and
detect the headset’s location. So, computer vision and virtual reality work together to make products
more sophisticated and user-responsive.
◉ Augmented reality is also sometimes referred to as a type of virtual reality, although many would
argue that it is a separate but related field. With augmented reality, virtual simulations are overlaid
onto real-world environments in order to enhance or augment those environments. For example, a
furniture retailer might provide an app that enables users to point their phones at a room and visualize
what a new chair or table might look like in that setting.
◉ Computer vision (CV) for augmented reality enables computers to obtain, process, analyze and
understand digital videos and images. By looking at an object and its appearance, location, and the
settings, it identifies what the object is. More simply, this is how Instagram recognizes your friends in
photo tags, how you can log in to your bank account with your eyes, and how you can get yourself a
flower crown on Snapchat.
Virtual Reality and Augmented Reality (Cont.)
◉ To apply predefined AR content (be it a deer face or hearts around the head), the system first needs
an accurate face scan. Here, computer vision enables AR image processing, optical tracking, and
scene reconstruction, which are vital for any immersive app. Next, a computer vision-based AR system
scans your face with its sensors to add real-time visual effects. Together, this produces the mix of the
physical world and AR data.
Image Formation
◉ An image can be defined as a 2D signal that varies over the spatial coordinates x and y and can be
written mathematically as f(x, y), where the value of f at (x, y) is the intensity at that point.
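Concretely, a digital image samples f(x, y) on a grid, so it can be stored as a 2D array whose entries are intensity values. A tiny sketch (the sample values are arbitrary):

```python
import numpy as np

# A 3x3 grayscale image: each entry is the intensity f(x, y)
# at one spatial coordinate (row index y, column index x)
f = np.array([[0,  64, 128],
              [32, 96, 160],
              [64, 128, 255]], dtype=np.uint8)
print(f[1, 2])  # intensity at row y=1, column x=2 -> 160
```

This array view is why image processing operations (thresholding, filtering, histogramming) reduce to array arithmetic.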
Digital Image Representation (Cont.)