INTRODUCTION
1.0 The history of computer vision
Scientists and engineers have been trying to develop ways for machines to see and
understand visual data for about 60 years. Experimentation began in 1959, when
neurophysiologists showed a cat an array of images and tried to correlate its brain's responses with
them. They discovered that the brain responded first to hard edges and lines, which suggested that
visual processing begins with simple shapes such as straight edges. At about the same time, the
first computer image scanning technology was developed, enabling computers to digitize and
acquire images. Another milestone was reached in 1963 when computers were able to transform
two-dimensional images into three-dimensional forms. In the 1960s, AI emerged as an academic
field of study, and it also marked the beginning of the AI quest to solve the human vision
problem.
1974 saw the introduction of optical character recognition (OCR) technology, which
could recognize text printed in any font or typeface. Similarly, intelligent character recognition
(ICR) could decipher hand-written text using neural networks. Since then, OCR and ICR have
found their way into document and invoice processing, vehicle plate recognition, mobile
payments, machine translation and other common applications.
In 1982, neuroscientist David Marr established that vision works hierarchically and
introduced algorithms for machines to detect edges, corners, curves and similar basic shapes.
Concurrently, computer scientist Kunihiko Fukushima developed a network of cells that could
recognize patterns. The network, called the Neocognitron, included convolutional layers in a
neural network. By 2000, the focus of study was on object recognition, and by 2001, the first
real-time face recognition applications appeared. Standardization of how visual data sets are
tagged and annotated emerged through the 2000s. In 2010, the ImageNet data set became
available. It contained millions of tagged images across a thousand object classes and provided a
foundation for the CNNs and deep learning models used today. In 2012, a team from the University
of Toronto entered a CNN into an image recognition contest. The model, called AlexNet,
significantly reduced the error rate for image recognition. Since this breakthrough, error rates
have fallen to just a few percent.
1.1 What is computer vision?
Computer vision is a field of artificial intelligence (AI) that enables computers and systems to
derive meaningful information from digital images, videos and other visual inputs — and take
actions or make recommendations based on that information. If AI enables computers to think,
computer vision enables them to see, observe and understand.
Computer vision works much the same way as human vision, except that humans have a head
start. Human sight has the advantage of a lifetime of context that trains it to tell objects apart,
judge how far away they are, notice whether they are moving and spot when something in an
image is wrong. Computer vision trains machines to perform these functions, but it must do so in
far less time, using cameras, data and algorithms rather than retinas, optic nerves and a visual
cortex. Because a
system trained to inspect products or watch a production asset can analyze thousands of products
or processes a minute, noticing imperceptible defects or issues, it can quickly surpass human
capabilities.
Computer vision is used in industries ranging from energy and utilities to manufacturing and
automotive – and the market is continuing to grow.
As a field of AI, computer vision is dedicated to developing automated systems that can interpret
visual data (such as photographs or video) in much the same way people do. The underlying idea
is to teach computers to interpret and comprehend images on a pixel-by-pixel basis. On the
technical side, a computer vision system extracts visual data, processes it, and analyzes the
results using sophisticated software.
1.2 Types of Computer Vision
Computer vision encompasses various types and applications, each aimed at enabling machines
to interpret and understand visual information. Here are some key types of computer vision:
1. Image Classification:
In image classification, a computer system is trained to categorize images into
predefined classes or labels. This is a foundational task in computer vision and is
often used in applications like object recognition (a brief classification sketch follows this list).
2. Object Detection:
Object detection involves identifying and locating multiple objects within an
image or video frame. It goes beyond classification by providing information
about the position and boundaries of detected objects.
3. Semantic Segmentation:
Semantic segmentation involves classifying each pixel in an image, assigning it a
specific label or category. This type of computer vision is used to understand the
detailed context of an image.
4. Instance Segmentation:
Instance segmentation extends semantic segmentation by distinguishing
individual instances of objects within the same class. Each object is assigned a
unique label, allowing for precise identification.
5. Object Tracking:
Object tracking involves following the movement of specific objects across
consecutive frames in a video. It is commonly used in surveillance, autonomous
vehicles, and augmented reality.
6. Pose Estimation:
Pose estimation focuses on determining the spatial orientation and position of
objects or human bodies within an image or video. It is used in motion capture and
augmented reality.
7. Face Recognition:
Face recognition identifies and verifies individuals based on facial features. It is
used in security systems, authentication processes, and various applications
requiring identity verification.
8. Gesture Recognition:
Gesture recognition enables computers to interpret human gestures, such as hand
movements or facial expressions, as input commands. This technology is used in
human-computer interaction and virtual reality.
9. OCR (Optical Character Recognition):
OCR involves recognizing and extracting text from images or scanned
documents. It is used in document digitization, text translation, and data
extraction from printed material.
10. Scene Recognition:
Scene recognition focuses on understanding the broader context of an image, such
as identifying landscapes, indoor environments, or specific scenes. It is used in
robotics, navigation, and content analysis.
11. Medical Image Analysis:
Computer vision is applied in medical imaging to analyze and interpret various
medical images, including X-rays, MRIs, and CT scans. It assists in diagnosis,
treatment planning, and research.
12. Video Analysis:
Video analysis involves extracting meaningful information from video streams,
including object tracking, event detection, and behavior analysis. It is used in
surveillance, sports analysis, and video summarization.
13. Augmented Reality (AR) and Virtual Reality (VR):
Computer vision is integral to AR and VR applications, enabling virtual objects to
interact with the real world and enhancing immersive experiences.
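To make the first type above concrete, the short Python sketch below classifies a single image with
a pre-trained network. It is a minimal illustration only, assuming torchvision 0.13 or later is
installed and that a hypothetical file named example.jpg exists.

    import torch
    from PIL import Image
    from torchvision import models

    # Load an ImageNet pre-trained ResNet-18 together with its matching preprocessing
    weights = models.ResNet18_Weights.DEFAULT
    model = models.resnet18(weights=weights)
    model.eval()
    preprocess = weights.transforms()

    # "example.jpg" is a hypothetical input image
    image = Image.open("example.jpg").convert("RGB")
    batch = preprocess(image).unsqueeze(0)          # add a batch dimension

    with torch.no_grad():
        probabilities = model(batch).softmax(dim=1)

    top_class = probabilities.argmax(dim=1).item()
    print(weights.meta["categories"][top_class], float(probabilities[0, top_class]))

Object detection, segmentation and the other types listed above build on the same idea but predict
richer outputs, such as bounding boxes, per-pixel labels or keypoints, instead of a single class.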
1.3 Advantages of Computer Vision
Computer vision brings numerous advantages across various industries and applications. Here
are some key advantages:
1. Automation of Repetitive Tasks:
Computer vision enables the automation of repetitive and labor-intensive tasks in
industries such as manufacturing and logistics. This improves efficiency, reduces
costs, and minimizes the risk of errors.
2. Enhanced Product Quality Control:
In manufacturing, computer vision systems can perform real-time quality control
by inspecting products for defects, ensuring consistency, and maintaining high
production standards.
3. Improved Healthcare Diagnostics:
Computer vision applications in healthcare, such as medical image analysis, assist
in diagnostics, disease detection, and treatment planning. This leads to faster and
more accurate medical assessments.
4. Facial Recognition for Security:
Facial recognition systems, a subset of computer vision, enhance security by
accurately identifying and verifying individuals. This technology is used in access
control, surveillance, and identity verification.
5. Efficient Object Recognition:
Computer vision excels at object recognition, allowing machines to identify and
classify objects within images or video streams. This is valuable in various
applications, including autonomous vehicles and augmented reality.
6. Improved Human-Computer Interaction:
Computer vision contributes to the development of intuitive and interactive
interfaces, enabling users to interact with devices through gestures, facial
expressions, and other natural cues.
7. Autonomous Vehicles and Navigation:
In autonomous vehicles, computer vision is crucial for recognizing obstacles,
pedestrians, and traffic signs. It plays a key role in navigation and decision-
making, contributing to the advancement of self-driving technology.
8. Enhanced Agriculture Practices:
Computer vision aids in precision agriculture by analyzing crop health,
monitoring growth patterns, and detecting diseases. This helps farmers optimize
resource usage and improve overall crop yield.
9. Improved Accessibility:
Computer vision technologies enhance accessibility for individuals with
disabilities. Applications like object recognition and text-to-speech assist visually
impaired users, promoting inclusivity.
10. Retail Analytics and Customer Engagement:
In retail, computer vision is used for analytics, inventory management, and
enhancing the overall shopping experience. Facial recognition and object
detection contribute to personalized marketing and security.
11. Real-Time Video Analysis:
Computer vision allows for real-time video analysis, enabling applications such as
surveillance, crowd monitoring, and anomaly detection. It is valuable for security
and public safety.
12. Smart Cities and Urban Planning:
Computer vision supports smart city initiatives by providing solutions for traffic
management, waste management, and urban planning. It contributes to creating
more efficient and sustainable urban environments.
13. Content Moderation:
Social media platforms and online content providers use computer vision to
moderate and filter content, identifying and removing inappropriate or violating
material.
14. Scientific Research and Exploration:
In scientific research, computer vision aids in data analysis, pattern recognition,
and the interpretation of complex visual information. It is used in fields such as
astronomy, biology, and geology.
1.4 Advancements in Computer Vision
The field of computer vision has seen significant advancements over the years, driven by
technological breakthroughs, increased computational power, and advancements in machine
learning. Here are some key areas of advancement in computer vision:
1. Deep Learning and Neural Networks:
The adoption of deep learning, particularly convolutional neural networks
(CNNs), has revolutionized computer vision. Deep learning models have
demonstrated remarkable performance in image classification, object detection,
and segmentation tasks.
2. Object Detection and Recognition:
Advancements in object detection techniques, such as Region-Based CNNs (R-
CNN), Faster R-CNN, and You Only Look Once (YOLO), have significantly
improved the accuracy and speed of identifying and locating objects in images
and video.
3. Semantic Segmentation:
Semantic segmentation, which involves classifying each pixel in an image, has
advanced with the introduction of models like U-Net and DeepLab. These models
have applications in medical imaging, autonomous driving, and scene
understanding.
4. Generative Adversarial Networks (GANs):
GANs have been used to generate realistic images, enabling applications like
image-to-image translation and style transfer. GANs have also been employed for
data augmentation in computer vision tasks.
5. Transfer Learning:
Transfer learning, where models pre-trained on large datasets are fine-tuned for
specific tasks, has become a common practice. This approach has facilitated the
development of accurate models with limited labeled data (a minimal fine-tuning sketch follows this list).
6. 3D Computer Vision:
Advancements in 3D computer vision have improved the understanding of three-
dimensional scenes. Techniques like Structure from Motion (SfM) and
simultaneous localization and mapping (SLAM) are used in robotics, augmented
reality, and virtual reality.
7. Attention Mechanisms:
Attention mechanisms, inspired by human visual attention, have been integrated
into models to focus on specific regions of an image. This has improved the
interpretability and performance of computer vision systems.
8. Real-Time Processing:
Improved algorithms and hardware accelerators have enabled real-time processing
of computer vision tasks. This is crucial for applications like augmented reality,
video analytics, and autonomous vehicles.
9. Explainability and Interpretability:
Addressing the "black box" nature of deep learning models, researchers have
focused on developing techniques for explaining and interpreting model
decisions. This is essential for building trust in critical applications.
10. Edge Computing in Computer Vision:
The integration of computer vision with edge computing devices has become
more prevalent. This allows for processing visual data closer to the source,
reducing latency and bandwidth requirements.
11. Custom Hardware for Computer Vision:
The development of specialized hardware, such as Graphics Processing Units
(GPUs), Tensor Processing Units (TPUs), and Field-Programmable Gate Arrays
(FPGAs), has accelerated the training and inference of computer vision models.
12. Domain Adaptation:
Techniques for domain adaptation have emerged, allowing models trained on one
dataset to generalize well to different but related datasets. This is particularly
useful for applications where labeled data is scarce.
13. Ethical Considerations and Bias Mitigation:
As computer vision technologies are deployed in real-world scenarios, there is an
increasing focus on addressing ethical concerns, biases in datasets, and fairness in
model predictions.
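As an illustration of the transfer-learning idea in item 5, the sketch below freezes an ImageNet
pre-trained backbone and retrains only a new classification head. It is a minimal outline assuming
PyTorch and torchvision are available; the five-class task and the train_loader yielding
(images, labels) batches are hypothetical placeholders.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Start from a ResNet-18 pre-trained on ImageNet
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for param in model.parameters():
        param.requires_grad = False                  # freeze the pre-trained backbone

    # Replace the final layer with a head for a hypothetical 5-class task
    model.fc = nn.Linear(model.fc.in_features, 5)

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

    def train_one_epoch(train_loader):
        # train_loader is assumed to yield (images, labels) batches for the new task
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

Because only the small final layer is trained, a useful model can often be obtained from a few
thousand labeled examples rather than millions.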
2.0 Computer Vision Applications
There is a lot of research being done in the computer vision field, but it’s not just research. Real-
world applications demonstrate how important computer vision is to endeavors in business,
entertainment, transportation, healthcare and everyday life. A key driver for the growth of these
applications is the flood of visual information flowing from smartphones, security systems,
traffic cameras and other visually instrumented devices. This data could play a major role in
operations across industries, but much of it goes unused today. This information creates a test bed to train
computer vision applications and a launch pad for them to become part of a range of human
activities:
Computer vision is one field of machine learning whose fundamental ideas are already built into
mainstream products. The applications include:
2.1 Self-Driving Cars
With the use of computer vision, autonomous vehicles can understand their environment.
Multiple cameras record the environment surrounding the vehicle, and the footage is fed into
computer vision algorithms that analyze the images from all cameras in sync to locate road edges,
decipher signposts, and detect other vehicles, obstacles, and people. The autonomous vehicle can
then navigate streets and highways on its own, steer around obstructions, and get its passengers
where they need to go safely.
2.2 Facial Recognition
Facial recognition programs, which identify individuals in photographs, rely heavily on
computer vision. Algorithms first detect facial features in an image and then match those features
against stored face profiles. Facial recognition is increasingly used to verify the identity of people
using consumer electronics, and social networking applications use it for both user detection and
user tagging. Likewise, law enforcement uses facial recognition software to track down criminals
in surveillance footage.
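The detection step that precedes recognition can be sketched in a few lines of OpenCV. This
covers only the "find the faces" stage, assuming OpenCV (cv2) is installed and a hypothetical
input file group_photo.jpg exists; matching the detected faces against stored profiles would
require a separate recognition model.

    import cv2

    # OpenCV ships with a pre-trained frontal-face Haar cascade
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    image = cv2.imread("group_photo.jpg")            # hypothetical input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Each detection is an (x, y, width, height) rectangle
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("faces_marked.jpg", image)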
2.3 Augmented & Mixed Reality
Augmented reality, which allows devices such as smartphones and wearable technology
to superimpose or embed digital content onto real-world environments, also relies heavily on
computer vision. It is computer vision that lets augmented reality equipment place virtual items
in the real environment. To generate depth and proportions correctly and position virtual items
properly, augmented reality apps rely on computer vision techniques to recognize surfaces such
as tabletops, ceilings, and floors.
2.4 Healthcare
Computer vision has contributed significantly to the development of health tech.
Automating the search for malignant moles on a person's skin, or for indicators in an x-ray
or MRI scan, is just one of the many applications of computer vision algorithms.
2.5 Pre-processing
Before a computer vision method can be applied to image data to extract a specific piece of
information, it is usually necessary to pre-process the data to ensure that it satisfies the
assumptions implied by the method. Examples include (see the sketch after this list):
- Re-sampling, to ensure that the image coordinate system is correct.
- Noise reduction, to ensure that sensor noise does not introduce false information.
- Contrast enhancement, to ensure that relevant information can be detected.
- Scale-space representation, to enhance image structures at locally appropriate scales.
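The sketch below strings these pre-processing steps together with OpenCV. It is a minimal
example that assumes OpenCV is installed and that raw_scan.jpg is a hypothetical grayscale
input; the exact parameters would depend on the method applied afterwards.

    import cv2

    image = cv2.imread("raw_scan.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input

    # Re-sampling: bring the image to a fixed working resolution
    resized = cv2.resize(image, (512, 512), interpolation=cv2.INTER_AREA)

    # Noise reduction: a Gaussian blur suppresses sensor noise
    denoised = cv2.GaussianBlur(resized, (5, 5), 0)

    # Contrast enhancement: CLAHE stretches local intensities so detail stands out
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)

    # Scale-space representation: a simple Gaussian pyramid of progressively smaller images
    pyramid = [enhanced]
    for _ in range(3):
        pyramid.append(cv2.pyrDown(pyramid[-1]))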
2.6 Feature Extraction
Image features at various levels of complexity are extracted from the image data. Typical
examples of such features are:
- Lines, edges and ridges.
- Localized interest points such as corners, blobs or points.
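A minimal OpenCV sketch of these feature extractors is shown below, assuming a hypothetical
grayscale image scene.jpg; a real pipeline would tune the thresholds and choose extractors to
match the task.

    import cv2

    gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)    # hypothetical input

    # Edges: the Canny detector marks strong intensity transitions
    edges = cv2.Canny(gray, threshold1=100, threshold2=200)

    # Corners: the Harris response highlights localized interest points
    corners = cv2.cornerHarris(gray.astype("float32"), blockSize=2, ksize=3, k=0.04)

    # Blob-like keypoints with descriptors: ORB
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    print(len(keypoints), "ORB keypoints detected")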
3.0 Computer Vision Examples
Many organizations don’t have the resources to fund computer vision labs and create deep
learning models and neural networks. They may also lack the computing power required to
process huge sets of visual data. Companies such as IBM are helping by offering computer
vision software development services. These services deliver pre-built learning models available
from the cloud — and also ease demand on computing resources. Users connect to the services
through an application programming interface (API) and use them to develop computer vision
applications.
Here are a few examples of established computer vision tasks:
Image classification sees an image and can classify it (a dog, an apple, a person’s face).
More precisely, it is able to accurately predict that a given image belongs to a certain
class. For example, a social media company might want to use it to automatically identify
and segregate objectionable images uploaded by users.
Object detection can use image classification to identify a certain class of object and
then detect and tabulate its appearances in an image or video. Examples include
detecting damages on an assembly line or identifying machinery that requires
maintenance.
Object tracking follows or tracks an object once it is detected. This task is often
executed with images captured in sequence or real-time video feeds. Autonomous
vehicles, for example, need not only to classify and detect objects such as pedestrians,
other cars and road infrastructure, but also to track them in motion to avoid collisions
and obey traffic laws.
Content-based image retrieval uses computer vision to browse, search and retrieve
images from large data stores, based on the content of the images rather than metadata
tags associated with them. This task can incorporate automatic image annotation that
replaces manual image tagging. These tasks can be used for digital asset
management systems and can increase the accuracy of search and retrieval.
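To illustrate the content-based retrieval idea above, the sketch below ranks a database of images
by how similar their feature vectors are to a query vector. It assumes each image has already been
reduced to a fixed-length vector, for example by a pre-trained CNN; the random vectors and image
identifiers used here are hypothetical stand-ins.

    import numpy as np

    def cosine_similarity(a, b):
        # Similarity between two feature vectors, in the range [-1, 1]
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def retrieve(query_vector, database, top_k=5):
        # database is a list of (image_id, feature_vector) pairs
        scored = [(image_id, cosine_similarity(query_vector, vector))
                  for image_id, vector in database]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    # Toy usage with random vectors standing in for real image embeddings
    rng = np.random.default_rng(0)
    database = [("img_%d" % i, rng.normal(size=128)) for i in range(100)]
    query = rng.normal(size=128)
    print(retrieve(query, database, top_k=3))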
4.0 Computer Vision Algorithms
Computer vision algorithms include the different methods used to understand the objects in
digital images and extract high-dimensional data from the real world to produce numerical or
symbolic information. There are many computer vision tasks involved in recognizing objects in
photographs. Some common ones are:
Object Classification - What is the main category of the object present in this
photograph?
Object Identification - What is the type of object present in this photograph?
Object Detection - Where is the object in the photograph? (A detection sketch follows this list.)
Object Segmentation - What pixels belong to the object in the image?
Object Verification - Is the object in the photograph?
Object Recognition - What are the objects present in this photograph and where are they
located?
Object Landmark Detection - What are the key points for the object in this photograph?
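As a sketch of the object detection question above ("Where is the object in the photograph?"), the
code below runs a pre-trained detector over a single image. It assumes torchvision 0.13 or later
and a hypothetical input file street.jpg; the 0.5 confidence threshold is an arbitrary choice.

    import torch
    from PIL import Image
    from torchvision.models import detection

    # Faster R-CNN pre-trained on the COCO dataset
    weights = detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT
    model = detection.fasterrcnn_resnet50_fpn(weights=weights)
    model.eval()
    preprocess = weights.transforms()

    image = Image.open("street.jpg").convert("RGB")   # hypothetical input image
    with torch.no_grad():
        prediction = model([preprocess(image)])[0]

    # Keep only detections above an arbitrary confidence threshold
    categories = weights.meta["categories"]
    for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
        if float(score) > 0.5:
            print(categories[int(label)], [round(float(v), 1) for v in box], float(score))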
5.0 Computer Vision Benefits
Computer vision can automate several tasks without the need for human intervention. As a result,
it provides organizations with a number of benefits:
Faster and simpler process - Computer vision systems can carry out repetitive and
monotonous tasks at a faster rate, which simplifies the work for humans.
Better products and services - Well-trained computer vision systems make very few
mistakes, which results in faster delivery of high-quality products and services.
Cost reduction - Companies spend less money fixing flawed processes because computer
vision greatly reduces the scope for faulty products and services.
5.1 Computer Vision Disadvantages
No technology is free from flaws, and computer vision systems are no exception. Here
are a few limitations of computer vision:
Lack of specialists - Companies need a team of highly trained professionals with deep
knowledge of the differences between AI, machine learning, and deep learning technologies
to train computer vision systems. More specialists are needed to help shape the future of
this technology.
Need for regular monitoring - If a computer vision system faces a technical glitch or
breaks down, this can cause immense loss to companies. Hence, companies need to have
a dedicated team on board to monitor and evaluate these systems.
6.0 Challenges of Computer Vision
Creating a machine with human-level vision is surprisingly difficult, and not only because of the
technical challenges involved in doing so with computers. We still have a lot to learn about the
nature of human vision.
To fully grasp biological vision, one must learn not just how various receptors like the eye work,
but also how the brain processes what it sees. Much of the process has been mapped out and
many of its tricks and shortcuts have been discovered, but, as with any study of the brain, there is
still a considerable distance to cover.
7.0 CONCLUSION
The journey through the history of computer vision, from its early days to
the current era, highlights the pioneering efforts of researchers and the pivotal role
of technological breakthroughs. The advent of deep learning, neural networks, and
sophisticated algorithms has propelled computer vision to unprecedented levels of
accuracy and efficiency. The seminar underscored the wide-ranging applications of
computer vision, ranging from image classification and object detection to more
complex tasks such as semantic segmentation and 3D scene understanding. We
witnessed how computer vision is not merely a technological tool but a catalyst for
innovation, impacting fields as diverse as healthcare, manufacturing, autonomous
vehicles, and entertainment. The discussion on ethical considerations and bias
mitigation emphasized the need for responsible deployment of computer vision
technologies. As we embrace the power of these systems, it is crucial to address
concerns related to privacy, fairness, and the societal impact of automated visual
analysis. Looking ahead, the future of computer vision appears promising and
dynamic. Continued research and development will likely lead to even more
sophisticated models, improved real-time processing capabilities, and enhanced
interpretability. As we anticipate further advancements, it is essential to remain
vigilant about ethical implications and actively work towards creating inclusive,
unbiased, and transparent computer vision systems.