
Pattern Recognition

• we recognize a face
• understand spoken words
• read handwritten characters
• identify our car keys in our pocket by feel
• decide whether an apple is ripe by its smell
• ……
• Behind all these (complex) processes lie acts of pattern recognition (PR)
• the act of taking in raw data and taking an action based on the “category” of the pattern
• this has been crucial for our survival for ages
• As a result, we have evolved highly sophisticated neural and cognitive systems for such tasks.
Pattern Recognition (PR)
• Definition
• Classification of data, based on knowledge already gained or on statistical information extracted from patterns or their representations
• Scientific discipline whose goal is the classification of objects into a number of categories or classes known a priori
• Objects/Patterns
• Images, text, signal waveforms, video ….
• any type of measurement that needs to be classified

• PR deals with classification and clustering


History

• Before the 1960s, PR was largely the output of theoretical research in statistics

• The advent of computers increased the demand for practical applications of PR
• This, in turn, has set new demands for further theoretical developments

• Applications
• the need for information handling and retrieval is increasing
• PR has become an integral part of most machine intelligence systems built for decision making
Applications
• Machine vision - Automated visual inspection
• A machine vision system captures images via a camera and analyzes them to produce descriptions of what is imaged
• images have to be analyzed online, and a pattern recognition system has to classify the objects into the “defect” or “non-defect” class

• Speech recognition – speech-to-text (STT), automatic speech recognition (ASR)
• Speech is the most natural means by which humans communicate and exchange information.
• Potential applications of such machines are numerous.
• Alexa, Siri ….
Applications - Character recognition - Optical character recognition (OCR)
• Systems are already commercially available
• An OCR system has a “front-end” device consisting of a light source, a scan lens, a document transport, and a detector
• At the output of the light-sensitive detector, light-intensity variation is translated into “numbers” and an image array is formed.
• A series of image processing techniques is applied, leading to line and character segmentation.
• PR algorithms then recognize the characters
• classifying each character into the correct “letter, number, punctuation” class.
Applications - Character recognition - Optical character recognition (OCR)
• Printed character recognition systems
• Handwritten character recognition systems
• machine reading of bank checks
• The machine must be able to recognize the amounts in figures and in words and match them.
• Automatic mail-sorting machines for postal code identification in post offices.
• Online handwriting recognition systems
• will accompany pen computers, with which data entry is done not via the keyboard but by writing.
• This fits today’s tendency to develop machines and computers with interfaces that acquire human-like skills.
Applications - Computer-aided diagnosis
• A tool that aims at assisting doctors in making diagnostic decisions.
• has been applied to and is of interest for a variety of medical data
• X-rays, computed tomographic images, ultrasound images, electrocardiograms (ECGs), and electroencephalograms (EEGs).
• Why computer-aided diagnosis?
• medical data are often not easily interpretable, and the interpretation can depend very much on the skill of the doctor.
Applications - Data mining and knowledge discovery in databases (KDD)
• Data mining has a wide range of applications
• medicine and biology, market and financial analysis, business management, science exploration, image and music retrieval.
• In the age of the information and knowledge society, there is an ever-increasing demand for retrieving information and turning it into knowledge.
• this information exists in huge amounts of data in various forms, including text, images, audio, and video, stored in different places distributed all over the world.
• The traditional way of searching information in databases was the description-based model, where object retrieval was based on keyword description and subsequent word matching.
Applications - CBIR (content-based image retrieval)
• information is sought based on “similarity” between an object, which is presented to the system, and objects stored in sites all over the world.
• For an image given as input to the system, the system returns “similar” images based on a measured “signature,”
• for example, information related to color, texture, and shape.

• Music content-based retrieval system
• an example (i.e., an extract from a music piece) is presented to the system, and the system returns “similar” music pieces.
• similarity is based on certain (automatically) measured cues that characterize a music piece,
• e.g., the music meter, the tempo, and the location of certain repeated patterns.
Applications
• Content-based video information retrieval
• find all video scenes in a digital library showing person “X” laughing

• To facilitate human–machine interaction and further enhance the role of computers in office automation and automatic personalization of environments:
• fingerprint identification
• signature authentication
• text retrieval
• face and gesture recognition

• To achieve the final goals in all of these applications
• PR is closely linked with other scientific disciplines
• linguistics, computer graphics, machine vision, database design, ….
Advantages of developing PR applications
• In solving the myriad problems required to build PR systems
• we gain deeper understanding and appreciation for PR systems in the natural world
• most particularly in humans

• For example,
• speech and visual recognition
• PR systems may be influenced by knowledge of how these tasks are solved in nature
• both in the algorithms we employ and in the design of special-purpose hardware

• PR overlaps with machine learning, artificial intelligence, and data mining


Supervised learning/Classification – learning from examples
• Assigns an appropriate class/category label to a pattern, based on an abstraction that is generated using a set of training patterns or domain knowledge
• Data is available as (sample, label) pairs (see the sketch after this list)
• Training data of known class labels
• a priori information

• Types of Classification
• Based on the number of classes
• Binary v/s Multi-class classifiers
• Based on the number of labels assigned to a sample
• Single label v/s Multi label
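A minimal sketch of learning from (sample, label) pairs. The data, feature values, and class names below are invented for illustration, and scikit-learn's nearest-neighbor classifier stands in for any classifier:

```python
# Hypothetical toy data: each row of X_train is a pattern's feature vector
# (e.g., length, lightness); y_train holds the known class labels
# (the a priori information).
from sklearn.neighbors import KNeighborsClassifier

X_train = [[5.1, 0.8], [4.9, 0.7], [12.3, 2.1], [11.8, 2.4]]
y_train = ["salmon", "salmon", "sea bass", "sea bass"]

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)            # learn from labeled examples
print(clf.predict([[12.0, 2.2]]))    # assign a class to a new pattern: ['sea bass']
```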
Unsupervised learning/Clustering – learning from observations
• Generates a partition of the data, which helps in decision making
• For example, classification
• Training data with known class labels are not available
• We are given a set of feature vectors, and the goal is to unravel the underlying similarities and cluster similar vectors together
• Major issues
• Defining similarity between two vectors and choosing an appropriate similarity measure
• Choosing an algorithm that will group the vectors based on the chosen similarity measure (see the sketch after this list)
• Applications
• Speech coding
• Image segmentation
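A minimal clustering sketch on synthetic points; scikit-learn's KMeans is one possible algorithm, with Euclidean distance as the assumed similarity measure:

```python
import numpy as np
from sklearn.cluster import KMeans

# Feature vectors only -- no class labels are available.
X = np.array([[0.2, 0.1], [0.25, 0.15], [0.9, 0.8], [0.85, 0.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster index assigned to each vector, e.g. [1 1 0 0]
```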
Semisupervised learning
• Data - a set of patterns comprising both unlabeled and labeled data
• For supervised learning
• only a limited number of labeled data is available
• recovering additional information from the unlabeled samples, related to the general structure of the data at hand, can be useful in improving the system design (see the sketch after this list).
• For clustering tasks
• labeled data are used as constraints in the form of must-links and cannot-links
• the clustering task is constrained to assign certain points to the same cluster or to exclude certain points from being assigned to the same cluster.
• this provides a priori knowledge that the clustering algorithm has to respect.
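A minimal semi-supervised sketch on synthetic data; scikit-learn's LabelPropagation, where unlabeled samples are marked with -1, is one way to exploit the structure of the unlabeled data:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, -1, -1, 1, -1, -1])   # only two samples carry labels

model = LabelPropagation().fit(X, y)   # labels spread through the data structure
print(model.transduction_)             # inferred labels, e.g. [0 0 0 1 1 1]
```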
Fish packing unit
• Wants to automate the process of sorting incoming fish on a conveyor belt according to species
• Let us try to separate sea bass from salmon using optical sensing
• Set up a camera, take some sample images
• There may be noise or variations in the images - preprocessing
• variations in lighting, position of the fish on the conveyor, even “static” due to the electronics of the camera itself.
• note some physical differences between the two types of fish
• length, lightness, width, number and shape of fins, position of the mouth, and so on
• Features
• The camera captures an image of the fish
• The camera’s signals are preprocessed to simplify subsequent operations without losing relevant information
• a segmentation operation in which the images of different fish are somehow isolated from one another and from the background
• The information from a single fish is then sent to a feature extractor
• measures certain “features” or “properties”
• length, lightness, width, number and shape of fins, position of the mouth
• These features are then passed to a classifier for evaluation
Features, Feature Vectors and Classifiers
• Medical Image Classification
• Database of images classified as Benign and Malignant
• Binary classification
• Two images, each having a distinct region inside it
• Two regions are also themselves visually different

[Figure: two example images - a benign lesion and a malignant one (cancer)]


Features, Feature Vectors and Classifiers
• Identify the measurable quantities that make these two regions distinct from each other

[Figure: plot of the mean intensity in each region of interest versus the corresponding standard deviation around this mean, for a number of different images originating from class A (o) and class B (+). A straight line separates the two classes.]

• Given a new image with a region in it - to which class does it belong?
• measure the mean intensity and standard deviation in the region of interest (see the sketch below)
• plot the corresponding point (*)
• the unknown pattern is more likely to belong to class A than to class B.
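A minimal sketch of measuring these two features for a region of interest; the pixel data are synthetic stand-ins for a segmented image region (NumPy only):

```python
import numpy as np

# Fake region-of-interest pixels (assumed data, not a real medical image).
region = np.random.default_rng(0).normal(loc=120, scale=15, size=(32, 32))

mean_intensity = region.mean()   # feature 1: mean intensity
std_intensity = region.std()     # feature 2: standard deviation around that mean
x = np.array([mean_intensity, std_intensity])  # this pattern's feature vector
print(x)
```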
Features, Feature Vectors and Classifiers
• The preceding artificial classification task has outlined the rationale behind a large class of PR problems.
• The measurements used for the classification
• the mean value and the standard deviation – the features
• In general, n features xi, i = 1, 2, ..., n
• x = [x1, x2, ..., xn]T, where T denotes transposition – the feature vector
• Each of the feature vectors identifies a single pattern (object) uniquely
• This is natural, as the measurements resulting from different patterns exhibit random variation
• partly due to the measurement noise of the measuring devices and
• partly due to the distinct characteristics of each pattern.
• For example, in X-ray imaging large variations are expected because of the differences in physiology among individuals.
Features, Feature Vectors and Classifiers
• Classifier
• Straight line - decision line/ classifier
• Divide the feature space into regions that correspond to either class A or class B
• If a feature vector x, corresponding to an unknown pattern, falls in the class A region, it is classified as class A, otherwise as class B.
• If the assignment is not correct - a misclassification
• Training patterns /Training set / Train set /Train data
• The patterns /feature vectors whose true class is known
• used for the design of the classifier
• Test patterns/ Test set/ Test data
• The patterns/feature vectors whose true class is treated as unknown by the classifier
• Unlabeled samples
• used for the evaluation of the classifier
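A minimal sketch of designing such a classifier on synthetic (mean, std) feature vectors; scikit-learn's logistic regression is used here just as one way to fit a straight decision line:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
class_a = rng.normal([100.0, 10.0], 3.0, size=(20, 2))  # training patterns, class A
class_b = rng.normal([130.0, 20.0], 3.0, size=(20, 2))  # training patterns, class B
X_train = np.vstack([class_a, class_b])                 # true classes are known
y_train = np.array([0] * 20 + [1] * 20)

clf = LogisticRegression().fit(X_train, y_train)  # design the linear classifier
print(clf.predict([[105.0, 12.0]]))               # classify an unknown pattern: [0]
```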
Basic questions arising in a classification task
• How are the features generated?
• e.g., the mean and the standard deviation features
• It is problem dependent
• Feature extraction – the stage that extracts the features

• How many features? What is the best value of n?
• Feature selection stage
• In practice, a larger-than-necessary number of candidate features is generated, and then the “best” of them are adopted.
• Given X = [x1, x2, ..., xn]T, which are the best features among these n?
• Strategies to select the features
Basic questions arising in a classification task
• How to design a classifier?
• In practice, a straight line or hyperplane should be drawn in the n-dimensional space, with respect to an optimality criterion
• Linear classifier
• In general, surfaces dividing the space into the various class regions
• Nonlinear problems
• What type of nonlinearity must one adopt?
• What type of optimality criterion must be used in order to locate a surface in the right place in the n-dimensional feature space?

• How to assess the performance of the designed classifier?
• what is the classification error rate?
• System evaluation stage
Basic stages involved in the design of a classification system

• The stages are not independent
• they are interrelated, and depending on the results, one may go back and redesign earlier stages in order to improve the overall performance.
• There are some methods that combine stages
• For example, the feature selection and the classifier design stages, in a common optimization task.
Different Paradigms for Pattern Recognition

• Statistical PR
• The more popular paradigm; it has received the majority of attention in the literature
• Most practical problems deal with noisy data and uncertainty
• Statistics and Probability are good tools to deal with such problems

• Syntactical PR
• Formal language theory provides the background
• Linguistic tools are not ideally suited to deal with noisy environments
Course Coverage
• Features

• Feature Selection

• Classifiers
Patterns
• A physical object or an abstract notion
• Patterns are represented by a set of descriptions/distinguishing features
• Attributes - A pattern is the representation of an object by the values taken by the attributes

• Classification Problem
• <Object, class/category/label>
• We have a set of objects for which the values of the attributes are known
• We have a set of predefined classes, and each object belongs to one of these classes
• Single label classification
• Classification
• Given a new pattern, the class of the pattern has to be determined

• The representation of patterns and the choice of attributes are very important
• A good representation is one which makes use of discriminating attributes
• it also reduces the computational burden
Pattern Representation
• Patterns as feature vectors/vector space/ feature space
• Probability density/ distribution of points in multi-dimensional space
• X = [x1, x2, ... , xn]T
• For example, if a pattern has 3 features:
• <0.2, 0.1, 0.5>, <0.3, 0.12, 0.43>…….

• Patterns as Strings
• A string may be viewed as a sentence in a language
• For example, a DNA sequence, a protein sequence
• GACTTCAGG…..
• News articles, social media text, email content, plain text ……
Pattern Representation
• Patterns as Logical Descriptors
• Patterns can be represented as a logical description of the form
• x1 ^ x2, where x1 and x2 are attributes of the pattern, with x1 = (a1, a2, a3, …, an) and x2 = (b1, b2, b3, …, bn)
• For example: (color = red V white) ^ (make = leather) ^ (shape = sphere)

• Patterns as
• Fuzzy and Rough sets
• Trees and Graphs
• Minimum spanning tree, binary tree
• Frequent pattern trees
Creating Features
• “Good” features are the key to accurate generalization

• Domain knowledge can be used to generate a feature set


• Medical example: results of blood tests, age, smoking history
• Game-playing example: number of pieces on the board, control of the center of the board

• Data might not be in vector form
• Example: spam classification (see the sketch after this list)
• “Bag of words”: throw out order, keep a count of how many times each word appears.
• Sequence: one feature for the first letter in the email, one for the second letter, etc.
• N-grams: one feature for every unique sequence of n tokens (letters or words)
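A minimal bag-of-words and n-gram sketch (standard library only; the email text is invented):

```python
from collections import Counter

email = "win money now win now"
words = email.split()

bag_of_words = Counter(words)             # order discarded, only counts kept
bigrams = Counter(zip(words, words[1:]))  # n-grams with n = 2 over words

print(bag_of_words)  # Counter({'win': 2, 'now': 2, 'money': 1})
print(bigrams)       # Counter({('win', 'money'): 1, ('money', 'now'): 1, ...})
```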
Feature Selection
• In many applications, we often encounter a very large number of potential features that can be used
• Which subset of features should be used for the best classification?
• Need for a small number of discriminative features
• To avoid “curse of dimensionality” or reduce dimensionality
• To reduce feature measurement cost
• To reduce computational burden
• To visualize the data for model selection
• To improve performance (in terms of speed, predictive power, simplicity of the model)
• To remove noise

• Given an m x n pattern matrix (m patterns in n-dimensional feature space), generate an m x k pattern matrix, where k << n

• Feature selection is a process that chooses an optimal subset of features according to a certain criterion.
Feature Selection
• Reasons for performing FS may include:
• removing irrelevant data
• increasing predictive accuracy of learned models
• reducing the cost of the data
• improving learning efficiency, such as reducing storage requirements and
computational cost
• reducing the complexity of the resulting model description, improving the
understanding of the data and the model
Reasons for Feature Selection
• Want to find which features are relevant
• Domain specialist not sure which factors are predictive of disease
• Common practice: throw in every feature you can think of, and let feature selection get rid of the useless ones

• Want to maximize accuracy by removing irrelevant and noisy features
• For spam, create a feature for each of ~10^5 English words
• Training with all features is computationally expensive
• Irrelevant features hurt generalization

• Features have associated costs, want to optimize accuracy with least expensive features
• Embedded systems with limited resources
• Voice recognition on a cell phone
Feature Selection vs. Extraction
• Both are collectively known as dimensionality reduction
• Selection: choose a best subset of size k from the available n features
• Extraction: given n features (set Y), extract k new features (set X) by a linear or non-linear combination of all the n features
• Linear feature extraction: X = TY, where T is a k x n matrix
• Non-linear feature extraction: X = f(Y)
• New features by extraction may not have physical interpretation/meaning
• Examples of linear feature extraction
• Unsupervised
• PCA – Principal Component Analysis
• Supervised
• LDA - Linear Discriminant Analysis
• MDA - Multiple Discriminant Analysis
• Criteria for selection/extraction: either improve or maintain the classification accuracy, and simplify classifier complexity
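A minimal sketch of unsupervised linear feature extraction with PCA (synthetic data assumed; scikit-learn):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 6))           # m = 100 patterns, n = 6 original features

pca = PCA(n_components=2)               # extract k = 2 new features
X = pca.fit_transform(Y)                # each new feature combines all 6 old ones
print(X.shape, pca.components_.shape)   # (100, 2) (2, 6)  -- T is k x n
```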
Feature Selection
• How to find the best subset of size k?
• Recall, “best” means the classifier based on these k features has the lowest probability of error of all such classifiers
• The simplest approach is an exhaustive search; computationally prohibitive
• For n=24 and k=12, there are about 2.7 million possible feature subsets! (checked in the sketch below)
• To guarantee the best subset of size k from the available set of size n, one must examine all possible subsets of size k
• C(n, k) = n!/[k!(n-k)!]
• Heuristics have been used to avoid exhaustive search
• How to evaluate the subsets?
• Error rate; but then which classifier should be used?
• Distance measure; Mahalanobis, City Block, cosine, Euclidean, Earth Mover’s Distance ….
• Feature selection is an optimization problem
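A quick check of that subset count (standard library only):

```python
import math
from itertools import combinations

print(math.comb(24, 12))  # 2704156 subsets of size 12 -- about 2.7 million

# Exhaustive search would have to evaluate a classifier on every one of them:
count = sum(1 for _ in combinations(range(24), 12))
print(count)              # 2704156 (enumeration alone already takes a while)
```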
Feature Selection Methods

• Univariate method: considers one variable (feature) at a time

• Multivariate method: considers subsets of variables (features) together

• Filter method: ranks features or feature subsets independently of the predictor (classifier)

• Wrapper method: uses a classifier to assess features or feature subsets

• Embedded method: embeds the feature selection process in the learning or model-building phase.

(Source: Introduction to Machine Learning and Data Mining, Carla Brodley)
Filter Method
[Diagram: All features -> Filter -> Feature subset -> Model building / Classifier / Predictor]

• The choice of feature selection method depends on the nature of:
• the variables and the target (binary, categorical, continuous)
• the problem (dependencies between variables, linear/non-linear relationships between variables and target)
• the available data (number of examples and number of variables, noise in the data)
• Regression predictive modeling problem
• Numerical/categorical input, numerical output
• The most common techniques are correlation based
• Pearson’s correlation coefficient (linear)
• Spearman’s rank coefficient (nonlinear)

• Classification predictive modeling problem
• Categorical/numerical input, categorical output
• the most common type of predictive modeling problem
• The most common techniques are correlation based
• ANOVA correlation coefficient (linear)
• Kendall’s rank coefficient (nonlinear)
• Kendall does assume that the categorical variable is ordinal

• Categorical input variables
• The most common correlation measures for categorical data
• the chi-squared test
• mutual information (information gain) from the field of information theory
• mutual information is a powerful method that may prove useful for both categorical and numerical data
• i.e., it is agnostic to the data types
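A minimal filter-method sketch of the ANOVA and mutual-information scoring just listed (synthetic data assumed; scikit-learn). Features are scored independently of any classifier:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))           # 200 patterns, 10 candidate features
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # only features 0 and 3 matter here

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)  # ANOVA F-test
print(selector.get_support(indices=True))   # top-ranked features, likely [0 3]

mi = mutual_info_classif(X, y, random_state=0)
print(np.argsort(mi)[::-1][:2])             # same idea, mutual-information ranking
```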
Feature Ranking Techniques:
• return the relevance of the features
• output: a ranked list of features, ordered according to the evaluation measure
• For performing actual FS, the simplest way is to choose the top k features for the task at hand
Filter Methods

• Filter methods are usually faster
• measuring uncertainty, distances, dependence, or consistency is usually cheaper than measuring the accuracy of a learning process.

• Do not rely on a particular learning algorithm
• the selected features can be used to learn different models with different learning techniques.

• Can handle larger-sized data, due to the simplicity and low time complexity of the evaluation measures

• Don’t take into account correlations between features, just the correlation of each feature to the class label
Wrapper Methods

[Diagram: All features -> Wrapper -> Feature subset -> Classifier / Predictor]

[Diagram: All features -> feature subset search -> supervised learning algorithm (classifier) -> feature evaluation criterion -> selected features; the criterion value is fed back to guide the search]
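A minimal wrapper-method sketch of the scheme above (synthetic data assumed; scikit-learn): a greedy forward search proposes feature subsets, and a classifier's cross-validated accuracy serves as the evaluation criterion:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 1] - X[:, 5] > 0).astype(int)

selected, remaining = [], list(range(X.shape[1]))
for _ in range(2):  # search for a subset of size k = 2
    scores = {f: cross_val_score(LogisticRegression(),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    best = max(scores, key=scores.get)   # feature whose addition helps most
    selected.append(best)
    remaining.remove(best)
print(selected)  # e.g. [1, 5]
```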
Embedded/Intrinsic methods

[Diagram: All features -> Embedded method -> Feature subset -> Classifier / Predictor]
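A minimal embedded-method sketch (synthetic data assumed; scikit-learn): L1-regularized logistic regression selects features while fitting the model, by driving some coefficients to zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = (X[:, 2] + X[:, 6] > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print(np.nonzero(clf.coef_[0])[0])  # features kept (nonzero weights), e.g. [2 6]
```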
Wrappers for feature selection

Kohavi-John, 1997

N features, 2^N possible feature subsets!


Feature Selection

1. Searching for the best subset of features.

2. Criteria for evaluating different subsets.

3. Principle for selecting, adding, removing, or changing features during the search.

• https://www.izen.ai/blog-posts/feature-selection-filter-method-wrapper-method-and-embedded-method
• https://www.analyticsvidhya.com/blog/2016/12/introduction-to-feature-selection-methods-with-an-example-or-how-to-select-the-right-variables/
• https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/
