Pattern Recognition
• we recognize a face
• understand spoken words
• read handwritten characters
• identify our car keys in our pocket by feel
• decide whether an apple is ripe by its smell
• and so on
• Behind all these (complex) processes lies the act of pattern recognition (PR)
• the act of taking in raw data and taking an action based on the “category” of the pattern
• this has been crucial for our survival for ages
• we have evolved highly sophisticated neural and cognitive systems for such tasks
Pattern Recognition (PR)
• Definition
• Classification of data, based on the knowledge already gained or on statistical
information extracted from patterns or their representations
• Applications
• The need for information handling and retrieval is increasing
• PR has become an integral part of most machine intelligence systems built for decision making
Applications
• Machine vision - Automated visual inspection
• A machine vision system captures images via a camera and analyzes them to produce
descriptions of what is imaged
• images have to be analyzed online, and a pattern recognition system has to classify
the objects into the “defect” or “nondefect” class
• For example,
• in speech and visual recognition, PR system design may be influenced by knowledge of how these problems are solved in nature
• both in the algorithms we employ and in the design of special-purpose hardware
• Types of Classification
• Based on the number of classes
• Binary vs. multi-class classifiers
• Based on the number of labels assigned to a sample
• Single-label vs. multi-label classifiers (a toy illustration follows)
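A toy illustration of how these label structures differ; the arrays below are made-up examples, not from the course material:

```python
import numpy as np

# Binary: each sample gets exactly one of two class labels.
y_binary = np.array([0, 1, 1, 0])

# Multi-class: exactly one label per sample, drawn from more than two classes.
y_multiclass = np.array([2, 0, 1, 2])

# Multi-label: each sample may carry several labels at once, encoded here
# as a binary indicator matrix (rows = samples, columns = labels).
y_multilabel = np.array([[1, 0, 1],
                         [0, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]])
```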
Unsupervised learning/Clustering – learning from observations
• Generates a partition of the data, which helps in decision making (for example, classification)
• Training data with known class labels are not available
• We are given a set of feature vectors, and the goal is to unravel the underlying similarities and cluster similar vectors together
• Major issues (see the sketch after this list)
• Defining similarity between two vectors and choosing an appropriate similarity measure
• Choosing an algorithm that will group the vectors based on the chosen similarity measure
• Applications
• Speech coding
• Image segmentation
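A minimal sketch of these two choices at work: a naive k-means loop in which Euclidean distance is the (dis)similarity measure and mean-recomputation is the grouping algorithm. The data, k = 2, and all parameter values are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Naive k-means; assumes no cluster ever becomes empty."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Similarity choice: Euclidean distance to each center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Grouping algorithm: recompute each center as its cluster mean.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated blobs; note that no class labels are used anywhere.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5.0])
labels, centers = kmeans(X, k=2)
```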
Semisupervised learning
• Data: a set of labeled patterns together with a set of unlabeled patterns
• For supervised learning
• Only a limited number of labeled samples is available
• Recovering additional information from the unlabeled samples, related to the general structure of the data at hand, can be useful in improving the system design
• For clustering tasks
• Labeled data are used as constraints in the form of must-links and cannot-links
• The clustering task is constrained to assign certain points to the same cluster, or to exclude certain points from being assigned to the same cluster (see the sketch below)
• This provides a priori knowledge that the clustering algorithm has to respect
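One possible encoding of such constraints, as a sketch: must-link and cannot-link pairs are checked before a point is placed in a cluster. The function name and data layout are illustrative assumptions, not a standard API.

```python
def violates(point, cluster, labels, must_link, cannot_link):
    """Check whether putting `point` into `cluster` breaks any constraint.
    `labels` maps already-placed points to their cluster."""
    for a, b in must_link:
        # Must-link: if the partner is already placed elsewhere,
        # this assignment would split the pair.
        if point == a and labels.get(b) not in (None, cluster):
            return True
        if point == b and labels.get(a) not in (None, cluster):
            return True
    for a, b in cannot_link:
        # Cannot-link: the pair may never share a cluster.
        if point == a and labels.get(b) == cluster:
            return True
        if point == b and labels.get(a) == cluster:
            return True
    return False

# Points 0 and 1 must end up together; points 0 and 2 must not.
must_link, cannot_link = [(0, 1)], [(0, 2)]
labels = {0: "A"}
print(violates(1, "B", labels, must_link, cannot_link))  # True: splits (0, 1)
print(violates(2, "A", labels, must_link, cannot_link))  # True: joins (0, 2)
```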
Fish packing unit
• Wants to automate the process of sorting incoming fish on a conveyor
belt according to species
• Let us try to separate sea bass from salmon using optical sensing
• Set up a camera, take some sample images
• There may be noise or variations in the images, so preprocessing is needed
• variations in lighting, position of the fish on the conveyor, even “static”
due to the electronics of the camera itself.
• note some physical differences between the two types of fish
• length, lightness, width, number and shape of fins, position of the mouth,
and so on
• Features
• The camera captures an image of the fish
• The camera’s signals are preprocessed to simplify subsequent operations without losing relevant information
• segmentation operation in which the images of different
fish are somehow isolated from one another and from the
background
• The information from a single fish is then sent to a
feature extractor
• Measure certain “features” or “properties”
• length, lightness, width, number and shape of fins,
position of the mouth
• These features are then passed to a classifier for
evaluation
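A minimal sketch of this pipeline in code, assuming segmentation has already isolated one fish as a 2-D array of pixel intensities; the feature proxies, the lightness threshold, and the decision direction are all assumptions for illustration.

```python
import numpy as np

def extract_features(fish_region):
    """Feature extractor: reduce a segmented fish region (assumed to be a
    2-D array of pixel intensities) to a small feature vector."""
    length = fish_region.shape[1]   # crude proxy: bounding-box width in pixels
    lightness = fish_region.mean()  # average pixel intensity
    return length, lightness

def classify(features, lightness_threshold=120.0):
    """Toy classifier: a single threshold on lightness. Both the threshold
    and the direction of the decision are assumed, not measured."""
    _, lightness = features
    return "sea bass" if lightness > lightness_threshold else "salmon"

region = np.random.randint(80, 160, size=(40, 100))  # stand-in for one segmented fish
print(classify(extract_features(region)))
```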
Features, Feature Vectors and Classifiers
• Medical Image Classification
• Database of images classified as Benign and Malignant
• Binary classification
• Two images, each containing a distinct region
• The two regions are themselves visually different
• Statistical PR
• More popular; has received the majority of attention in the literature
• Most practical problems deal with noisy data and uncertainty
• Statistics and Probability are good tools to deal with such problems
• Syntactical PR
• Formal language theory provides the background
• Linguistic tools are not ideally suited to deal with noisy environments
Course Coverage
• Features
• Feature Selection
• Classifiers
Patterns
• A physical object or an abstract notion
• Patterns are represented by a set of descriptions/distinguishing features
• Attributes - A pattern is the representation of an object by the values taken by the attributes
• Classification Problem
• <Object, class/category/label>
• We have a set of objects for which the values of the attributes are known
• We have a set of predefined classes, and each object belongs to one of these classes
• Single label classification
• Classification
• Given a new pattern, the class of the pattern has to be determined
• Patterns as Strings
• A string may be viewed as a sentence in a language
• For example, a DNA sequence, a protein sequence
• GACTTCAGG…..
• News articles, social media text, email content, plain text ……
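A small illustration of the two representations mentioned above; the attribute names and values are invented for the example:

```python
# A pattern as attribute values (attributes and values are illustrative).
pattern = {"length": 12.0, "lightness": 0.7, "width": 3.1}
label = "salmon"  # its predefined class

# The same pattern flattened to a feature vector for a classifier.
x = [pattern["length"], pattern["lightness"], pattern["width"]]

# A pattern as a string: a DNA fragment viewed as a sentence in a language.
dna = "GACTTCAGG"
```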
Pattern Representation
• Patterns as Logical Descriptors
• Patterns can be represented as a logical description of the form
• x1 ^ x2, where x1 and x2 are the attributes of the pattern x1 = (a1, a2, a3…an) and
x2 =(b1, b2, b3,…bn)
• For example: (color = red V white) ^ (make = leather) ^ (shape =sphere)
• Patterns as
• Fuzzy and Rough sets
• Trees and Graphs
• Minimum spanning tree, binary tree
• Frequent pattern trees
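A minimal sketch of evaluating the logical descriptor from the example above; the object and its attribute values are invented:

```python
def matches(obj):
    """Evaluate (color = red ∨ white) ∧ (make = leather) ∧ (shape = sphere)."""
    return (obj["color"] in ("red", "white")
            and obj["make"] == "leather"
            and obj["shape"] == "sphere")

ball = {"color": "red", "make": "leather", "shape": "sphere"}
print(matches(ball))  # True
```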
Creating Features
• “Good” features are the key to accurate generalization
• Given an m×n pattern matrix (m patterns in an n-dimensional feature space), generate an m×k pattern matrix, where k << n
• Features have associated costs; we want to optimize accuracy with the least expensive features
• Embedded systems with limited resources
• Voice recognition on a cell phone
Feature Selection vs. Extraction
• Both are collectively known as dimensionality reduction
• Selection: choose a best subset of size k from the available n features
• Extraction: given n features (set Y), extract k new features (set X) by a linear or non-linear combination of all the n features
• Linear feature extraction: X = TY, where T is a k×n matrix
• Non-linear feature extraction: X = f(Y)
• New features by extraction may not have physical interpretation/meaning
• Examples of linear feature extraction
• Unsupervised
• PCA – Principal Component Analysis
• Supervised
• LDA - Linear Discriminant Analysis
• MDA - Multiple Discriminant Analysis
• Criteria for selection/extraction: improve or maintain the classification accuracy while simplifying classifier complexity (a PCA sketch follows)
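A minimal sketch of linear feature extraction X = TY using PCA, where the rows of T are the top-k principal directions of the data; the random data and the choice k = 2 are illustrative.

```python
import numpy as np

def pca_transform(Y, k):
    """PCA as linear feature extraction: X = T Y, applied to every pattern.
    Y is an m x n matrix (m patterns, n features); T is k x n."""
    Yc = Y - Y.mean(axis=0)                 # center the data
    cov = np.cov(Yc, rowvar=False)          # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    T = eigvecs[:, ::-1][:, :k].T           # top-k directions as rows (k x n)
    return Yc @ T.T, T                      # m x k matrix of new features

Y = np.random.randn(100, 5)   # m = 100 patterns in n = 5 dimensions
X, T = pca_transform(Y, k=2)  # k = 2 extracted features per pattern
print(X.shape, T.shape)       # (100, 2) (2, 5)
```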
Feature Selection
• How to find the best subset of size k?
• Recall, “best” means the classifier based on these k features has the lowest probability of error among all such classifiers
• Simplest approach is to do an exhaustive search; computationally prohibitive
• For n=24 and k=12, there are about 2.7 million possible feature subsets!
• To guarantee the best subset of size k from the available set of size n, one must examine all possible
subsets of size k
• C(n, k)= n!/[k!(n-k)!]
• Heuristics have been used to avoid exhaustive search
• How to evaluate the subsets?
• Error rate; but then which classifier should be used?
• Distance measure; Mahalanobis, City Block, cosine, Euclidean, Earth Mover’s Distance ….
• Feature selection is an optimization problem
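A quick check of the combinatorics with the standard library (the printed count confirms the roughly 2.7 million figure):

```python
from math import comb

# Size-k subsets of n features: C(n, k) = n! / [k! (n-k)!]
print(comb(24, 12))  # 2704156, about 2.7 million subsets for n=24, k=12

# Considering subsets of every size is worse still: 2^n - 1 non-empty subsets.
print(2**24 - 1)
```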
Feature Selection Methods
• Embedded method: embed the feature selection process in the learning or the
model building phase.
Introduction to Machine Learning and Data Mining, Carla Brodley
Filter Method
• Pipeline: All features → Filter → Feature subset → Model building (Classifier/Predictor)
• Can handle larger-sized data, due to the simplicity and low time complexity of the evaluation measures
Wrapper Method
• Pipeline: All features → Wrapper → Feature subset → Classifier/Predictor, with the criterion value from the classifier/predictor fed back to the wrapper to score candidate subsets
Embedded/Intrinsic methods
• Pipeline: All features → Embedded method → Feature subset → Classifier/Predictor
Wrappers for feature selection
(Kohavi & John, 1997)
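A minimal sketch of a wrapper in this sense: sequential forward selection, where the criterion value is the cross-validated accuracy of the classifier itself. scikit-learn is assumed to be available; the k-NN classifier, the iris data, and cv=5 are illustrative choices, not from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_selection(X, y, k):
    """Greedy wrapper: grow the subset one feature at a time, keeping the
    feature whose addition yields the best criterion value (CV accuracy)."""
    selected, remaining = [], list(range(X.shape[1]))
    clf = KNeighborsClassifier(n_neighbors=3)
    while len(selected) < k:
        scores = {f: cross_val_score(clf, X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = load_iris(return_X_y=True)
print(forward_selection(X, y, k=2))  # indices of the two selected features
```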