0% found this document useful (0 votes)
2 views

10.Classification2022

Uploaded by

23310020
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

10.Classification2022

Uploaded by

23310020
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 20

Classification and Prediction

Classification and Prediction

Two forms of data analysis

- extract models describing important data classes or to predict future data trends

⮚ Such analysis can help us with a better understanding of the data at large

⮚ Classification predicts - categorical (discrete, unordered) labels,

⮚ Prediction models - continuous valued functions

⮚ Classification and Prediction have numerous applications

including fraud detection, target marketing, performance prediction, manufacturing,

and medical diagnosis


Classification and Prediction

Classification: task of assigning objects to one of several predefined categories


Eg. Spam mails, MRI scans,
Galaxies

Input Output
Classification
Attribute set → Model → Class labels
X Y
Classification as the task of mapping an input attribute x into its class y.

Input data for classification-collection of records/ instance


characterized by tuple (x, y)
x is the attribute set
y is a special attribute - class label/ category/ target attribute
Sample Data Set - Classifying vertebrates

⮚Attributes set are mostly discrete, can also contain continuous features
The class label, on the other hand, must be a discrete attribute

⮚This is a key characteristic distinguishes classification from regression, a


predictive modeling task in which y is a continuous attribute

Name Body Temp Skin cover Gives birth Aquatic creature Aerial Has legs Hibernates Class
creature
human warm-blooded hair yes no no yes no mammal
python cold-blooded scales no no no no yes reptile
salmon cold-blooded scales no yes no no no fish
whale warm-blooded hair yes yes no no no mammal
frog cold-blooded none no semi no yes yes amphibian
komodo cold-blooded scales no no no yes no reptile
dragon warm-blooded hair yes no yes yes yes mammal
bat warm-blooded feather no no yes yes no bird
pigeon warm-blooded fur yes no no yes no mammal
cat cold-blooded scales yes yes no no no fish
leopard cold-blooded scales no semi no yes no reptile
shark warm-blooded feather no semi no yes no bird
turtle warm-blooded quills yes no no yes yes mammal
penguin cold-blooded scales no yes no no no fish
porcupine cold-blooded none no semi no yes yes amphibian
Classification vs. Prediction

⮚ Assumption
After data preparation, in a data set where, each record has attributes
X1,…,Xn, and Y

⮚ Goal
Learn a function f:(X1,…,Xn) → Y,
then use function f to predict y for a given input record (x1,…,xn)

⮚ Classification: Y is a discrete attribute, called the class label


Usually a categorical attribute with small domain

⮚ Prediction: Y is a continuous attribute

Called supervised learning, because true labels (Y- values) are known for
initially provided data
What is classification ?

⮚ Classification – task of learning a target function “f ” that maps each attribute


set x to one of the predefined class label y

⮚ The target function informally called as Classification Model

Classification model used for

⮚ Descriptive modeling

⮚Predictive Modeling
Descriptive Modeling

⮚ Classification Model serve an explanatory tool to distinguish between objects


of different classes. The descriptive model summarizes the data given below

Vertebrate Data Set


Name Body Temp Skin Gives Aquatic Aerial Has Hiberna Class
cover birth creature creature legs tes
human warm-blooded hair yes no no yes no mammal
python cold-blooded scales no no no no yes reptile
salmon cold-blooded scales no yes no no no fish
whale warm-blooded hair yes yes no no no mammal
frog cold-blooded none no semi no yes yes amphibian
komodo cold-blooded scales no no no yes no reptile
dragon warm-blooded hair yes no yes yes yes mammal
bat warm-blooded feather no no yes yes no bird
pigeon warm-blooded fur yes no no yes no mammal
cat cold-blooded scales yes yes no no no fish
leopard cold-blooded scales no semi no yes no reptile
shark warm-blooded feather no semi no yes no bird
turtle warm-blooded quills yes no no yes yes mammal
penguin cold-blooded scales no yes no no no fish
porcupine cold-blooded none no semi no yes yes amphibian
Predictive Modelling

⮚ A classification model also be used to predict the class label of unknown records

⮚As a black box that automatically assigns a class label when presented
with the attribute set of an unknown record

following characteristics of a creature - gila monster:

Name Body Temp Skin Gives Aquatic Aerial Has Hibern Class
cover birth creature creature legs ates

Gila Monster Cold blooded scales no no no yes yes ?


How Does Classification Work

Data classification - two step process

1. A classifier is built describing a predetermined set of data


classes/concepts- Learning step or Training Phase.

A classification algorithm builds the classifier by analyzing or


learning from a training set (DB tuples and associated class labels)

The individual tuples making up training set are training tuples and
selected from the database under analysis (also known as supervised learning)
How Does Classification Work

The tuple X is represented by n -dimensional attribute Vector


X=[x1,x2,x3…...xn]

X-assumed to belong to predefined class as determined by another


attribute class label attribute.

Data tupels are referred as samples, examples, instances, data


points
or object.
This step is learning of a mapping or function,
y = f(x), predict associated class label y of a given tuple x
How does classification work cont…

 This step is learning of a mapping or function,


y = f(x), predict associated class label y of a given tuple x

 Mapping is represented in the form of classification rules,


decision trees or mathematical formula

 Example of classification rules that identify loan applications as


being either safe or risky

⮚ Rules used to categorize future data tuples, as well as provide


deeper insight into the database contents
Examples - Classification/ Prediction

⮚ Build a classification model to predict the expenditures to categorize bank


loan applications as either safe or risky

⮚ A bank - analyze the loan data in order to learn which loan applicants are
safe/ risky

⮚ The Prediction model to predict the expenditure in dollars of potential


customers on computer equipment by analyzing the income and occupation
What is Classification
In two examples,

The officer analyze the data to learn which loan applicants are “safe” or “risky” for the
bank.

A medical researcher analyze breast cancer data to predict which one of the three
specific treatments a patient should receive.

Here the data analysis task required is classification, Where a model or classifier
is constructed to predict categorical labels, such as

“safe” or “risky” for loan application data


“Treatment A” or “treatment B” or “treatment C” for the medical data.

These categories represented by discrete values no ordering among values has no


meaning.

eg: 1, 2,3 may be used to represent treatments A/ B/C where there is no ordering
among this group
What is Prediction

Marketing Manager -how much a given customer will spend during sale,
then the data analysis is numeric prediction.

The model constructed predicts a continuous-valued function, or ordered


value as opposed to categorical label. The model is Predictor.

Hence classification and numeric prediction - two major types of prediction


problems

Numeric prediction is simply called Prediction.


Data Classification Process

Learning: Training data are analyzed by a classification algorithm


class label attribute is loan decision
learned model or classifier is represented in the form of classification rules

Induction: Model Construction


Classification Accuracy

2. In the second step, the model is used for classification

 First the predictive accuracy of the classifier is estimated

 The training set is used to measure the accuracy

 Therefore, a test set used is made up of test tuples and associated class labels

 These tuples are randomly selected from the general data set, independent of
the training tuples, meaning that they are not used to construct the classifier

 The accuracy of a classifier on a given test set is the percentage of test tuples
that are correctly classified by the classifier
Classification
⮚Test data are used to estimate the accuracy of the classification rules

⮚If accuracy is acceptable, rules can be applied to the classification of new data

Deduction: Using the Model


⮚ The associated class label of each test tuple is compared with
the learned classifiers class prediction for that tuple

⮚ If the accuracy of the classifier is considered acceptable, the


classifier can be used to classify future data tuples for which
the class label is not known

⮚ For ex, the classification rules obtained in the first stage is


used to approve or reject new or future loan applications
How is prediction different from classification cont…
 Data prediction-a two step process, similar to data classification

 For prediction, the class label attribute is lost,


b’s the attribute for which values are being predicted
is continuous-valued(ordered) rather than categorical (unordered)

 The attribute is simply referred as predicted attribute

 Prediction can also be viewed as a mapping or function,


y = f(X), where X-input, and the output y is a continuous or
ordered value

 Prediction and classification also differ in the methods


that are used to build their respective models
How is prediction different from classification cont…

 As with classification, the training set used to build a predictor


should not be used to asses its accuracy

 An independent test set should be used instead

 The accuracy of a predictor is estimated by computing an error


based on the difference between the predicted value and the
actual known value of y each of the test tuples, X

You might also like