10.Classification2022
10.Classification2022
- extract models describing important data classes or to predict future data trends
⮚ Such analysis can help us with a better understanding of the data at large
Input Output
Classification
Attribute set → Model → Class labels
X Y
Classification as the task of mapping an input attribute x into its class y.
⮚Attributes set are mostly discrete, can also contain continuous features
The class label, on the other hand, must be a discrete attribute
Name Body Temp Skin cover Gives birth Aquatic creature Aerial Has legs Hibernates Class
creature
human warm-blooded hair yes no no yes no mammal
python cold-blooded scales no no no no yes reptile
salmon cold-blooded scales no yes no no no fish
whale warm-blooded hair yes yes no no no mammal
frog cold-blooded none no semi no yes yes amphibian
komodo cold-blooded scales no no no yes no reptile
dragon warm-blooded hair yes no yes yes yes mammal
bat warm-blooded feather no no yes yes no bird
pigeon warm-blooded fur yes no no yes no mammal
cat cold-blooded scales yes yes no no no fish
leopard cold-blooded scales no semi no yes no reptile
shark warm-blooded feather no semi no yes no bird
turtle warm-blooded quills yes no no yes yes mammal
penguin cold-blooded scales no yes no no no fish
porcupine cold-blooded none no semi no yes yes amphibian
Classification vs. Prediction
⮚ Assumption
After data preparation, in a data set where, each record has attributes
X1,…,Xn, and Y
⮚ Goal
Learn a function f:(X1,…,Xn) → Y,
then use function f to predict y for a given input record (x1,…,xn)
Called supervised learning, because true labels (Y- values) are known for
initially provided data
What is classification ?
⮚ Descriptive modeling
⮚Predictive Modeling
Descriptive Modeling
⮚ A classification model also be used to predict the class label of unknown records
⮚As a black box that automatically assigns a class label when presented
with the attribute set of an unknown record
Name Body Temp Skin Gives Aquatic Aerial Has Hibern Class
cover birth creature creature legs ates
The individual tuples making up training set are training tuples and
selected from the database under analysis (also known as supervised learning)
How Does Classification Work
⮚ A bank - analyze the loan data in order to learn which loan applicants are
safe/ risky
The officer analyze the data to learn which loan applicants are “safe” or “risky” for the
bank.
A medical researcher analyze breast cancer data to predict which one of the three
specific treatments a patient should receive.
Here the data analysis task required is classification, Where a model or classifier
is constructed to predict categorical labels, such as
eg: 1, 2,3 may be used to represent treatments A/ B/C where there is no ordering
among this group
What is Prediction
Marketing Manager -how much a given customer will spend during sale,
then the data analysis is numeric prediction.
Therefore, a test set used is made up of test tuples and associated class labels
These tuples are randomly selected from the general data set, independent of
the training tuples, meaning that they are not used to construct the classifier
The accuracy of a classifier on a given test set is the percentage of test tuples
that are correctly classified by the classifier
Classification
⮚Test data are used to estimate the accuracy of the classification rules
⮚If accuracy is acceptable, rules can be applied to the classification of new data