
2. Machine Learning Models
Topics to be covered

► 2.1 Types of Learning: Supervised, Unsupervised and Semi-Supervised Learning
► 2.2 Components of Generalization Error (Bias, Variance, Underfitting, Overfitting)
► 2.3 A Learning System Cycle and Design Cycle
► 2.4 Metrics for evaluation, viz. accuracy, scalability, squared error, precision and recall, likelihood, posterior probability
► 2.5 Classification Accuracy and Performance
Types of Learning

Supervised Learning

► The supervised learning approach is similar to human learning under the supervision of a teacher.
► The teacher provides good examples for the student to memorize (learn), and the student then derives general rules from these specific examples to apply to new examples.
► In other words, the algorithm learns from example data (training data) and the associated responses (targets) in order to predict the correct response when given a new example (test data).
► Some might also call this a “spoonfed” approach to machine learning: you select the kind and extent of data that is fed to the algorithm.
► Supervised learning is often described as task-oriented.
► Examples: Email spam classification, stock price prediction
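As a concrete illustration, here is a minimal sketch of this workflow, assuming scikit-learn is installed; the dataset is synthetic, standing in for real labeled examples such as emails tagged spam/not-spam:

```python
# Minimal supervised-learning sketch: learn from labeled examples,
# then predict responses for unseen test examples.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# X holds feature vectors, y holds the known answers (the "teacher's" labels)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)            # derive general rules from examples
print(model.predict(X_test[:5]))       # predicted responses for new examples
print(model.score(X_test, y_test))     # fraction of correct test predictions
```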
Supervised Learning: Example 1
Unsupervised Learning

► As the name suggests, unsupervised learning is the opposite of supervised learning.
► In this case, you don’t provide the machine with any labeled training data.
► The machine has to reach conclusions from the data without any labels.
► Unsupervised learning algorithms can adapt to the data by dynamically changing hidden structures.
► It is used for clustering data and for finding anomalies.
► It is also quite popular as it is data-driven.
► Examples: Recommender systems, customer segmentation
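As a sketch of the clustering use case mentioned above (scikit-learn assumed), k-means discovers group structure without ever seeing a label:

```python
# Minimal unsupervised-learning sketch: k-means clustering on unlabeled data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels discarded

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster assignment for every point
print(kmeans.cluster_centers_)      # group centers found by the algorithm
```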
Unsupervised Learning: Examples

Anomaly Detection, Customer Segmentation, Recommender Engine, Market Basket Analysis
Reinforcement Learning

► In reinforcement learning, the relation between data and machine is quite different from the other machine learning types.
► In reinforcement learning, the machine learns from its mistakes.
► It is the ability of an agent to interact with the environment and find out what the best outcome is.
► It follows the concept of the trial-and-error method.
► The agent is rewarded or penalized with a point for a correct or a wrong answer, and on the basis of the positive reward points gained, the model trains itself.
► Once trained, it is ready to make predictions on new data presented to it.
► Reinforcement learning is very behavior-driven.
► Examples: Video games, industrial simulation
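A toy sketch of this reward-driven loop: tabular Q-learning on a hypothetical 5-state corridor where only reaching the rightmost state earns a reward, so the agent learns by trial and error to move right:

```python
# Toy Q-learning sketch: the agent is rewarded only at state 4 and learns,
# from trial and error, that moving right is the best policy.
import random

n_states, n_actions = 5, 2                      # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1           # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != 4:                               # episode ends at the goal state
        if random.random() < epsilon:           # explore occasionally
            a = random.randrange(n_actions)
        else:                                   # otherwise exploit best known action
            a = max(range(n_actions), key=lambda act: Q[s][act])
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0         # reward point only at the goal
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([max(range(n_actions), key=lambda act: Q[s][act]) for s in range(4)])
# expected learned policy: [1, 1, 1, 1] (always move right)
```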
Reinforcement Learning: Applications

Gaming, Robotics
Semi-supervised Learning

► A semi-supervised learning problem starts with a set of labeled data points as well as some data points for which labels are not known.
► The goal of a semi-supervised model is to classify some of the unlabeled data using the labeled information set.
► The goal of a semi-supervised learning model is to make effective use of all of the available data, not just the labeled data as in supervised learning.
► Examples: Speech analysis, internet content classification
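A minimal sketch with scikit-learn's SelfTrainingClassifier (available in scikit-learn ≥ 0.24, where unlabeled points are marked with -1): a model trained on the few labeled points iteratively pseudo-labels the rest:

```python
# Semi-supervised sketch: only 50 of 300 points keep their labels;
# self-training makes use of the unlabeled rest as well.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1                    # -1 marks the unlabeled data points

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)                # uses labeled AND unlabeled data
print(model.score(X, y))               # scored against the full true labels
```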
Semi-supervised Learning: Example and Applications
Generalization Error

► In supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data.
► The gap between predictions and observed data is induced by model inaccuracy, sampling error, and noise.
► Some of these errors are reducible, but some are not.
► Choosing the right algorithm and tuning parameters can improve model accuracy, but we will never be able to make our predictions 100% accurate.
Components of Generalization Error

Bias, Variance, Underfitting, Overfitting
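For a squared-error loss, these components combine in the standard bias-variance decomposition of the expected generalization error at a point x:

```latex
% Expected squared error splits into bias^2, variance, and irreducible noise
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \mathrm{Bias}\!\left[\hat{f}(x)\right]^{2}
  + \mathrm{Var}\!\left[\hat{f}(x)\right]
  + \sigma^{2}
```

The σ² term is the irreducible noise: no choice of algorithm or tuning can remove it, which is why predictions can never be made 100% accurate.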
Prediction Error
Bias

► Bias is the difference between the average prediction of the model and the true value the model is trying to predict.
► A model with high bias is too simple and has too few predictors.
► High bias causes the algorithm to miss a dominant pattern or relationship between the input and output variables.
► Such a model pays very little attention to the training data and oversimplifies the problem, which leads to high errors on both training and test data.
► In simple terms, think of bias as the error rate on the training data.
► When that error rate is high, we call it high bias; when it is low, we call it low bias.
Variance

► Variance is the variability of the model prediction for a given data point; it tells us the spread of our predictions.
► A model with high variance pays a lot of attention to the training data and does not generalize to data it has not seen before.
► As a result, such models perform very well on training data but have high error rates on test data.
► High variance arises when the model fits the fluctuations in the data, i.e. the noise, along with the signal.
► In simple terms, think of variance as the error rate on the testing data.
► When that error rate is high, we call it high variance; when it is low, we call it low variance.
Underfitting

► When a model is unable to capture the essence of the training data, often because it has too few parameters, the phenomenon is known as underfitting.
► High bias error, low variance error.
► When the model has a high error rate on the training data, we can say the model is underfitting.
► This can also happen when we have very little data with which to build an accurate model.
► Since the model performs badly on the training data, it consequently performs badly on the testing data as well.
► A high error rate on training data implies high bias; in simple terms, high bias implies underfitting.
Overfitting

► When a model is built with so many predictors that it captures the noise along with the underlying pattern, it fits the training data too closely and leaves very little scope for generalization. This phenomenon is known as overfitting.
► Low bias error, high variance error.
► When the model has a low error rate on training data but a high error rate on testing data, we can say the model is overfitting.
► This usually occurs when the model is too complex for the amount of training data available, or when the hyperparameters have been tuned to produce a low error rate on the training data.
► A low error rate on training data implies low bias, whereas a high error rate on testing data implies high variance; in simple terms, low bias together with high variance implies overfitting.
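A sketch (scikit-learn assumed) that makes the contrast concrete: polynomial models of increasing degree are fit to noisy data, and the gap between training and test error exposes underfitting at degree 1 and overfitting at degree 15:

```python
# Underfitting vs. overfitting: compare train and test MSE as model
# complexity (polynomial degree) grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)   # pattern + noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):            # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Degree 1 shows high error on both sets (high bias, underfitting); degree 15 shows low training error but noticeably higher test error (high variance, overfitting).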
Bias-Variance, Underfitting and Overfitting
ML System Cycle

1. Ideation → 2. Development → 3. Production → 4. Maintenance → (back to 1. Ideation)
1. Ideation

► The following prerequisites are essential for successful ideation:

1. Clear requirements regarding business objectives and scope
2. Availability of historical data
3. Understanding of end-to-end IT infrastructure requirements
2. Development

► Once key metrics that correspond to the business objectives are agreed upon and historical data is acquired, the data scientist can start developing the initial model.
► Data scientists have a wide array of tools available to solve their puzzles:

1. Transforming data into a more useful format
2. Analyzing data to guide the modeling approach
3. Writing the actual machine learning model code
4. Creating numbers and visuals for initial reports
3. Production

► When the development phase is over, the developed model needs to be put into production to start generating value.
► The complexity of getting a model into production depends on the context of the problem, the autonomy of the data science teams, and the overall maturity of the organization.
► The context of the problem consists of a number of factors:

1. Data flow at prediction time
2. Sensitivity of the data
3. Maximum acceptable latency of delivery
4. Maintenance

► Once a model is deployed, there are a number of measures that can be taken to improve the robustness and quality of the machine learning model.
► These measures can be roughly divided into four areas. We call this post-production process maintenance.

1. Lineage: The lineage of a machine learning model refers to the origins of the model, including which source code the model uses, which data it was trained on, and what parameters were used.
2. Monitoring: By setting up proper monitoring, the development team and relevant stakeholders are notified when erroneous behavior is detected.
3. Comparison: It is important to have a framework that allows data scientists to make comparisons between different options in a live environment.
4. Model Drift: Machine learning models can become obsolete when not maintained properly; this concept is called model drift (a minimal monitoring sketch follows below).
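A minimal, purely illustrative sketch of the monitoring/drift idea; the metric, threshold, and alert mechanism are hypothetical placeholders for real monitoring infrastructure:

```python
# Hypothetical maintenance check: flag possible model drift when live
# accuracy falls well below the accuracy recorded at deployment time.
def drift_detected(live_accuracy: float, baseline_accuracy: float,
                   tolerance: float = 0.05) -> bool:
    """Return True when live performance degrades beyond the tolerance."""
    return live_accuracy < baseline_accuracy - tolerance

if drift_detected(live_accuracy=0.81, baseline_accuracy=0.92):
    print("ALERT: possible model drift - notify the development team")
```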
Evaluation Metrics

Confusion Matrix

► Evaluation of the performance of a classification model is based on the counts of test records correctly and incorrectly predicted by the model.
► The target variable has two values: Positive or Negative.
► The columns represent the actual values of the target variable.
► The rows represent the predicted values of the target variable.
Confusion Matrix

► True Positive (TP)
► The predicted value matches the actual value
► The actual value was positive and the model predicted a positive value

► True Negative (TN)
► The predicted value matches the actual value
► The actual value was negative and the model predicted a negative value

► False Positive (FP) – Type 1 error
► The predicted value was falsely predicted
► The actual value was negative but the model predicted a positive value

► False Negative (FN) – Type 2 error
► The predicted value was falsely predicted
► The actual value was positive but the model predicted a negative value
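A short sketch using scikit-learn's confusion_matrix on made-up label vectors. Note that scikit-learn places actual values on the rows and predictions on the columns, the transpose of the convention on the slide above:

```python
# Extract TP, TN, FP, FN from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual values (made up)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions (made up)

# With labels=[0, 1], ravel() yields the counts in TN, FP, FN, TP order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=4 TN=4 FP=1 FN=1
```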
Accuracy Example

Accuracy = correct predictions / total predictions = 5/10 = 0.5
Accuracy

► Accuracy = (TP + TN) / (TP + TN + FP + FN), i.e. the fraction of all predictions that are correct.
► Accuracy is a useful metric only when you have an equal distribution of classes in your classification.
► Accuracy is not a good metric to use when you have a class imbalance.
Imbalanced Data Example

► Imagine you are working on the sales data of a website. You know that 99% of website visitors don’t buy and that only 1% of visitors buy something. You are building a classification model to predict which website visitors are buyers and which are just lookers.
► Now imagine a model that doesn’t work very well. It predicts that 100% of your visitors are just lookers and that 0% of your visitors are buyers. It is clearly a very wrong and useless model.
► What would happen if we used the accuracy formula on this model? The model has predicted only 1% wrongly: all the buyers have been misclassified as lookers. The percentage of correct predictions is therefore 99%. The problem here is that an accuracy of 99% sounds like a great result, whereas the model performs very poorly.
Solving Imbalance in the Data

► You can resample your data set in such a way that the data is no longer imbalanced. You can then use accuracy as a metric again.
► Resampling methods include undersampling, oversampling, and SMOTE data augmentation (a sketch follows below).
► Another way to handle class imbalance is to use better metrics like the F1 score, which takes into account not only the number of prediction errors that your model makes but also the type of errors that are made.
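As promised above, a resampling sketch that assumes the third-party imbalanced-learn package (pip install imbalanced-learn; not part of scikit-learn itself). SMOTE synthesizes new minority-class points until the classes are balanced:

```python
# Rebalance a 99%/1% dataset (like the lookers-vs-buyers example) with SMOTE.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.99], random_state=0)
print(Counter(y))                          # heavily imbalanced classes

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))                      # classes now balanced
```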
Precision

► Within everything that has been predicted as positive, precision counts the percentage that is correct.
► Precision = TP / (TP + FP): out of all the predicted positives, how many did we predict correctly?
► An imprecise model may find a lot of the positives, but its selection method is noisy: it also wrongly detects many positives that aren’t actually positives.
► A precise model is very “pure”: maybe it does not find all the positives, but the ones that the model does classify as positive are very likely to be correct.
Recall

► Within everything that actually is positive, recall counts how many the model succeeded in finding.
► Recall = TP / (TP + FN): what proportion of actual positives is correctly classified?
► A model with high recall succeeds in finding all the positive cases in the data, even though it may also wrongly identify some negative cases as positive.
► A model with low recall is not able to find all (or a large part) of the positive cases in the data.
Precision-Recall Trade-Off

► Unfortunately, you usually can’t have both high precision and high recall.
► If you increase precision, it tends to reduce recall, and vice versa. This is called the precision/recall trade-off.
► The precision-recall trade-off represents the fact that in many cases you can tweak a model to increase precision at the cost of lower recall, or, on the other hand, increase recall at the cost of lower precision (see the threshold sketch below).
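A sketch of the trade-off in action (scikit-learn assumed, synthetic data): sweeping the decision threshold on predicted probabilities shifts the balance between precision and recall:

```python
# Raising the decision threshold generally raises precision and lowers recall.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
proba = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}  "
          f"precision={precision_score(y_te, y_pred):.2f}  "
          f"recall={recall_score(y_te, y_pred):.2f}")
```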
Precision and Recall Example

Precision = TP / (TP + FP) = 4/7 = 0.57
Recall = TP / (TP + FN) = 4/6 = 0.67
F1 Score

► Precision and recall are the two building blocks of the F1 score.
► The goal of the F1 score is to combine the precision and recall metrics into a single metric.
► The F1 score has been designed to work well on imbalanced data.
► Since the F1 score is a (harmonic) average of precision and recall, it gives equal weight to precision and recall:
► A model will obtain a high F1 score if both precision and recall are high
► A model will obtain a low F1 score if both precision and recall are low
► A model will obtain a medium F1 score if one of precision and recall is low and the other is high
► An F1 score is considered perfect when it’s 1, while the model is a total failure when it’s 0.
F1-score

► The F1 score is the harmonic mean of precision and recall:
► F1 = 2 * (precision * recall) / (precision + recall)
► A good F1 score means that you have low false positives and low false negatives, so you’re correctly identifying real threats and you are not disturbed by false alarms.
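The formula can be checked by hand against scikit-learn on a small made-up example (the same vectors as in the confusion matrix sketch above):

```python
# Compute precision, recall, and F1 from their formulas, then verify
# against scikit-learn's built-in f1_score.
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))        # 4
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)                           # 4/5 = 0.8
recall = tp / (tp + fn)                              # 4/5 = 0.8
f1 = 2 * precision * recall / (precision + recall)   # 0.8
print(f1, f1_score(y_true, y_pred))                  # both print 0.8
```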
Mean Squared Error (MSE)

► The Mean Squared Error (MSE) is perhaps the simplest and most common regression metric.
► To calculate the MSE, you take the difference between your model’s predictions and the ground truth, square it, and average it out across the whole dataset.
► The MSE will never be negative, since we are always squaring the errors.
► The MSE is formally defined by the following equation:

MSE = (1/N) * Σ (y_i − ŷ_i)²

► where N is the number of samples we are testing against, y_i is the true value, and ŷ_i is the predicted value.
Mean Absolute Error (MAE)

► To calculate the MAE, you take the difference between your model’s predictions and the ground truth, apply the absolute value to that difference, and then average it out across the whole dataset.
► The MAE, like the MSE, will never be negative, since in this case we are always taking the absolute value of the errors.
► The MAE is formally defined by the following equation:

MAE = (1/n) * Σ |y_i − ŷ_i|

► where n is the total number of data points.
Root Mean Squared Error (RMSE)

► RMSE is the standard deviation of the errors that occur when a prediction is made on a dataset.
► It is the same as the MSE, except that the square root of the value is taken when determining the accuracy of the model.
► The RMSE is formally defined by the following equation:

RMSE = √( (1/N) * Σ (y_i − ŷ_i)² ) = √MSE
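A short NumPy sketch computing all three regression metrics from the equations above, on made-up values:

```python
# MSE, MAE, and RMSE computed directly from their definitions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # ground truth (made up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # model predictions (made up)

errors = y_true - y_pred
mse = np.mean(errors ** 2)        # (1/N) * sum((y_i - yhat_i)^2)  -> 0.875
mae = np.mean(np.abs(errors))     # (1/N) * sum(|y_i - yhat_i|)    -> 0.75
rmse = np.sqrt(mse)               # square root of the MSE         -> 0.935...
print(mse, mae, rmse)
```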
ML Scalability

► ML scalability means scaling ML models to handle massive data sets and perform many computations in a cost-effective and time-saving way.
► ML scalability with an example: A model built to predict stock prices consumes data from a large dataset and delivers predictions instantly. These predictions are relevant for a limited timeframe, and delayed predictions become meaningless from the user’s perspective. Stock prices are highly dynamic by nature, so getting an instant stock prediction is very important here. Scalability comes to the rescue in such situations: it allows ML models to serve millions of users and fits big-data workloads well.
Check your understanding

► The confusion matrix is used to: (Multiple answers possible)


A. Understand how attributes are related to each other
B. How data is spread on each dimension
C. Evaluate the performance of classification algorithms within each class
D. It is a visual representation of the actual distribution of predicted values of
target labels in the context of the actual values of the target labels

Check your understanding

► Variance error increases:

A. When the number of independent attributes increases
B. When models become overly complex
C. When data is sparse in the given feature space
D. All of the above
Check your understanding

► True positives in the confusion matrix are Type I error.


A. True
B. False

► Which of the following would help to increase the value of precision? (Multiple
answers possible)
A. Increasing true positive
B. Increasing true negative
C. Decreasing false positive
D. Decreasing false negative

Check your understanding

► Which of the following is FALSE for Bias:


A. It is the error rate on training data
B. It is the result of oversimplified data
C. High value for bias results in underfitting
D. High value for bias results in overfitting


You might also like