
INTRODUCTION TO

MACHINE LEARNING
WHAT IS MACHINE
LEARNING?
Machine learning is a subset of AI that focuses on designing systems that learn and make predictions based on experience (which, in this case, is data).
Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed.
HOW MACHINE
LEARNING WORKS
TYPES OF MACHINE
LEARNING
Supervised Learning
Unsupervised Learning
Semi-supervised Learning
Reinforcement Learning
SUPERVISED LEARNING
In supervised learning, the training data fed into
the ML algorithm includes the desired solutions,
called labels.
Supervised learning can be categorized into
Classification
Regression
SUPERVISED LEARNING (CONT.)
Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, “disease” or “no disease”, “spam” or “not spam”.
An example of a classification problem is predicting the grade of a student.
Regression: A regression problem is when the output variable is a real value, such as “weight”.
An example of a regression problem is predicting the exam score of a student (see the sketch below).
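To make the distinction concrete, here is a minimal scikit-learn sketch (the student data is made up for illustration): a classifier predicts a category, while a regressor predicts a real value.

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data: hours studied -> pass/fail (classification) and exam score (regression)
hours = [[1], [2], [3], [4], [5], [6]]
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]  # categories
scores = [35, 45, 52, 61, 70, 82]                          # real values

clf = DecisionTreeClassifier().fit(hours, passed)
reg = LinearRegression().fit(hours, scores)

print(clf.predict([[3.5]]))  # a category: "fail" or "pass"
print(reg.predict([[3.5]]))  # a real-valued estimate of the exam score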
UNSUPERVISED
LEARNING
Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
Here the task of the machine is to group unsorted information according to similarities, patterns, and differences, without any prior training on the data.
UNSUPERVISED
LEARNING (CONT.)
Unsupervised learning can be categorized into
Clustering
Association
Clustering finds patterns in the data (e.g. shape, size, colour) which can be used to group the data into clusters.
Examples of algorithms used here are k-means clustering and hierarchical clustering (see the sketch below).
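As a minimal sketch with scikit-learn (the points below are made-up measurements), k-means groups unlabeled data into a chosen number of clusters:

import numpy as np
from sklearn.cluster import KMeans

# Toy unlabeled points, e.g. two size measurements per item
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
              [8.0, 8.2], [8.1, 7.9], [7.9, 8.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # centre of each discovered cluster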
UNSUPERVISED
LEARNING (CONT.)
 Association finds the dependencies of one data item on another and maps them, e.g. a customer looking for bread in the supermarket is likely to look for milk as well.
Examples of algorithms used include Apriori and FP-Growth.
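A minimal sketch of Apriori-style association mining, assuming the third-party mlxtend library is available (the basket data is invented for illustration):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy market-basket transactions
transactions = [["bread", "milk"], ["bread", "milk", "eggs"],
                ["bread", "butter"], ["milk", "eggs"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

itemsets = apriori(onehot, min_support=0.5, use_colnames=True)   # frequent itemsets
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])  # e.g. bread -> milk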
SEMI SUPERVISED
LEARNING
Semi-supervised learning is the type of machine learning that uses a combination of labeled data (usually a small amount) and unlabeled data (usually a large amount) to train models.
This approach to machine learning is a combination
of supervised machine learning, which uses labeled
training data, and unsupervised learning, which uses
unlabeled training data.
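A minimal sketch with scikit-learn's SelfTrainingClassifier (toy numbers; unlabeled samples are marked with -1, as the library expects):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X = np.array([[1.0], [1.2], [0.8], [7.9], [8.1], [2.0], [7.0], [1.5]])
y = np.array([0, 0, 0, 1, 1, -1, -1, -1])   # -1 marks the unlabeled samples

model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[1.1], [7.5]]))   # trained on labeled + unlabeled data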
REINFORCEMENT LEARNING
 Reinforcement means to establish or create a pattern of
behavior.
 Reinforcement learning is a type of machine learning where a machine learns to behave in an environment by performing actions and seeing the results.
 Agent: RL algorithm that learns through trial and error
 Environment is the world through which an agent moves
 Action: All the possible steps an agent can take
 State: current condition returned by the environment
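These pieces fit together in, for example, tabular Q-learning. Below is a minimal sketch on an invented 5-state corridor environment (all rewards and constants are illustrative): the agent tries actions, observes the resulting state and reward, and updates its value estimates.

import numpy as np

n_states, n_actions = 5, 2             # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))    # the agent's value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:                 # episode ends at the goal
        # Epsilon-greedy: explore sometimes, otherwise exploit current Q-values
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: nudge Q towards reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        state = next_state

print(np.argmax(Q, axis=1))  # learned policy: mostly "right" (action 1)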
MACHINE LEARNING
ALGORITHMS
Often ML algorithms are referred to as ML models or
systems.
The model is the core component of machine learning,
and ultimately what we are trying to build
 Rather than being edited by people so that they work
well, machine learning models are shaped by data: they
learn from experience.
MACHINE LEARNING
ALGORITHMS (CONT.)
A model can be thought of as a function that accepts data
as an input and produces an output (estimations and
predictions).
Parameter values are continuously adjusted during the
model training process.
The final parameter values found after training determine how the model will perform on unseen data.
MACHINE LEARNING
ALGORITHMS (CONT.)
Models are trained using data plus two pieces of code: the objective function and the optimizer.
The objective is what the model is expected to be able to do.
The objective function judges whether the model is doing a good job or not.
The optimizer is code that changes the model’s parameters so the model will do a better job next time (see the sketch below).
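A minimal sketch of this loop, using a one-parameter linear model, a mean-squared-error objective, and plain gradient descent as the optimizer (all numbers invented):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])   # roughly y = 2x

w = 0.0  # the model's parameter, adjusted during training

def objective(w):                    # judges how badly the model is doing (MSE)
    return np.mean((w * x - y) ** 2)

for step in range(100):              # the optimizer: gradient descent
    grad = np.mean(2 * (w * x - y) * x)   # derivative of the objective w.r.t. w
    w -= 0.01 * grad                      # adjust w so the model does better

print(w, objective(w))               # w ends up close to 2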
MACHINE LEARNING
ALGORITHMS (CONT.)
 NB: Training a model changes the parameter values inside of a model but does not change the kind of model used.
 Using a model means providing inputs and receiving an
estimation or prediction -- this is done when testing the model
and when using it in the real world.
TYPES OF MACHINE LEARNING
ALGORITHMS
There are several machine learning algorithms. Some
common ones include
 Linear regression
 Logistic regression
 Decision tree
 Random forest
 K-Nearest Neighbour
 Naïve Bayes
 Support Vector Machines
LINEAR REGRESSION
 Linear Regression is a supervised machine learning algorithm in which the model finds the linear relationship between the dependent and independent variables.
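A minimal scikit-learn sketch (toy numbers roughly following y = 2x + 1):

from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]      # independent variable
y = [3.0, 5.1, 6.9, 9.2]      # dependent variable

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # the learned linear relationship
print(model.predict([[5]]))           # prediction for an unseen input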
LOGISTIC REGRESSION
 Logistic regression is a supervised learning classification
algorithm used to predict the probability of a target variable.
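A minimal scikit-learn sketch with an invented binary target:

from sklearn.linear_model import LogisticRegression

X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
print(model.predict_proba([[2.0]]))  # probability of each class
print(model.predict([[2.0]]))        # the predicted class itself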
DECISION TREE
Decision Trees are a type of Supervised Machine
Learning algorithm (used to solve both classification
and regression problems) where the data is
continuously split according to a certain parameter.
With DT, little effort is required for data preparation
Can handle both numerical and categorical data
Nonlinear relationships between parameters do not affect its performance
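A minimal scikit-learn sketch (the age/income data is made up); export_text prints the splits the tree learned:

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 40000], [30, 60000], [45, 80000], [35, 30000]]  # age, income
y = ["no", "yes", "yes", "no"]                            # e.g. buys product

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))  # the learned rules
print(tree.predict([[40, 70000]]))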
DECISION TREE
(DISADVANTAGES)
Overfitting occurs when the algorithm captures noise in the data
High variance: the model can become unstable due to small variations in the data
Low-bias tree: a highly complicated DT tends to have low bias, which makes it difficult for the model to work with new data
RANDOM FOREST
 Random forest is a supervised machine learning algorithm
used for Classification and Regression problems.
 It is based on the concept of ensemble learning which is a
process of combining multiple classifiers to solve a complex
problem and improve the performance of the model.
 Random forest therefore uses multiple decision trees to
predict the output.
 Random Forest Algorithm reduces the risk of overfitting
and the required training time.
 Additionally, it offers a high level of accuracy.
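A minimal scikit-learn sketch on the built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# An ensemble of 100 decision trees, each trained on a random subset of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))   # majority-vote prediction across the trees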
KNN (K NEAREST
NEIGHBOUR) ALGORITHM
While K nearest neighbour algorithm can be used
for either regression or classification problems, it is
typically used as a classification algorithm, working
off the assumption that similar points can be found
near one another.
“k” is the number of nearest neighbors the model will consider.
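A minimal scikit-learn sketch with invented 2-D points and k = 3:

from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # k = 3
print(knn.predict([[2, 2], [8, 7]]))  # class of each point's 3 nearest neighbours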
NAÏVE BAYES
Naïve Bayes is a probabilistic machine learning
algorithm based on the Bayes Theorem, used in a
wide variety of classification tasks.
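A minimal scikit-learn sketch using Gaussian Naïve Bayes on the built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)
print(nb.predict(X[:3]))        # most probable class for each sample
print(nb.predict_proba(X[:3]))  # the underlying class probabilities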
CHALLENGES OF MACHINE
LEARNING
The challenges that can arise in selecting and training a machine learning model can be grouped into “bad model” and “bad data.”
The following can be classified as bad data
Insufficient Quantity of Training Data
Non-representative Training Data
CHALLENGES OF MACHINE
LEARNING (CONT.)
Bad Data:
Poor Quality of Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
IRRELEVANT FEATURES
Feature engineering is the process of coming up with a good
set of features within a dataset to train a model on.
This process involves
Feature selection: selecting the most useful features to train
on among existing features.
Feature extraction: combining existing features to produce
a more useful one (dimensionality reduction algorithms can
help).
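A minimal feature-selection sketch with scikit-learn's SelectKBest (iris has 4 features; here we keep the 2 that score highest against the target):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)          # 4 features

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.get_support())              # mask of the selected features
print(selector.transform(X).shape)         # (150, 2): reduced feature set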
OVERFITTING THE
TRAINING DATA
Overfitting is a concept where a model
performs well on training data, but it does not
generalize well to new unseen data.
Generally speaking, overfitting happens when the model is too complex relative to the amount and noisiness of the data.
OVERFITTING THE
TRAINING DATA (CONT.)
The possible solutions are:
 To simplify the model by selecting one with fewer parameters, or by reducing the number of attributes in the training data
 To gather more training data
 To reduce the noise in the training data, that is, fixing data errors and removing outliers (a small illustration follows below)
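A minimal sketch of the first remedy, simplifying the model (all numbers invented): a degree-15 polynomial fits noisy linear data almost perfectly in training, but tends to generalize worse than the simpler degree-1 model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 30)).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, 0.5, 30)   # noisy linear data

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 15):   # simple vs overly complex model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # Typically: a high train score but a lower test score for degree 15
    print(degree, model.score(X_tr, y_tr), model.score(X_te, y_te))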
UNDERFITTING THE TRAINING DATA
 Underfitting is a situation where the model is too simple for the
data.
 It generally happens when there is too little information available to build an accurate model.
The main options to fix this problem are:
 Selecting a more powerful model, with more parameters
 Feeding better features to the learning algorithm (feature
engineering)
 Reducing the constraints on the model (e.g., reducing the
regularization hyperparameter)
TESTING AND VALIDATING
The only way to know how well a model will generalize
to new cases is to actually try it out on new cases
Data is usually split into training set and the test set where
the training set is used to train the model and the testing
set is used to test the model.
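A minimal scikit-learn sketch: hold out 20% of the data for testing and train on the remaining 80% (the model choice here is arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on unseen (held-out) data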
EVALUATION METRICS FOR
REGRESSION ANALYSIS
 MAE
 MSE
 RMSE
 MAPE – Mean Absolute Percentage Error
 MPE – Mean Percentage Error
 R SQUARED SCORE
 ADJUSTED R SQUARED SCORE
MEAN SQUARED ERROR -MSE
Mean Squared Error (MSE): finds the average of the
squared difference between the target value and the
value predicted by the regression model.
It is given by
MSE = (1/N) Σⱼ (y_j − ŷ_j)², summed over j = 1 … N
 Where
• y_j: actual value
• ŷ_j (y_hat): predicted value from the regression model
• N: number of items in the sample
MEAN SQUARED ERROR –
MSE (CONT.)
 It penalizes even small errors by squaring them, which
essentially leads to an overestimation of how bad the
model is.
 Error interpretation has to be done with squaring factor
(scale) in mind.
 Due to the squaring factor, it is fundamentally more sensitive to outliers than other metrics
MEAN ABSOLUTE ERROR -MAE
 Mean Absolute Error is the average of the absolute differences between the actual values and the predicted values.
 Mathematically, it is represented as:
MAE = (1/N) Σⱼ |y_j − ŷ_j|, summed over j = 1 … N
 Where:
• y_j: actual value
• ŷ_j (y_hat): predicted value from the regression model
• N: number of items in the sample
MEAN ABSOLUTE ERROR –
MAE (CONT)
• It’s more robust towards outliers than MSE, since it
doesn’t exaggerate errors.
• It gives a measure of how far the predictions were
from the actual output.
• However, since MAE uses the absolute value of the residual, it doesn’t give us an idea of the direction of the error, i.e. whether the model is under-predicting or over-predicting.
ROOT MEAN SQUARED
ERROR (RMSE)
 Root Mean Squared Error (RMSE): it is the square root of MSE, i.e. the root of the mean squared difference between the actual and predicted values:
RMSE = √( (1/N) Σⱼ (y_j − ŷ_j)² )
 RMSE penalizes large errors more heavily than MAE does (see the sketch below).
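A minimal sketch computing all three metrics with scikit-learn (the values are invented; note how the single large error inflates MSE and RMSE far more than MAE):

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # actual values
y_pred = np.array([2.5, 5.5, 7.0, 12.0])  # model predictions

mae = mean_absolute_error(y_true, y_pred)   # (0.5 + 0.5 + 0 + 3) / 4 = 1.0
mse = mean_squared_error(y_true, y_pred)    # (0.25 + 0.25 + 0 + 9) / 4 = 2.375
rmse = np.sqrt(mse)                         # ≈ 1.54
print(mae, mse, rmse)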
MAE & RMSE - SIMILARITIES
 Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are two of the most common metrics used to measure accuracy for continuous variables.
 Both MAE and RMSE express average model prediction error in units of the variable of interest.
 Both metrics can range from 0 to ∞ and are indifferent to the direction of errors.
 They are negatively oriented scores, which means lower values are better.
MAE & RMSE - DIFFERENCES
 Taking the square root of the average squared errors has some interesting implications for RMSE.
 Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors; that is, it is more sensitive to outliers than the MAE.
 But when outliers are rare, the RMSE performs very well.
CLASSIFICATION
METRICS
 Accuracy,
 Precision-Recall,
 ROC-AUC,
 Log-Loss
ACCURACY
 Accuracy shows the ratio of correct predictions to all predictions:
Accuracy = (number of correct predictions) / (total number of predictions)
 Accuracy works well only if there are equal numbers of samples belonging to each class.
CONFUSION MATRIX
 A confusion matrix provides an overview of a model’s performance on different classes separately (see the sketch below).
 True Positives (TP): cases in which the predicted value was YES and the actual output was also YES.
 True Negatives (TN): cases in which the predicted value was NO and the actual output was NO.
 False Positives (FP): cases in which the predicted value was YES but the actual output was NO.
 False Negatives (FN): cases in which the predicted value was NO but the actual output was YES.
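A minimal scikit-learn sketch (invented predictions; in scikit-learn's layout, rows are actual classes and columns are predicted classes):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual outputs (1 = YES, 0 = NO)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)   # TN=3, FP=1, FN=1, TP=3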
PRECISION AND RECALL
 The accuracy of the positive predictions is called the precision of the classifier.
 Precision corresponds to the proportion of data points predicted as positive that are actually positive. It is defined as:
Precision = TP / (TP + FP)
 A trivial way to have perfect precision is to make one single positive prediction and ensure it is correct (precision = 1/1 = 100%). This would not be very useful, since the classifier would ignore all but one positive instance.
 So precision is typically used along with another metric named recall, also called sensitivity or true positive rate (TPR).
 Recall is the ratio of positive instances that are correctly detected by the classifier and is defined as:
Recall = TP / (TP + FN)
F1 SCORE
F1 Score is the harmonic mean of precision and recall.
It tells how precise the classifier is (how many of its positive classifications are correct), as well as how robust it is (whether it misses a significant number of instances).
High precision but low recall gives you an extremely accurate classifier, but one that misses a large number of instances that are difficult to classify.
The greater the F1 Score, the better the performance of our model. Mathematically, it can be expressed as:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
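A minimal scikit-learn sketch, reusing the invented predictions from the confusion matrix example (TP=3, FP=1, FN=1):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)   # TP/(TP+FP) = 3/4 = 0.75
r = recall_score(y_true, y_pred)      # TP/(TP+FN) = 3/4 = 0.75
f = f1_score(y_true, y_pred)          # 2*p*r/(p+r) = 0.75
print(p, r, f)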