Classification & Regression BDMDM Print

The document outlines the fundamentals of machine learning, distinguishing between prediction and classification and between supervised and unsupervised learning. It covers model performance metrics for linear regression, bias and variance, and the implications of overfitting and underfitting. It also covers logistic regression as a classification model, including tuning techniques and performance metrics such as the AUC-ROC curve.

Uploaded by

p23aswin


Machine Learning
• “Machine Learning at its most basic is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.” – Nvidia
• Supervised learning analyzes labelled data and learns a mapping from input to output based on example data points.
• Unsupervised learning works with unlabeled data to find previously undetected patterns or insights in the data set.
• Classification and regression are the types of problems solved with supervised learning. Clustering and association rule mining are examples of unsupervised machine learning.
https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/

Prediction vs Classification
• Prediction: used to determine a numeric-valued outcome. Ex: predicting how much a customer will spend during a sale.
• Classification: a model is constructed and used to predict categorical labels. Ex: predicting whether a loan application from a customer is safe or risky.
Machine Learning vs Statistics
Machine Learning | Statistics
---------------- | ----------
Ultimate objective is to attain higher predictive power | Primary objective is to make inferences about the data, finding and understanding relationships among variables
Sacrifices interpretability for predictive power | Models are interpretable
Model is assessed on its performance on unseen (test) data and its ability to make future predictions | Model is evaluated on the significance and robustness of its parameters

Predictive Techniques: Linear Regression
Model Performance of Linear Regression
Use the following metrics on the test data:
• MAE: Mean Absolute Error
• MSE: Mean Square Error
• RMSE: Root Mean Square Error
• MAPE: Mean Absolute Percentage Error

Controlling for overfitting
• Overfitting happens when a model gives accurate outcomes on the train data but fails to perform well on the test data.
• Penalizing the model for complexity can reduce overfitting.
• The penalty limits how closely the model can fit extreme observations in the train data, since these may have occurred by chance and should not be reflected in the model.
• L1 and L2 regularization (Lasso and Ridge) are used to control overfitting in the model.
• Regularization shrinks the values of the beta coefficients in a linear regression model, thus restricting how much the model learns from the train data.
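The four test-data metrics named under "Model Performance of Linear Regression" can be computed directly from paired actual and predicted values. The following is a minimal sketch in plain Python; the function name `regression_metrics` and the sample numbers are illustrative assumptions, not taken from the slides.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual value, so it is undefined if any y_true is 0.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Hypothetical test-set values, for illustration only.
y_test = [100.0, 200.0, 400.0]
y_hat = [110.0, 180.0, 400.0]
metrics = regression_metrics(y_test, y_hat)
print(metrics)
```

RMSE is in the same units as the target, which usually makes it easier to read than MSE, while MAPE expresses the error relative to the actual values.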
Steps for building a predictive model
1. Define the problem and gather data.
2. Clean and process the data.
3. Partition the data into X and y (X: input variables; y: target or output).
4. Partition the data into train and test sets: (X_train, X_test) and (y_train, y_test).
5. Train a supervised ML model using the train data.
6. Predict for X_test using the trained model.
7. Compute the error or accuracy of the model using the predicted and actual y values for the test data.

Linear Regression
• A linear regression is presented as follows:
$$f(\mathbf{x}) = b_0 + b_1 x_1 + \cdots + b_r x_r$$
• The model parameters are the values for which the following mean square error is minimized:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (f_i - y_i)^2$$
$N$ is the number of observations in the train data, $f_i$ is the value predicted by the model for data point $i$, and $y_i$ is the actual value of data point $i$.
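The model-building steps can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the data is synthetic (standing in for gathered and cleaned data), there is a single input variable, the split is 80/20, and the helper names `fit_simple_linear_regression` and `mean_squared_error` are hypothetical.

```python
import random

def fit_simple_linear_regression(x, y):
    """Least-squares fit of f(x) = b0 + b1 * x, which minimizes the train MSE."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

def mean_squared_error(y_true, y_pred):
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Steps 1-3: synthetic X and y stand in for gathered, cleaned data (an assumption).
rng = random.Random(0)
X = [rng.uniform(0, 10) for _ in range(200)]
y = [3.0 + 2.0 * xi + rng.gauss(0, 1) for xi in X]

# Step 4: partition into train and test (80/20 split on shuffled indices).
indices = list(range(len(X)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Step 5: train on the train data only.
b0, b1 = fit_simple_linear_regression(X_train, y_train)

# Step 6: predict for X_test with the trained model.
y_pred = [b0 + b1 * xi for xi in X_test]

# Step 7: compute the error on the test data.
test_mse = mean_squared_error(y_test, y_pred)
print(b0, b1, test_mse)
```

The test data is never used during fitting, so `test_mse` estimates how the model would perform on unseen data.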
Bias vs Variance
• Bias = E(y_predicted) – y
• Variance = E[(y_predicted – E(y_predicted))^2]
• A high-bias model shows a large difference between actual and predicted values. It is a simplistic model built on only a few beliefs or assumptions about the data. High bias leads to underfitting.
• A high-variance model is too sensitive to the training data: its output varies a lot when the train data changes, and it also learns from the noise in the data. High variance leads to overfitting.

Overfitting vs Underfitting
• Overfitting is reflected by a high-variance model or estimator.
• To control overfitting, reduce the variance of the estimator, e.g. increase regularization, obtain a larger data set, or decrease the number of features.
• Underfitting relates to a high-bias model. To fight underfitting, use less regularization, more features, or more complex models.
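The bias and variance definitions can be estimated by simulation: refit a model on many freshly drawn train sets and look at its predictions for one fixed query point. The sketch below assumes a hypothetical data-generating process, y = 2x plus Gaussian noise, and compares a simplistic model (the mean of y, ignoring x) with a fitted line; none of these choices come from the slides.

```python
import random

def fit_line(x, y):
    """Least-squares line; returns (intercept, slope)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def bias_and_variance(predictions, y_actual):
    """Bias = E(y_predicted) - y; Variance = E[(y_predicted - E(y_predicted))^2]."""
    expected = sum(predictions) / len(predictions)
    bias = expected - y_actual
    variance = sum((p - expected) ** 2 for p in predictions) / len(predictions)
    return bias, variance

rng = random.Random(42)
x0, y0 = 1.0, 2.0           # query point and its noise-free value under y = 2x
const_preds, line_preds = [], []
for _ in range(2000):        # each round draws a fresh train set of 10 points
    x = [rng.uniform(0, 1) for _ in range(10)]
    y = [2.0 * xi + rng.gauss(0, 1) for xi in x]
    const_preds.append(sum(y) / len(y))   # simplistic model: ignores x
    b0, b1 = fit_line(x, y)
    line_preds.append(b0 + b1 * x0)       # more flexible model: fitted line

const_bias, const_var = bias_and_variance(const_preds, y0)
line_bias, line_var = bias_and_variance(line_preds, y0)
print(const_bias, const_var, line_bias, line_var)
```

In this setup the constant model is the high-bias, low-variance estimator, and the fitted line has little bias but predictions that move more from one train set to the next.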
L1 and L2 Regularization
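• A regression model using L2 regularization is also called Ridge regression. It uses the following cost function, written here in a standard form where $\lambda \ge 0$ sets the strength of the penalty:
$$\mathrm{Cost} = \sum_{i=1}^{N} (f_i - y_i)^2 + \lambda \sum_{j=1}^{r} b_j^2$$
• A regression model using L1 regularization is also called Lasso regression. It penalizes $\lambda \sum_{j=1}^{r} |b_j|$ instead and can shrink the smaller coefficients to exactly 0; hence it is also used for feature selection.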
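The different shrinkage behaviour of Ridge and Lasso shows up even in the simplest case. The sketch below assumes a model with one feature and no intercept, f(x) = b·x, for which both penalized problems have closed-form solutions; the function names, the data, and the penalty strength `lam` are illustrative assumptions.

```python
def ols_coefficient(x, y):
    """Minimizes sum((y - b*x)^2) for a one-feature model without intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def ridge_coefficient(x, y, lam):
    """Minimizes sum((y - b*x)^2) + lam * b^2."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

def lasso_coefficient(x, y, lam):
    """Minimizes sum((y - b*x)^2) + lam * |b| via soft-thresholding."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    magnitude = max(abs(sxy) - lam / 2.0, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * magnitude / sxx

# Hypothetical data with a weak relationship between x and y.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 0.1, 0.6, 0.4]
lam = 10.0

b_ols = ols_coefficient(x, y)
b_ridge = ridge_coefficient(x, y, lam)
b_lasso = lasso_coefficient(x, y, lam)
print(b_ols, b_ridge, b_lasso)
```

With this penalty, Ridge pulls the coefficient toward zero without reaching it, while Lasso sets it to exactly zero, which is why Lasso can act as a feature selector.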
Classification Models
Logistic Regression
• Logistic regression is a linear classifier.
Linear function, where $b_0, b_1, \ldots, b_r$ are the coefficients:
$$f(\mathbf{x}) = b_0 + b_1 x_1 + \cdots + b_r x_r$$
Sigmoid function:
$$p(\mathbf{x}) = \frac{1}{1 + \exp(-f(\mathbf{x}))}$$
y = 1 if p(x) >= threshold
y = 0 otherwise
• It minimizes the following cost function:
$$\mathrm{Cost} = -\sum_{i} \left[ y_i \log p(\mathbf{x}_i) + (1 - y_i) \log\bigl(1 - p(\mathbf{x}_i)\bigr) \right]$$
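The linear function, the sigmoid, the threshold rule, and the cost can be written out in a few lines of plain Python. This is a sketch for illustration: the coefficients and data points are hypothetical and no fitting is performed.

```python
import math

def linear_function(x, coefficients):
    """f(x) = b0 + b1*x1 + ... + br*xr, with coefficients = [b0, b1, ..., br]."""
    return coefficients[0] + sum(b * xi for b, xi in zip(coefficients[1:], x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, coefficients):
    return sigmoid(linear_function(x, coefficients))

def predict(x, coefficients, threshold=0.5):
    return 1 if predict_proba(x, coefficients) >= threshold else 0

def log_loss(X, y, coefficients):
    """Cost = -sum(y*log(p) + (1 - y)*log(1 - p)) over all data points."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(xi, coefficients)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Hypothetical coefficients and data points, for illustration only.
coefficients = [-1.0, 2.0]   # b0 = -1, b1 = 2
X = [[0.0], [0.5], [2.0]]
y = [0, 1, 1]
labels = [predict(xi, coefficients) for xi in X]
cost = log_loss(X, y, coefficients)
print(labels, cost)
```

Training a logistic regression means searching for the coefficients that make this cost as small as possible on the train data.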
Tuning Logistic Regression model
• Regularize the model parameters using L1 or L2 regularization.
• By default, the probability threshold for assigning a data point to the positive class is 0.5.
• For imbalanced datasets, the analyst may validate the model at various threshold values and choose the threshold that gives the highest F1-score.
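Choosing a threshold by F1-score can be sketched as a simple search over candidate values. The validation labels, predicted probabilities, and candidate thresholds below are hypothetical, and `best_threshold` is an illustrative helper rather than a library function.

```python
def f1_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(y_true, probabilities, thresholds):
    """Return (threshold, f1) with the highest F1; ties keep the earliest threshold."""
    best = (None, -1.0)
    for threshold in thresholds:
        y_pred = [1 if p >= threshold else 0 for p in probabilities]
        score = f1_score(y_true, y_pred)
        if score > best[1]:
            best = (threshold, score)
    return best

# Hypothetical validation labels (imbalanced: 2 positives out of 8) and model probabilities.
y_val = [0, 0, 0, 0, 0, 0, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.25, 0.45, 0.35, 0.40]
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]

threshold, score = best_threshold(y_val, p_val, thresholds)
default_f1 = f1_score(y_val, [1 if p >= 0.5 else 0 for p in p_val])
print(threshold, score, default_f1)
```

In this example the default threshold of 0.5 predicts no positives at all, so a lower threshold gives a much better F1-score. The search should be run on validation data, not on the final test data.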
Metrics for assessing Logistic regression performance
Confusion matrix:
         | Predicted 0    | Predicted 1
-------- | -------------- | --------------
Actual 0 | True Negative  | False Positive
Actual 1 | False Negative | True Positive

AUC-ROC Curve
• An important metric for checking any classification model's performance.
• It tells how well the model can distinguish between classes.
• An AUC near 1 means the model is good at separating the classes; an AUC equal to 0.5 means the model has no class separation capacity whatsoever.
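Both the confusion matrix and the AUC can be computed by counting. The sketch below uses the fact that the area under the ROC curve equals the share of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The labels and scores are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix  # [[TN, FP], [FN, TP]]

def auc_roc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

matrix = confusion_matrix(y_true, y_pred)
auc = auc_roc(y_true, scores)
auc_uninformative = auc_roc(y_true, [0.5, 0.5, 0.5, 0.5])
print(matrix, auc, auc_uninformative)
```

The confusion matrix depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds. A model that gives every point the same score gets an AUC of 0.5.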
