Logistic Regression is the first non-linear classification ML algorithm taught in this course. This was one of the lectures of a full course I taught at the University of Moratuwa, Sri Lanka, in the second half of 2023.
2. Classification
• When the Y variable of a Supervised Learning problem takes one of several discrete classes (e.g., color, age group), the problem is known as a Classification problem
• A Classification problem has to predict/select a certain Category (or Class) as the dependent variable
• When there are only 2 classes to be classified, it is known as a Binary Classification problem
E.g.: Predicting a person’s gender (either male or female) from testosterone concentration in blood, height and bone density
3. Binary Classification
• Output classes of a binary classification can be represented by either
• Boolean values, True or False (or Positive or Negative)
• Numbers, 1 or 0
• True or 1 is used for the Positive Class, which is generally the class we want to analyze
• False or 0 is used for the Negative Class, which is the other class
• E.g.: When classifying a tumor as malignant (cancerous) or benign (non-cancerous) by tumor size, malignant can be taken as the Positive class and benign as the Negative class
5. Binary Classification – with Linear Regression
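[Figure: Y (0 = Benign, 1 = Malignant) plotted against X with a fitted Linear Regression Classifier line; the 0.5 threshold on the line separates Malignant from Benign predictions]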
6. Binary Classification – Problem with LR
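[Figure: Y (0 = Benign, 1 = Malignant) plotted against X with a Linear Regression Classifier line and the 0.5 threshold between Malignant and Benign; some data points are marked as Misclassified]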
7. Binary Classification – Requirement
[Figure: Y (0 = Benign, 1 = Malignant) plotted against X, showing the Linear Regression Classifier line with its 0.5 threshold alongside the Required Regression Classifier, a variant of the Unit Step Function]
8. Binary Classification – Requirement
[Figure: the same plot of the Linear Regression Classifier and the Required Regression Classifier (a variant of the Unit Step Function); the jump of the step function is marked as not differentiable, which is a problem for Gradient Descent]
9. Binary Classification – Requirement
[Figure: Y (0 = Benign, 1 = Malignant) plotted against X, showing the Linear Regression Classifier line with its 0.5 threshold alongside a Continuous Regression Classifier curve]
10. Logistic/Sigmoid Function
• Sigmoid function:
$$f(Z) = \frac{1}{1 + e^{-Z}}$$
• $Z = 0 \Rightarrow f(Z) = 0.5$
• $0 < f(Z) < 1$
• A Non-linear function
• This is a continuous alternative for the Unit Step Function
[Figure: plot of f(Z) against Z]
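The properties listed on this slide can be checked numerically. The following is a minimal sketch in plain Python; the function name `sigmoid` and the sample Z values are illustrative choices, not part of the lecture.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function f(Z) = 1 / (1 + e^(-Z))."""
    # Branch on the sign of z so exp() never receives a large positive
    # argument, which would overflow for very negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative sample points: the output rises smoothly from near 0 to near 1.
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"Z = {z:5.1f}  f(Z) = {sigmoid(z):.4f}")
```

Unlike the Unit Step Function, this curve has a derivative everywhere, which is what Gradient Descent needs.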
11. Logistic Regression
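As in Linear Regression, take
$$Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$
Logistic Function:
$$f(Z) = \frac{1}{1 + e^{-Z}}$$
Substituting Z,
$$f(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$$
In vector form,
$$f(X) = \frac{1}{1 + e^{-\beta^T X}}$$
where $\beta_0 = \beta_0 X_0$, taking $X_0 = 1$
This is the function of Logistic Regression.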
12. Logistic Regression - Prediction
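Let’s take the predicted class $\hat{Y}$ as
$$\hat{Y} = \begin{cases} 1 \text{ (or Positive)} & \text{if } f(X) \ge 0.5 \\ 0 \text{ (or Negative)} & \text{if } f(X) < 0.5 \end{cases}$$
Since f(Z) = 0.5 exactly when Z = 0, the threshold on f(X) becomes a threshold on $\beta^T X$:
$$\text{Positive} \Rightarrow f(X) \ge 0.5 \Rightarrow \frac{1}{1 + e^{-\beta^T X}} \ge 0.5 \Rightarrow \beta^T X \ge 0$$
$$\text{Negative} \Rightarrow f(X) < 0.5 \Rightarrow \frac{1}{1 + e^{-\beta^T X}} < 0.5 \Rightarrow \beta^T X < 0$$
Here, $\beta^T X = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$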
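The prediction rule can be sketched in a few lines of Python. The β values and input points below are hypothetical and chosen only so that the arithmetic is easy to follow; the helper names are likewise illustrative.

```python
import math

def linear_score(beta, x):
    """Z = beta^T X, with X0 = 1 prepended so that beta[0] acts as the intercept."""
    features = [1.0] + list(x)
    return sum(b * v for b, v in zip(beta, features))

def predict_proba(beta, x):
    """f(X) = 1 / (1 + e^(-beta^T X)), computed in an overflow-safe way."""
    z = linear_score(beta, x)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(beta, x):
    """Return 1 (Positive) if beta^T X >= 0, else 0 (Negative)."""
    # Checking the sign of Z is equivalent to checking f(X) >= 0.5.
    return 1 if linear_score(beta, x) >= 0 else 0

beta = [-4.0, 1.0, 1.0]                        # hypothetical parameters
for x in ([3.0, 2.0], [1.0, 1.0], [2.0, 2.0]):
    print(x, round(predict_proba(beta, x), 3), predict(beta, x))
```

The point [2.0, 2.0] gives Z = 0, so f(X) is exactly 0.5 and the rule assigns it to the Positive class.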
13. Prediction Example
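Take a classification problem with 2 independent variables where,
$$f(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2)}}$$
With $Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2$, the decision boundary is the straight line $Z = 0$:
• Z > 0: Positive
• Z < 0: Negative
[Figure: Positive and Negative data points plotted on X1 and X2 axes, separated by the straight-line decision boundary]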
14. Non-linear Classification
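Taking polynomials of X values (as discussed in Polynomial Regression) makes it possible to classify non-linear data points with non-linear decision boundaries
E.g.:
$$f(X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2)}}$$
With $Z = \beta_0 + \beta_1 X_1^2 + \beta_2 X_2^2$, the decision boundary is the curve $Z = 0$:
• Z > 0: Positive
• Z < 0: Negative
[Figure: Positive and Negative data points plotted on X1 and X2 axes, separated by a curved decision boundary]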
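As an illustration of a non-linear decision boundary, the sketch below uses the squared features from the example with hypothetical parameters β0 = -1, β1 = 1, β2 = 1. With these values the boundary Z = 0 is the unit circle X1² + X2² = 1. The function names are illustrative.

```python
def quadratic_features(x1: float, x2: float) -> list:
    """Map (X1, X2) to the feature vector [1, X1^2, X2^2]."""
    return [1.0, x1 ** 2, x2 ** 2]

def classify(beta, x1: float, x2: float) -> str:
    """Label a point by the sign of Z = beta0 + beta1*X1^2 + beta2*X2^2."""
    z = sum(b * f for b, f in zip(beta, quadratic_features(x1, x2)))
    return "Positive" if z >= 0 else "Negative"

beta = [-1.0, 1.0, 1.0]              # hypothetical parameters: unit-circle boundary

print(classify(beta, 0.0, 0.0))      # inside the circle, Z < 0
print(classify(beta, 2.0, 0.0))      # outside the circle, Z > 0
print(classify(beta, 0.5, 0.5))      # inside the circle, Z < 0
```

The model is still linear in β; only the features are non-linear in X, which is why the same Logistic Regression machinery applies.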
15. Binary Logistic Regression – Cost Function
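Cost for a single data point is known as the Loss
Take the Loss Function of Logistic Regression as L(f(X), Y)
$$L(f(X), Y) = \begin{cases} -\log f(X) & \text{if } Y = 1 \\ -\log(1 - f(X)) & \text{if } Y = 0 \end{cases}$$
Both cases can be written as a single expression:
$$L(f(X), Y) = -Y \log f(X) - (1 - Y) \log(1 - f(X))$$
Cost function:
$$J(\beta) = \frac{1}{n} \sum_{i=1}^{n} L(f(X_i), Y_i)$$
$$J(\beta) = \frac{1}{n} \sum_{i=1}^{n} \left[ -Y_i \log f(X_i) - (1 - Y_i) \log(1 - f(X_i)) \right]$$
where n here is the number of data points
This Cost Function is Convex (has a Global Minimum)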
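The cost function can be sketched in plain Python together with a basic batch Gradient Descent loop. The tumor-size data, learning rate and iteration count are hypothetical. The gradient used, (1/n) Σ (f(Xi) − Yi)·Xij, is the standard derivative of this cost and is not derived on the slide.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def f(beta, x):
    """f(X) = sigmoid(beta^T X) with X0 = 1 prepended."""
    return sigmoid(sum(b * v for b, v in zip(beta, [1.0] + list(x))))

def cost(beta, X, Y):
    """J(beta) = (1/n) * sum of [-Y log f(X) - (1 - Y) log(1 - f(X))]."""
    eps = 1e-12                                   # keeps log() away from 0
    total = 0.0
    for x, y in zip(X, Y):
        p = min(max(f(beta, x), eps), 1.0 - eps)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total / len(X)

def gradient_step(beta, X, Y, lr):
    """One batch update: beta_j <- beta_j - lr * (1/n) * sum (f(X) - Y) * X_j."""
    n = len(X)
    grad = [0.0] * len(beta)
    for x, y in zip(X, Y):
        err = f(beta, x) - y
        for j, v in enumerate([1.0] + list(x)):
            grad[j] += err * v / n
    return [b - lr * g for b, g in zip(beta, grad)]

# Hypothetical data: one feature (tumor size), 1 = malignant, 0 = benign.
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
Y = [0, 0, 0, 1, 1, 1]

beta = [0.0, 0.0]
initial_cost = cost(beta, X, Y)                   # log(2), since f(X) = 0.5 everywhere
for _ in range(2000):
    beta = gradient_step(beta, X, Y, lr=0.1)
final_cost = cost(beta, X, Y)
print(initial_cost, final_cost)
```

Because the cost is convex, a sufficiently small learning rate moves β towards the global minimum rather than getting stuck in a local one.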
16. Multiclass Logistic Regression
• Up to now we have looked at Binary Classification problems, where there can be only two outcomes/categories/classes as the Y variable
• When there are more than 2 classes available (only one of them is positive for any given data point), the problem becomes a Multiclass Classification problem
• One way to handle Multiclass Classification is to use Binary Classifiers in a scheme known as One-vs-All (OvA), also known as One-vs-Rest (OvR)
• It trains multiple binary classifiers, each one predicting the confidence (probability) of one class against the rest, and the class with the highest confidence is selected
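The selection step of One-vs-All can be sketched as follows. The training of each binary classifier is omitted; the per-class β vectors and class names are hypothetical stand-ins for already-trained Logistic Regression models.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def ova_predict(betas, x):
    """Score x with every class-vs-rest classifier and pick the most confident class."""
    features = [1.0] + list(x)                    # X0 = 1 for the intercept
    confidences = {
        label: sigmoid(sum(b * v for b, v in zip(beta, features)))
        for label, beta in betas.items()
    }
    best = max(confidences, key=confidences.get)
    return best, confidences

# Hypothetical parameters of three binary classifiers, one per class.
betas = {
    "red":   [0.0,  2.0, -1.0],
    "green": [0.0, -1.0,  2.0],
    "blue":  [1.0, -2.0, -2.0],
}

for x in ([2.0, 0.0], [0.0, 2.0], [0.0, 0.0]):
    label, conf = ova_predict(betas, x)
    print(x, label, {k: round(v, 3) for k, v in conf.items()})
```

Each confidence comes from an independent classifier, so the values for one data point need not sum to 1.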
17. Multiclass Logistic Regression
• OvA can be used
  • When you want to use different binary classifiers (e.g., SVMs or logistic regression) for each class
  • When available memory is limited or the computation needs to be highly parallelized
• Another technique for Multiclass Logistic Regression is to simply generalize the binary classification problem of Logistic Regression
• This general form of classifier is known as the Softmax Classifier
• There, the Softmax Function is used instead of the Sigmoid function when there are multiple classes
18. Softmax Function
• The name Softmax is used because it is a continuous function approximation to the Maximum Function, where only one class (the maximum) is allowed to be considered as Positive
• The Softmax function is used instead of the Maximum Function to make the function differentiable
• Softmax Function:
$$S(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
where $x_i$ is the i-th component of the input vector x, and j runs over all n dimensions of that vector
19. Softmax Function
• The Softmax function exponentially highlights the value in the dimension where the value is maximum, while suppressing all other dimensions
• The output values of a vector from a Softmax function sum to 1
[Figure: an example Input Vector passed through the Softmax Function to produce an Output Vector]
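A small numerical sketch of the Softmax function is given below. The input vector is a hypothetical example (the values from the slide's figure are not available), and subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(x):
    """S(x_i) = e^(x_i) / sum_j e^(x_j), computed on a shifted copy of x."""
    m = max(x)                                    # shift so the largest exponent is 0
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

inputs = [1.0, 2.0, 5.0]                          # hypothetical input vector
outputs = softmax(inputs)
print([round(v, 4) for v in outputs])
print(sum(outputs))                               # sums to 1 up to rounding error
```

The largest input (5.0) receives most of the probability mass, while the two smaller inputs are suppressed but remain non-zero.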
20. Softmax Regression
• Just as $Z = \beta^T X$ is used for binary classification, $Z_k = \beta_k^T X$ is used for Multiclass classification, where k is the index of the class
• Note that K β vectors exist as model parameters, one for each class
• Just as binary classification has a single dependent variable Y, Multiclass classification has K dependent variables, each denoted by $Y_k$, with its estimator $\hat{Y}_k$:
$$\hat{Y}_k = \frac{e^{Z_k}}{\sum_{j=1}^{K} e^{Z_j}}$$
21. Softmax Regression
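Loss function:
$$L(f(X), Y) = -\log(\hat{Y}_k) = -\log\left(\frac{e^{Z_k}}{\sum_{j=1}^{K} e^{Z_j}}\right) = -\log\left(\frac{e^{\beta_k^T X}}{\sum_{j=1}^{K} e^{\beta_j^T X}}\right)$$
Cost function (Cross Entropy Loss):
$$J(\beta) = -\sum_{i=1}^{N} \sum_{k=1}^{K} I[Y_i = k] \log\left(\frac{e^{\beta_k^T X_i}}{\sum_{j=1}^{K} e^{\beta_j^T X_i}}\right)$$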
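The Softmax Regression estimator and its Cross Entropy cost can be sketched as below. The β vectors and data points are hypothetical, and classes are indexed from 0 to K-1 here (a Python convention) rather than 1 to K as in the formulas.

```python
import math

def softmax(z):
    m = max(z)                                    # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(betas, x):
    """Estimators for all K classes: softmax over the scores Z_k = beta_k^T X."""
    features = [1.0] + list(x)                    # X0 = 1 for the intercept
    scores = [sum(b * v for b, v in zip(beta, features)) for beta in betas]
    return softmax(scores)

def cross_entropy_cost(betas, X, Y):
    """J(beta) = -sum_i sum_k I[Y_i = k] * log(estimator of class k for X_i)."""
    total = 0.0
    for x, y in zip(X, Y):
        probs = class_probabilities(betas, x)
        # The indicator I[Y_i = k] keeps only the term of the true class y.
        total -= math.log(probs[y])
    return total

# Hypothetical data: 2 features, K = 3 classes.
X = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0], [3.0, 0.0]]
Y = [0, 1, 2, 0]

untrained = [[0.0, 0.0, 0.0] for _ in range(3)]   # every class gets probability 1/3
fitted = [[0.0, 2.0, -1.0], [0.0, -1.0, 2.0], [1.0, -2.0, -2.0]]  # hypothetical

print(cross_entropy_cost(untrained, X, Y))        # N * log(K) = 4 * log(3)
print(cross_entropy_cost(fitted, X, Y))           # lower, as true classes score higher
```

With all parameters at zero every class is equally likely, so the cost equals N·log(K); parameters that give the true class the highest score reduce it.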
22. One Hour Homework
• Officially we have one more hour to do after the end of the lectures
• Therefore, for this week’s extra hour you have a homework
• Logistic Regression is the basic building block of Deep Neural Networks (DNN). Softmax classifiers are used as-is in DNNs as the final classification layer
• Go through the slides and get a clear understanding of Logistic and Softmax Regressions
• Refer to external sources to clarify all the ambiguities related to them
• Good Luck!