Multimedia Application
By
Minhaz Uddin Ahmed, PhD
Department of Computer Engineering
Inha University Tashkent.
Email: [Link]@[Link]
Content
The sigmoid function
Classification with Logistic Regression
Multinomial logistic regression
Learning in Logistic Regression
The cross-entropy loss function
Logistic Regression
Important analytic tool in natural and social sciences
Baseline supervised machine learning tool for classification
Also the foundation of neural networks
Generative and Discriminative Classifiers
Naive Bayes is a generative classifier
by contrast:
Logistic regression is a discriminative classifier
Generative and Discriminative Classifiers
Suppose we're distinguishing cat from dog images
[Example cat and dog images from ImageNet]
Generative Classifier:
• Build a model of what's in a cat image
• Knows about whiskers, ears, eyes
• Assigns a probability to any image:
• how cat-y is this image?
Also build a model for dog images
Now given a new image:
Run both models and see which one fits better
Discriminative Classifier
Just try to distinguish dogs from cats
Oh look, dogs have collars!
Let's ignore everything else
Finding the correct class c from a document d in
Generative vs Discriminative Classifiers
Naive Bayes (generative): ĉ = argmax over c of P(d|c) P(c)   (likelihood × prior)
Logistic Regression (discriminative): ĉ = argmax over c of P(c|d)   (the posterior, modeled directly)
Components of a probabilistic machine learning classifier
Given m input/output pairs (x(i),y(i)):
1. A feature representation of the input. For each input
observation x(i), a vector of features [x1, x2, ... , xn]. Feature j for
input x(i) is xj, more completely xj(i), or sometimes fj(x).
2. A classification function that computes ŷ, the estimated class, via p(y|x), like the sigmoid or softmax functions.
3. An objective function for learning, like cross-entropy loss.
4. An algorithm for optimizing the objective function: stochastic
gradient descent.
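As a rough illustration, here is a minimal Python sketch of these four components together; the toy feature vector, label, and learning rate are assumptions for illustration, not part of the slides:

```python
import numpy as np

# 1. Feature representation: each input x is a vector of n features
x = np.array([3.0, 2.0, 1.0])                  # e.g. [x1, x2, x3]
y = 1                                          # true label for this example

# 2. Classification function: sigmoid of w.x + b gives p(y=1|x)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    return sigmoid(np.dot(w, x) + b)

# 3. Objective function: cross-entropy loss for one example
def cross_entropy(y_hat, y):
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# 4. Optimization: one stochastic gradient descent step on (x, y)
def sgd_step(w, b, x, y, lr=0.1):
    y_hat = predict_proba(w, b, x)
    w = w - lr * (y_hat - y) * x               # dLoss/dw = (y_hat - y) * x
    b = b - lr * (y_hat - y)                   # dLoss/db = (y_hat - y)
    return w, b

w, b = np.zeros(3), 0.0
print(cross_entropy(predict_proba(w, b, x), y))   # loss before the update (~0.693)
w, b = sgd_step(w, b, x, y)
print(cross_entropy(predict_proba(w, b, x), y))   # loss after one step (smaller)
```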
The two phases of logistic regression
Training: we learn weights w and b using stochastic
gradient descent and cross-entropy loss.
Test: Given a test example x, we compute p(y|x) using the learned
weights w and b, and return whichever label (y = 1 or y = 0) has the
higher probability.
Classification in Logistic Regression
Positive/negative sentiment
Spam/not spam
Authorship attribution (Hamilton or Madison?)
Text Classification: definition
Input:
a document x
a fixed set of classes C = {c1, c2,…, cJ}
Output: a predicted class ĉ ∈ C
Binary Classification in Logistic Regression
Given a series of input/output pairs:
(x(i), y(i))
For each observation x(i)
We represent x(i) by a feature vector [x1, x2,…, xn]
We compute an output: a predicted class ŷ(i) ∈ {0,1}
Features in logistic regression
• For feature xi, the weight wi tells us how important xi is
• xi ="review contains ‘awesome’": wi = +10
• xj ="review contains ‘abysmal’": wj = -10
• xk =“review contains ‘mediocre’": wk = -2
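To make the role of these example weights concrete, here is a small sketch; the presence/absence feature values and the dictionary-based setup are assumptions for illustration:

```python
# Binary indicator features for one review, using the illustrative weights above
features = {"awesome": 1, "abysmal": 0, "mediocre": 1}   # which words appear in the review
weights  = {"awesome": 10.0, "abysmal": -10.0, "mediocre": -2.0}

# Weighted sum of the features (bias omitted here for simplicity)
score = sum(weights[w] * features[w] for w in weights)
print(score)   # 10.0 - 2.0 = 8.0 -> evidence for the positive class
```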
Features in logistic regression
The weight wi represents how important that input feature is to the classification
decision, and can be positive (providing evidence that the instance being
classified belongs in the positive class) or negative (providing evidence that the
instance being classified belongs in the negative class). Thus we might expect in a
sentiment task the word awesome to have a high positive weight, and abysmal to
have a very negative weight.
Logistic Regression for one observation x
Input observation: vector x = [x1, x2,…, xn]
Weights: one per feature: W = [w1, w2,…, wn]
Sometimes we call the weights θ = [θ1, θ2,…, θn]
Output: a predicted class ŷ ∈ {0,1}
(multinomial logistic regression: ŷ ∈ {0, 1, 2, 3, 4})
How to do classification
For each feature xi, weight wi tells us importance of xi
(Plus we'll have a bias b)
We'll sum up all the weighted features and the bias: z = w1x1 + w2x2 + … + wnxn + b = w∙x + b
If this sum z is high, we say y=1; if low, then y=0
But we want a probabilistic classifier
We need to formalize “sum is high”.
We’d like a principled classifier that
gives us a probability, just like Naive
Bayes did
We want a model that can tell us:
p(y=1|x; θ)
p(y=0|x; θ)
The problem: z isn't a probability, it's just a number!
Solution: use a function of z that goes from 0 to 1
The very useful sigmoid or logistic function
σ(z) = 1 / (1 + e^(−z)), which maps any real-valued z to a value between 0 and 1
Idea of logistic regression
We’ll compute w∙x+b
And then we’ll pass it through the sigmoid function:
σ(w∙x+b)
And we'll just treat it as a probability
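A minimal numeric sketch of this idea; the weights, bias, and feature values below are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.5, -1.0, 2.0]        # one weight per feature (illustrative values)
b = 0.1                     # bias term
x = [1.0, 0.0, 3.0]         # feature vector for one observation

z = sum(wi * xi for wi, xi in zip(w, x)) + b    # z = w.x + b
p = sigmoid(z)              # treated as P(y=1 | x)
print(z, p)                 # 6.6, ~0.9986
```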
Making probabilities with sigmoids
If we apply the sigmoid to the sum of the weighted features, we get a number
between 0 and 1. To make it a probability, we just need to make sure that the
two cases, p(y = 1) and p(y = 0), sum to 1.
Making probabilities with sigmoids
Because the sigmoid function has the property 1 − σ(z) = σ(−z), we can write:
P(y=1) = σ(w∙x + b)
P(y=0) = 1 − σ(w∙x + b) = σ(−(w∙x + b))
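A quick numeric check of this property; the value of z is arbitrary:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.833
p1 = sigmoid(z)          # P(y=1) when z = w.x + b
p0 = sigmoid(-z)         # P(y=0), using 1 - sigmoid(z) = sigmoid(-z)
print(p1, p0, p1 + p0)   # ~0.697, ~0.303, sums to 1
```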
Turning a probability into a classifier
ŷ = 1 if P(y=1|x) > 0.5, otherwise ŷ = 0
0.5 here is called the decision boundary
Turning a probability into a classifier
[Plot: the sigmoid curve of P(y=1) against w∙x + b, crossing 0.5 where w∙x + b = 0]
Turning a probability into a classifier
ŷ = 1 if w∙x + b > 0
ŷ = 0 if w∙x + b ≤ 0
We've seen how logistic regression uses the sigmoid function to take weighted
features for an input example x and assign it to the class 1 or 0.
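Here is a small sketch of that decision rule; the helper name classify and the example weights are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(w, b, x, threshold=0.5):
    """Return 1 or 0 for input features x, using the 0.5 decision boundary."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if sigmoid(z) > threshold else 0

# Equivalent shortcut: sigmoid(z) > 0.5 exactly when z = w.x + b > 0
print(classify([2.0, -3.0], 0.5, [1.0, 1.0]))   # z = -0.5 -> class 0
```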
Logistic Regression: a text example on sentiment classification
Sentiment example: does y=1 or y=0?
It's hokey . There are virtually no surprises , and the writing is second-rate .
So why was it so enjoyable ? For one thing , the cast is
great . Another nice touch is the music . I was overcome with the urge to
get off the couch and start dancing . It sucked me in , and it'll do the same
to you .
Classifying sentiment for input x
Suppose w = [w1, w2, …, wn], one weight per feature, and
b = 0.1
Classifying sentiment for input x
p(+|x) = P(y=1|x) = σ(w∙x + b)
p(−|x) = P(y=0|x) = 1 − σ(w∙x + b)
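A hedged worked version of this computation; the specific feature values and weights below follow the running sentiment example in the referenced Chapter 5 and are assumptions here, not taken from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed feature values for the review above (Chapter 5 running example):
# positive-lexicon count, negative-lexicon count, "no" present, pronoun count,
# "!" present, log(word count)
x = [3, 2, 1, 3, 0, 4.19]
w = [2.5, -5.0, -1.2, 0.5, 2.0, 0.7]   # assumed weights
b = 0.1

z = sum(wi * xi for wi, xi in zip(w, x)) + b
print(round(z, 3))               # ~0.833
print(round(sigmoid(z), 2))      # p(+|x) ~ 0.70
print(round(1 - sigmoid(z), 2))  # p(-|x) ~ 0.30
```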
We can build features for logistic regression for any classification task: period disambiguation
End of sentence: "This ends in a period."
Not end of sentence: the period after "St." in "The house at 465 Main St. is new."
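One possible sketch of such features; the particular features and the small abbreviation list are assumptions for illustration:

```python
def period_features(tokens, i):
    """Features for deciding whether the period at position i ends a sentence."""
    prev_word = tokens[i - 1] if i > 0 else ""
    next_word = tokens[i + 1] if i + 1 < len(tokens) else ""
    return {
        "prev_is_abbreviation": 1 if prev_word in {"St", "Mr", "Dr", "Inc"} else 0,
        "next_is_capitalized": 1 if next_word[:1].isupper() else 0,
        "prev_is_digit": 1 if prev_word.isdigit() else 0,
    }

tokens = ["The", "house", "at", "465", "Main", "St", ".", "is", "new", "."]
print(period_features(tokens, 6))   # period after "St" -> likely not end of sentence
```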
Classification in (binary) logistic regression: summary
Given:
a set of classes: (+ sentiment,- sentiment)
a vector x of features [x1, x2, …, xn]
x1 = count("awesome")
x2 = log(number of words in review)
A vector w of weights [w1, w2, …, wn], with one weight wi for each feature xi
Output: ŷ = 1 if σ(w∙x + b) > 0.5, else ŷ = 0
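Putting the summary together as a short end-to-end sketch; the two features mirror the bullets above, while the weight values and the example review are assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def extract_features(review):
    words = review.lower().split()
    return [words.count("awesome"),            # x1 = count("awesome")
            math.log(len(words))]              # x2 = log(number of words in review)

w = [1.5, -0.3]   # assumed weights, one per feature
b = 0.0

x = extract_features("this movie was awesome , simply awesome !")
z = sum(wi * xi for wi, xi in zip(w, x)) + b
label = "+" if sigmoid(z) > 0.5 else "-"
print(x, round(sigmoid(z), 2), label)          # probability ~0.92 -> "+"
```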
Learning: Cross-Entropy Loss
Where did the w's come from?
Supervised classification:
• We know the correct label y (either 0 or 1) for
each x.
• But what the system produces is an estimate, ŷ
• We want to set w and b to minimize the distance between our estimate ŷ(i) and the true y(i).
• We need a distance estimator: a loss function
or a cost function
• We need an optimization algorithm to update w
and b to minimize the loss.
Learning components
A loss function:
◦ cross-entropy loss
An optimization algorithm:
◦ stochastic gradient descent
Learning components
This requires two components. The first is a metric for how close the current label
(ŷ) is to the true gold label y. Rather than measure similarity, we usually talk about
the opposite of this: the distance between the system output and the gold output,
and we call this distance the loss function or the cost function. We'll introduce the
loss function that is commonly used for logistic regression and also for neural
networks, the cross-entropy loss.
The second thing we need is an optimization algorithm for iteratively updating
the weights so as to minimize this loss function. The standard algorithm for this is
gradient descent.
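A minimal sketch of this training setup, combining the cross-entropy gradient with stochastic gradient descent on a toy dataset; the data, learning rate, and epoch count are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: 2 features per example, binary labels
X = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 2.0], [0.5, 3.0]])
y = np.array([1, 1, 0, 0])

w = np.zeros(2)
b = 0.0
lr = 0.5

for epoch in range(100):
    for x_i, y_i in zip(X, y):                 # stochastic: one example at a time
        y_hat = sigmoid(np.dot(w, x_i) + b)
        # Gradient of cross-entropy loss: (y_hat - y) * x for w, (y_hat - y) for b
        w -= lr * (y_hat - y_i) * x_i
        b -= lr * (y_hat - y_i)

print(w, b, sigmoid(X @ w + b).round(2))       # probabilities roughly [1, 1, 0, 0]
```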
The distance between ŷ and y
We want to know how far the classifier output
ŷ = σ(w∙x + b)
is from the true output:
y [= either 0 or 1]
We'll call this difference:
L(ŷ, y) = how much ŷ differs from the true y
Intuition of negative log likelihood loss = cross-entropy loss
A case of conditional maximum likelihood estimation
We choose the parameters w,b that maximize
• the log probability
• of the true y labels in the training data
• given the observations x
Deriving cross-entropy loss for a single observation x
Goal: maximize probability of the correct label p(y|x)
Since there are only 2 discrete outcomes (0 or 1) we can express
the probability p(y|x) from our classifier (the thing we want to
maximize) as
p(y|x) = ŷ^y (1 − ŷ)^(1−y)
noting:
if y=1, this simplifies to ŷ
if y=0, this simplifies to 1 − ŷ
Deriving cross-entropy loss for a single observation x
Goal: maximize probability of the correct label p(y|x)
Maximize: p(y|x) = ŷ^y (1 − ŷ)^(1−y)
Now take the log of both sides (mathematically handy)
Maximize: log p(y|x) = y log ŷ + (1 − y) log(1 − ŷ)
Whatever values maximize log p(y|x) will also maximize p(y|x)
Flipping the sign turns this into a loss to minimize, the cross-entropy loss: L_CE(ŷ, y) = −[y log ŷ + (1 − y) log(1 − ŷ)]
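A small sketch of the resulting loss, showing that it is low when the classifier assigns high probability to the correct label and high when it does not; the example probabilities are made up:

```python
import math

def cross_entropy(y_hat, y):
    """L_CE(y_hat, y) = -[y*log(y_hat) + (1-y)*log(1-y_hat)]"""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

print(round(cross_entropy(0.70, 1), 3))   # ~0.357: fairly confident and correct -> low loss
print(round(cross_entropy(0.70, 0), 3))   # ~1.204: fairly confident and wrong -> high loss
print(round(cross_entropy(0.99, 1), 3))   # ~0.010: very confident and correct -> tiny loss
```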
Reference
Jurafsky, D. and Martin, J. H., Speech and Language Processing, Chapter 5 (Logistic Regression)
Questions?
Thank you