Probit and Logit Models
Outline
• Linear probability model
• Probit and logit models
• Maximum likelihood estimation
• Coefficients
• Predicted probabilities
• Marginal effects
• Marginal effect at the means
• Average marginal effect
• Goodness of fit measures
• Pseudo R-squared
• Percent correctly predicted
Binary dependent variable
• A binary dependent variable has two outcomes: 0 or 1.
• Examples: working or not working, has insurance or does not have
insurance, etc.
• The outcome of interest is denoted as 1.
• $y = 1$ if working, $y = 0$ if not working.
• If the outcome of not working is of interest, then it would be denoted
as 1.
• $y = 1$ if not working, $y = 0$ if working.
• The outcome of interest is typically the rarer one, i.e. there are usually
fewer 1s than 0s in the data.
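As a minimal sketch of this coding (the DataFrame and the status column here are hypothetical):

```python
import pandas as pd

# Hypothetical data: employment status recorded as strings
df = pd.DataFrame({"status": ["working", "not working", "working", "not working"]})

# Code the outcome of interest ("working") as 1, everything else as 0
df["y"] = (df["status"] == "working").astype(int)
print(df["y"].mean())  # share of 1s in the data
```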
Linear probability model (LPM)
• A linear probability model is a linear regression model where the dependent
variable is a binary variable.
• Linear probability model with binary dependent variable $y = 0$ or $1$:
• $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u = x\beta + u$
• where $x\beta$ is written in matrix form.
• The expected value of $y$ is $E(y) = x\beta$.
• Because the binary variable $y$ has two outcomes, 0 and 1, the expected value
of $y$ is the probability that $y$ equals 1, $P(y = 1)$:
• $E(y) = 1 \cdot P(y = 1) + 0 \cdot P(y = 0) = P(y = 1)$
• Example: if 30% of the $y$ values are 1 and the rest are 0, then $E(y) = P(y = 1) = 0.3$.
• The linear probability model for the probability of the outcome $y = 1$ is
therefore $P(y = 1) = x\beta$ (a fitting sketch follows below).
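A minimal sketch of fitting an LPM by OLS, using synthetic data and statsmodels (the variable names and data-generating process are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Synthetic binary outcome whose true P(y = 1) increases with x
y = rng.binomial(1, np.clip(0.5 + 0.2 * x, 0, 1))

# The LPM is just OLS with a 0/1 dependent variable
X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()
print(lpm.params)  # the slope is the constant marginal effect on P(y = 1)
```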
Advantages and disadvantages of LPM
• Advantages of LPM
• Easy to estimate and interpret (coefficients are marginal effects)
• Coefficient estimates and predicted probabilities are often reasonably
accurate, especially near the average values of the regressors
• Disadvantages of LPM
• Not the best model for binary dependent variable (probit or logit models are
better)
• Predicted probabilities can be less than 0 or greater than 1
• Marginal effects are the coefficients, which are constant and do not vary with $x$
• Heteroscedasticity, because the variance of $y$ varies with $x$ and is not constant:
• $\mathrm{var}(y) = P(y = 1) \cdot [1 - P(y = 1)]$, where $P(y = 1) = x\beta$
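A quick check of these points, reusing the synthetic-data setup from the sketch above; HC1 robust standard errors are one common way to handle the heteroscedasticity:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = rng.binomial(1, np.clip(0.5 + 0.2 * x, 0, 1))
X = sm.add_constant(x)
lpm = sm.OLS(y, X).fit()

# Fitted LPM "probabilities" can fall below 0 or above 1
fitted = lpm.predict(X)
print(fitted.min(), fitted.max())

# Heteroscedasticity-robust (HC1) standard errors address the non-constant variance
print(sm.OLS(y, X).fit(cov_type="HC1").bse)
```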
Linear versus non-linear probability models
• The linear probability model estimates the probability of $y = 1$ as a
linear function of the independent variables.
• $P(y = 1) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k = x\beta$
• The probit and logit models estimate the probability of $y = 1$ as a
non-linear function $G$ of the independent variables.
• $P(y = 1) = G(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k) = G(x\beta)$
• $G$ is a non-linear function that transforms $x\beta$ to lie between 0 and 1,
because $P(y = 1)$ is a probability (see the sketch below).
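A short sketch of how a function $G$ bounds the probabilities, using the normal cdf (probit) and the logistic function (logit); the index values are made up for illustration:

```python
import numpy as np
from scipy.stats import norm
from scipy.special import expit  # the logistic function

# Illustrative index values, including ones far outside [0, 1]
xb = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])

print(xb)            # the linear index itself is unbounded
print(norm.cdf(xb))  # probit G(xb): strictly between 0 and 1
print(expit(xb))     # logit G(xb): strictly between 0 and 1
```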
Normal distribution – pdf and cdf
• The probability density function (pdf) of the normal distribution, $\phi$:
the area under the pdf between two numbers gives the probability that $y$
falls between them.
• The cumulative distribution function (cdf) of the normal distribution, $\Phi$,
gives the probability that $y$ is less than a given number.
[Figure: pdf (left panel) and cdf (right panel) of the standard normal distribution, plotted over the range −2.5 to 2.5.]
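A sketch that reproduces the two panels above (the plotting range is taken from the original figure):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

z = np.linspace(-2.5, 2.5, 200)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(z, norm.pdf(z))
ax1.set_title("pdf")
ax2.plot(z, norm.cdf(z))
ax2.set_title("cdf")
plt.show()
```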
Probit model
• The probit model uses the cumulative distribution function (cdf) of the
normal distribution, $\Phi$:
• $P(y = 1) = \Phi(x\beta) = \int_{-\infty}^{x\beta} \phi(z)\,dz$
• $P(y = 1)$ will be a number between 0 and 1 because the cdf of the
normal distribution is a number between 0 and 1.
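A minimal probit fit with statsmodels, on synthetic data generated from an assumed true probit relationship:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Synthetic data: true P(y = 1) follows a probit with betas (-0.5, 1.0)
y = rng.binomial(1, norm.cdf(-0.5 + 1.0 * x))

X = sm.add_constant(x)
probit = sm.Probit(y, X).fit()
print(probit.params)  # estimated betas; close to the true (-0.5, 1.0)
```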
Logit model
• The logit model uses the logistic function:
• $P(y = 1) = G(x\beta) = \dfrac{\exp(x\beta)}{1 + \exp(x\beta)} = \dfrac{e^{x\beta}}{1 + e^{x\beta}}$
• $P(y = 1)$ will be a number between 0 and 1 because $\exp(x\beta)$ is
positive.
• The probability of $y = 0$ is:
• $P(y = 0) = 1 - P(y = 1) = 1 - \dfrac{\exp(x\beta)}{1 + \exp(x\beta)} = \dfrac{1}{1 + \exp(x\beta)}$
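A short numerical check of these two formulas (the index values are illustrative):

```python
import numpy as np

xb = np.linspace(-4.0, 4.0, 9)      # illustrative index values
p1 = np.exp(xb) / (1 + np.exp(xb))  # P(y = 1), the logistic function
p0 = 1 / (1 + np.exp(xb))           # P(y = 0)

print(p1)       # each value strictly between 0 and 1
print(p1 + p0)  # the two probabilities sum to 1 for every xb
```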
Likelihood function
• The likelihood is the probability that the outcome for observation $i$ is $y_i$.
• The likelihood of $y_i = 1$ is $P(y_i = 1)$.
• The likelihood of $y_i = 0$ is $P(y_i = 0)$.
• The likelihood function is defined as: $P(y_i = 1)^{y_i} \, P(y_i = 0)^{1 - y_i}$
• The likelihood of $y_i = 1$ is $P(y_i = 1)^{1} \, P(y_i = 0)^{1-1} = P(y_i = 1)$
• The likelihood of $y_i = 0$ is $P(y_i = 1)^{0} \, P(y_i = 0)^{1-0} = P(y_i = 0)$
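A one-function sketch of this definition, showing how the exponents select the correct probability:

```python
def likelihood(y_i, p1):
    """Likelihood P(y_i = 1)^y_i * P(y_i = 0)^(1 - y_i) for one observation."""
    return p1**y_i * (1 - p1)**(1 - y_i)

print(likelihood(1, 0.7))  # 0.7 -> reduces to P(y_i = 1)
print(likelihood(0, 0.7))  # 0.3 -> reduces to P(y_i = 0)
```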
Maximum likelihood estimation
• The likelihood function is: $P(y_i = 1)^{y_i} \, P(y_i = 0)^{1 - y_i}$
• Take logs and sum over all observations $i$.
• The log likelihood function is:
• $\sum_{i=1}^{n} \left[ y_i \log P(y_i = 1) + (1 - y_i) \log P(y_i = 0) \right]$
• Substituting $P(y_i = 1) = G(x_i\beta)$ into the log likelihood function gives:
• $\sum_{i=1}^{n} \left[ y_i \log G(x_i\beta) + (1 - y_i) \log(1 - G(x_i\beta)) \right]$
• The $\beta$ coefficients are obtained by maximizing the log likelihood
function (see the sketch below).
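A minimal sketch of the estimator itself for the probit case: the negative log likelihood is minimized with scipy on synthetic data (the data-generating process and starting values are assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
X = np.column_stack([np.ones(1000), x])        # constant plus one regressor
y = rng.binomial(1, norm.cdf(-0.5 + 1.0 * x))  # data from a true probit model

def neg_log_likelihood(beta):
    # For the probit model, G(x beta) is the standard normal cdf
    p = norm.cdf(X @ beta)
    p = np.clip(p, 1e-10, 1 - 1e-10)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)  # should be close to the true values (-0.5, 1.0)
```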
Maximum likelihood estimation
• The probit and logit model coefficients are obtained by maximizing
the log likelihood function.
• $\max_{\beta} \sum_{i=1}^{n} \left[ y_i \log P(y_i = 1) + (1 - y_i) \log P(y_i = 0) \right]$
• If the outcome $y_i = 1$, the maximization pushes the predicted probability
$P(y_i = 1)$ up (e.g. toward 0.8 or 0.9).
• If the outcome $y_i = 0$, it pushes $P(y_i = 0)$ up, or equivalently pushes the
predicted probability $P(y_i = 1)$ down (e.g. toward 0.1 or 0.2).
• The maximum likelihood estimators are consistent, asymptotically
normal, and asymptotically efficient if the assumptions hold.
Maximum likelihood estimation versus OLS estimation
• The probit and logit model coefficients are obtained by maximizing
the log likelihood function (if the outcome $y = 1$, the predicted
probability $P(y = 1)$ is maximized):
• $\max_{\beta} \sum_{i=1}^{n} \left[ y_i \log P(y_i = 1) + (1 - y_i) \log P(y_i = 0) \right]$