Binary Data

The document discusses the analysis of discrimination in mortgage applications using a linear regression model, focusing on how race may affect the likelihood of application denial. It highlights the importance of comparing denial rates between minority and white applicants while controlling for other factors, specifically using a binary dependent variable. The findings suggest that African-American applicants have a higher probability of denial compared to white applicants, but caution is advised as other influencing factors may exist.

Uploaded by

dunsscoto24

ANALYSING DATA WHEN THE OUTCOME OF INTEREST IS BINARY (i)

Anna Conte – Applied Economics

Learning outcomes:
Discrimination in mortgage application; linear regression model

Preamble

• Two people, identical but for their race, walk into a bank and apply
for a mortgage, a large loan so that each can buy an identical house.
• Does the bank treat them the same way?
• Are they both equally likely to have their mortgage application
accepted?
• By law, they must receive identical treatment.
• But whether or not they do is a matter of great concern among
bank regulators.

Do banks discriminate?

• Loans are made and denied for many legitimate reasons.
• For example, if the proposed loan payments take up most or all of
the applicant’s monthly income, then a loan officer might justifiably
deny the loan.
• Also, even loan officers are human and they can make honest
mistakes, so the denial of a single minority applicant does not prove
anything about discrimination.
• Many studies of discrimination thus look for statistical evidence of
discrimination, that is, evidence contained in large data sets showing
that whites and minorities are treated differently.

How can we test for discrimination?

• How, precisely, should one check for statistical evidence of
discrimination in the mortgage market?

• A start is to compare the fraction of minority and white applicants
who were denied a mortgage. How?

• But this comparison does not really answer the question of interest,
because the black and white applicants are not necessarily “identical
but for their race”.
• Instead, we need a method for comparing rates of denial, holding
other applicant characteristics constant.

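The raw comparison of denial fractions can be sketched as follows. This is a minimal illustration only: the records are hypothetical values invented for the example (not the HMDA data), and the names `applications` and `denial_rate` are assumptions rather than anything from the associated files.

```python
# Hypothetical (deny, black) records; illustrative values only, not the HMDA data.
applications = [
    (1, 1), (0, 1), (1, 1), (0, 1),
    (0, 0), (0, 0), (1, 0), (0, 0), (0, 0), (0, 0),
]


def denial_rate(records, black):
    """Fraction of applications denied within one group (black = 1 or 0)."""
    outcomes = [deny for deny, is_black in records if is_black == black]
    return sum(outcomes) / len(outcomes)


rate_black = denial_rate(applications, black=1)
rate_white = denial_rate(applications, black=0)
raw_gap = rate_black - rate_white  # unconditional difference in denial rates

print(f"black: {rate_black:.3f}, white: {rate_white:.3f}, gap: {raw_gap:.3f}")
```

The gap computed this way holds nothing else constant, which is why it cannot by itself answer the question of interest.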
How do we deal with a binary dependent variable?

• This sounds like a job for multiple regression analysis—and it is, but
with a twist.
• The twist is that the dependent variable—whether or not the
applicant is denied—is binary.
• Using binary variables as regressors does not cause particular problems.
• But when the dependent variable is binary, things are more difficult:
what does it mean to fit a line to a dependent variable that can
take on only two values, zero and one?
• The answer to this question is to interpret the regression function as
a predicted probability.

Binary Dependent Variables and the Linear Regression Model

• The application examined in this chapter is whether race is a factor
in denying a mortgage application.
• The binary dependent variable is whether or not a mortgage
application is denied.
• The data set is simulated to mimic the “Boston HMDA data”, a
data set compiled by researchers at the Federal Reserve Bank of
Boston under the Home Mortgage Disclosure Act (HMDA), which
relates to mortgage applications filed in the Boston, Massachusetts,
area.

Data description

Summary statistics and description of Boston HMDA data

Variable      Description                                                    Mean
deny          1 if mortgage application denied, 0 otherwise                  0.222
PIratio       Ratio of total monthly debt payments to total monthly income   0.324
HIratio       Ratio of monthly housing expenses to total monthly income      0.257
LVratio       Ratio of size of loan to assessed value of property            0.735
SelfEmployed  1 if self-employed, 0 otherwise                                0.113
single        1 if applicant reported being single, 0 otherwise              0.390
black         1 if applicant is black, 0 if white                            0.142
HSdiploma     1 if applicant graduated from high school, 0 otherwise         0.947

Number of observations: 2380

Relevant information

• To grant a loan, a loan officer must forecast whether or not the
applicant can afford his or her loan payments.
• One important piece of information is, for example, the size of the
required loan payments relative to the applicant’s income (it is much
easier to make payments that are 10% of your income than 50%!)
• We therefore begin by looking at the relationship between two
variables:
• the binary dependent variable ‘deny’, which equals one if the
mortgage application was denied and equals zero if it was
accepted;
• the continuous variable ‘PIratio’, which is the ratio of the
applicant’s anticipated total monthly loan payments to his or
her monthly income.

Scatterplot of Mortgage Application Denial and the Payment-to-Income Ratio

The superimposed line represents the prediction of the linear regression
model of deny against PIratio.

The linear regression model

• The scatterplot on its own does not show a clear pattern.
However, the superimposed regression line does.
• The line plots the predicted value of deny as a function of the
regressor, the payment-to-income ratio, using a linear regression
model.
• For example, when PIratio = 0.3, the predicted value of deny is
about 0.2. But what, precisely, does it mean for the predicted value
of the binary variable deny to be 0.2?
• The key to answering this question is to interpret the regression as
modelling the probability that the dependent variable equals one.
• Thus, the predicted value of 0.2 is interpreted as meaning that,
when PIratio is 0.3, the probability of denial is estimated to be
20%. Said differently, if there were many applications with PIratio
= 0.3, then 20% of them would be denied.

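The probability interpretation rests on the fact that the expected value of a binary variable equals the probability that it equals one. A short derivation under generic notation that does not appear in the slides (Y stands for deny, X for PIratio, and β0, β1 for the population intercept and slope, all introduced here as assumptions):

```latex
\begin{aligned}
E(Y \mid X) &= 1 \cdot \Pr(Y = 1 \mid X) + 0 \cdot \Pr(Y = 0 \mid X) = \Pr(Y = 1 \mid X), \\
\Pr(Y = 1 \mid X) &= \beta_0 + \beta_1 X .
\end{aligned}
```

Under this reading, the slope β1 is the change in the probability that Y = 1 associated with a one-unit change in X.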
Application to the Boston HMDA data

• The OLS regression of the binary dependent variable, deny, against
the payment-to-income ratio, PIratio, estimated using all 2,380
observations in our data set is

\[
\widehat{\text{deny}} = \underset{(0.027)}{-0.211} + \underset{(0.078)}{1.335} \times \text{PIratio} \tag{1}
\]

(standard errors in parentheses).

• The estimated coefficient on PIratio is positive and statistically
significantly different from zero at the 1% level (t-statistic=17.02).
Thus, applicants with higher debt payments as a fraction of income
are more likely to have their application denied.
• This coefficient can be used to compute the predicted change in the
probability of denial, given a change in the regressor. For example,
according to Equation (1), if the PIratio increases by 0.1, then the
probability of denial increases by 1.335 × 0.1 ≈ 0.133, that is, by
about 13.3 percentage points.

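To see what fitting a line to a zero-one outcome does in practice, the sketch below simulates applications and estimates the regression by OLS. It is illustrative only: the data are generated here and are not the course data set, the PIratio range of 0.2 to 0.6 and the sample size are assumptions chosen so that the true probability stays between zero and one, and the closed-form slope formula for a single regressor stands in for whatever the course do file uses.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000               # assumed sample size, larger than the 2,380 in the slides

# Simulated PIratio values; on [0.2, 0.6] the true probability lies inside [0, 1].
pi_ratio = [rng.uniform(0.2, 0.6) for _ in range(n)]

# deny = 1 with probability -0.211 + 1.335 * PIratio (coefficients borrowed from Equation (1)).
deny = [1 if rng.random() < -0.211 + 1.335 * x else 0 for x in pi_ratio]

# OLS with one regressor: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
mean_x = sum(pi_ratio) / n
mean_y = sum(deny) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pi_ratio, deny))
var_x = sum((x - mean_x) ** 2 for x in pi_ratio)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

print(f"deny_hat = {intercept:.3f} + {slope:.3f} * PIratio")
```

Although every simulated outcome is exactly zero or one, the fitted slope and intercept should land close to the values used to generate the probabilities, which is the sense in which the regression line estimates a probability.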
Predicting denial probabilities via the linear regression model

• The estimated linear regression model can be used to compute
predicted denial probabilities as a function of the PIratio.
• For example, if projected debt payments are 30% of an applicant’s
income, then the PIratio is 0.3 and the predicted value from
Equation (1) is −0.211 + 1.335 × 0.3 = 0.189. That is, according to
this linear probability model, an applicant whose projected debt
payments are 30% of income has a probability of 18.9% that their
application will be denied.
• What is the effect of race on the probability of denial, holding
constant the PIratio?
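The prediction arithmetic above can be reproduced with a few lines of code. This is a minimal sketch: the coefficients are copied from Equation (1), while the function name `predicted_denial_probability` is a hypothetical helper introduced for illustration.

```python
# Coefficients copied from Equation (1) in the slides.
INTERCEPT = -0.211
SLOPE = 1.335


def predicted_denial_probability(pi_ratio):
    """Predicted probability of denial under the linear probability model."""
    return INTERCEPT + SLOPE * pi_ratio


# Prediction for an applicant whose debt payments are 30% of income.
p_at_30 = predicted_denial_probability(0.3)

# Predicted change when PIratio rises by 0.1; in a linear model this is SLOPE * 0.1
# regardless of the starting value.
change = predicted_denial_probability(0.4) - p_at_30

print(f"P(deny | PIratio = 0.3) = {p_at_30:.4f}")
print(f"change for +0.1 in PIratio = {change:.4f}")
```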

The effect of race on the probability of denial

• To keep things simple, we focus on differences between black and
white applicants. To estimate the effect of race, holding constant
the PIratio, we augment Equation (1) with a binary regressor that
equals one if the applicant is black and zero if white.
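• The estimated linear regression model is

\[
\widehat{\text{deny}} = \underset{(0.027)}{-0.221} + \underset{(0.078)}{1.335} \times \text{PIratio} + \underset{(0.023)}{0.073} \times \text{black} \tag{2}
\]

(standard errors in parentheses).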

• The coefficient on black indicates that an African-American
applicant has a 7.3 percentage point higher probability of having a
mortgage application denied than a white applicant, holding constant
their PIratio. This coefficient is significant at the 1% level
(t-statistic=3.17).
• Taken literally, this estimate suggests that there might be racial bias
in mortgage decisions, but such a conclusion would be premature,
because there are many other factors that may influence the
decision.

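The "holding constant" reading of the coefficient on black can be made explicit by differencing the two predictions from Equation (2) at the same PIratio. A worked version, using only the estimates reported in the slides:

```latex
\begin{aligned}
\widehat{\text{deny}}(\text{PIratio},\, \text{black} = 1) - \widehat{\text{deny}}(\text{PIratio},\, \text{black} = 0)
&= (-0.221 + 1.335 \times \text{PIratio} + 0.073) \\
&\quad - (-0.221 + 1.335 \times \text{PIratio}) \\
&= 0.073 .
\end{aligned}
```

The PIratio terms cancel, so the estimated gap is the same 7.3 percentage points at every value of the PIratio. That constancy is a feature of the linear specification, not something the data were asked to confirm.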
Shortcomings of the linear regression model

• The linearity that makes the linear regression model easy to use is
also its major flaw.
• Looking again at the figure, we see that the estimated line
representing the predicted probabilities drops below zero for very
low values of the PIratio and exceeds one for high values!
• But this is nonsense: a probability cannot be less than zero or
greater than one.
• This nonsensical feature is an inevitable consequence of the linear
regression.
• To address this problem, we use nonlinear models specifically
designed for binary dependent variables, the probit and logit
regression models.
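The out-of-range problem is easy to verify numerically. The sketch below, which reuses the coefficients of Equation (1) and a hypothetical helper name, solves for the PIratio values at which the fitted line crosses zero and one and checks a few illustrative points.

```python
# Coefficients copied from Equation (1) in the slides.
INTERCEPT = -0.211
SLOPE = 1.335


def predicted_denial_probability(pi_ratio):
    """Predicted value of deny from the linear regression model."""
    return INTERCEPT + SLOPE * pi_ratio


# PIratio values at which the fitted line crosses 0 and 1.
lower_cutoff = (0 - INTERCEPT) / SLOPE
upper_cutoff = (1 - INTERCEPT) / SLOPE
print(f"prediction < 0 below PIratio = {lower_cutoff:.3f}")
print(f"prediction > 1 above PIratio = {upper_cutoff:.3f}")

# Illustrative PIratio values: one low, one typical, one high.
for pi_ratio in (0.05, 0.3, 1.0):
    p = predicted_denial_probability(pi_ratio)
    status = "valid probability" if 0 <= p <= 1 else "outside [0, 1]"
    print(f"PIratio = {pi_ratio:.2f}: prediction = {p:.3f} ({status})")
```

Any straight line with a non-zero slope must eventually leave the unit interval, which is what motivates models whose predictions are bounded between zero and one by construction.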

Associated files

• Data sets:
• “HDMA.dta”
• Do files:
• “STATA200303.do”
