
Logistic Regression - Exercises

The document contains instructions for several exercises using logistic regression analysis. Exercise 1 involves interpreting logistic regression coefficients to calculate probabilities and compare odds. Exercise 2 involves splitting a variable into categories and building a logistic regression model to predict lymph node cancer probability. Exercise 3 builds a model to predict customer payment behavior using demographic and financial variables. Exercise 4 analyzes factors influencing death penalty outcomes for murder cases. Exercise 5 models informal cash payments to avoid taxes using European Social Survey data. Exercise 6 profiles Pokémon Go players using a student survey dataset.

Uploaded by Filbertha
Copyright © All Rights Reserved

Exercises on logistic regression

Exercise 1

Use the following information to interpret the regression coefficients:

Dependent variable

Y: continuing the family company (FC): 1 = wants to continue the FC, 0 = doesn’t want to continue the FC

Explanatory variables

X1: FC in the near family: 1 = yes, 0 = no.

X2: often talk about financial affairs in the family: 1 = yes, 0 = no

X3: family expects that I will continue the business: 1 = yes, 0 = no

Coefficients in the logistic regression model:

Constant   -1.8370
X1          0.5793
X2          0.6235
X3          1.5112

(1) Calculate the probability that a person continues the family company if he has an FC in the near family,
often talks about financial affairs with his family, and his family doesn’t expect him to continue the FC.

(2) Compare the odds of continuing the FC for someone who has an FC in the near family and someone who
doesn’t have an FC in the near family.

(3) Is the difference that you found in (2) significant? Use the following table, which gives the 95%
confidence intervals for the odds ratios.

95% confidence intervals for the odds ratios

Variable                                             Lower   Upper
FC in the near family                                0.879   3.623
Often talk about financial affairs with the family   0.917   3.795
Family expects that I will continue the FC           2.132   9.635
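The three questions rest on the same arithmetic: the probability is the logistic function of the linear predictor, the odds ratio for a 0/1 variable is e raised to its coefficient, and an odds ratio differs significantly from 1 at the 5% level when its 95% confidence interval excludes 1. A minimal Python sketch using the numbers given above; it additionally assumes that the intervals are Wald intervals exp(B ± 1.96·SE), so that B, SE and the Wald z can be recovered from the bounds.

```python
import math

# Coefficients from the logistic regression output above
CONST, B1, B2, B3 = -1.8370, 0.5793, 0.6235, 1.5112

def probability(x1, x2, x3):
    """P(Y = 1) = 1 / (1 + exp(-z)), with z the linear predictor."""
    z = CONST + B1 * x1 + B2 * x2 + B3 * x3
    return 1.0 / (1.0 + math.exp(-z))

# (1) FC in the near family, talks about finances, family does not expect continuation
p = probability(1, 1, 0)

# (2) Odds ratio for X1: odds(X1 = 1) / odds(X1 = 0), with X2 and X3 held fixed
odds_ratio_x1 = math.exp(B1)

# (3) Confidence intervals for the odds ratios, copied from the table above
intervals = {
    "FC in the near family": (0.879, 3.623),
    "Often talk about financial affairs": (0.917, 3.795),
    "Family expects continuation": (2.132, 9.635),
}
Z = 1.959964  # 97.5% quantile of the standard normal distribution

def wald_from_interval(lower, upper):
    """Recover B, SE and Wald z, ASSUMING the interval is exp(B +/- Z*SE)."""
    b = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * Z)
    return b, se, b / se

print(f"P(continue) = {p:.3f}, OR for X1 = {odds_ratio_x1:.3f}")
for name, (lower, upper) in intervals.items():
    b, se, z = wald_from_interval(lower, upper)
    excludes_one = not (lower <= 1 <= upper)  # significant at the 5% level
    print(f"{name}: B = {b:.3f}, SE = {se:.3f}, z = {z:.2f}, significant: {excludes_one}")
```

The recovered B values reproduce the coefficients of the first table up to rounding, which is a quick consistency check on the two tables.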


Exercise 2: prcancer.sav

(1) Split the variable ‘acid’ into 3 groups in SPSS. Give the new variable “acidcat” the value 1 when acid
is lower than or equal to 50 and the value 3 when acid is greater than or equal to 75. Otherwise acidcat is 2.

(2) Build a logistic regression model to estimate the probability that the patient’s lymph nodes are
cancerous. Use age, xray, grade, stage and acidcat as explanatory variables, with acidcat = 1 as the reference
category. Estimate this model.

(3) Interpret the coefficients of acidcat in terms of odds.
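Outside SPSS, the recoding, the reference-category dummies and the model fit can be sketched in plain Python. The fitting routine below is a generic Newton-Raphson maximum-likelihood fit, not SPSS's own implementation, and the data at the bottom are a small made-up example (not prcancer.sav). With only acidcat in the toy model, each Exp(B) of a dummy equals the observed odds in that category divided by the observed odds in the reference category acidcat = 1.

```python
import math

def acid_category(acid):
    """acidcat = 1 if acid <= 50, 3 if acid >= 75, otherwise 2."""
    if acid <= 50:
        return 1
    if acid >= 75:
        return 3
    return 2

def dummy_code(values, levels, reference):
    """One 0/1 column per level, except for the reference category."""
    return [[1.0 if v == lvl else 0.0 for lvl in levels if lvl != reference]
            for v in values]

def solve(a, rhs):
    """Solve the linear system a x = rhs (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(X, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; adds a constant."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k                      # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]   # information matrix
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                score[j] += (yi - p) * xi[j]
                for q in range(k):
                    info[j][q] += w * xi[j] * xi[q]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical toy data: acid values and 0/1 lymph node involvement
acid = [40, 45, 50, 48, 55, 60, 70, 65, 75, 80, 90, 85]
nodes = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
acidcat = [acid_category(a) for a in acid]
X = dummy_code(acidcat, levels=[1, 2, 3], reference=1)
beta = fit_logistic(X, nodes)
for name, b in zip(["constant", "acidcat=2", "acidcat=3"], beta):
    print(f"{name}: B = {b:.3f}, Exp(B) = {math.exp(b):.3f}")
```

In the toy data the observed odds are 1/3, 1 and 3 in categories 1, 2 and 3, so the fitted odds ratios for acidcat = 2 and acidcat = 3 come out as 3 and 9: the odds of cancerous lymph nodes are 3 and 9 times as high as in the reference category.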


Exercise 3: credit_cat.sav

A leasing company has the following information about its customers:

Dependent variable:

• Good: payment behavior: 1 if the customer pays off the leasing contract as stipulated in the
contract; 0 otherwise.

Explanatory variables:

• Estate: owner of real estate: 1 = owner, 0 = not an owner


• Marital status: 1 = married, 2 = living together, 3 = single
→ dummy variables: m1: 1 when married, 0 otherwise

m2: 1 when living together, 0 otherwise

reference category: single

• age: age in years
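Making the marital-status dummies yourself amounts to one indicator per non-reference category: single gets 0 on both m1 and m2. A third dummy for single is left out because m1 + m2 + m3 would equal 1 for every customer and be perfectly collinear with the constant. A small Python sketch:

```python
# Marital status codes as described above
MARRIED, LIVING_TOGETHER, SINGLE = 1, 2, 3

def marital_dummies(status):
    """Return (m1, m2); single, the reference category, is coded (0, 0)."""
    m1 = 1 if status == MARRIED else 0
    m2 = 1 if status == LIVING_TOGETHER else 0
    return m1, m2

for code in (MARRIED, LIVING_TOGETHER, SINGLE):
    print(code, marital_dummies(code))
```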

(1) Research question: which characteristics of customers have an impact on their payment behavior?
Build a logistic regression model.

Use the appropriate statistical techniques to determine whether these variables significantly
influence the dependent variable. Is your final model a useful model? (You can let SPSS create the
dummies for categorical variables, or you can make the dummies yourself; try both options.)

(2) Give a clear interpretation of the coefficients of your model in (1) (in terms of odds).

(3) (a) Does your model in (1) lead to good classifications? Discuss the classification table and the ROC
curve.

(b) Adjust the cutoff value so that the specificity becomes at least 70%. The classification plot and
boxplots of the predicted probabilities for good = 0 and good = 1 might help you.

(c) Also compare your results with the classifications you would obtain if there were no
explanatory variables in your model. What can you conclude from this?

(d) Perform the Hosmer and Lemeshow test to verify whether there is a difference between the
observed and predicted probabilities.

(4) Discuss the assumptions of your model in (1): outliers, quasi-multicollinearity (QMC) and quasi-complete separation (QCS).
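The quality measures in (3) can be computed directly from the observed outcomes y (good) and the predicted probabilities p that SPSS can save for the fitted model. A plain-Python sketch under these assumptions: a case is classified as good = 1 when p is at least the cutoff (check the convention of your software), the AUC is computed as the probability that a randomly chosen good payer gets a higher predicted probability than a randomly chosen bad payer, and the Hosmer-Lemeshow groups are formed by sorting on p and splitting into equal-sized groups. SPSS may form its groups slightly differently, so its statistic need not match exactly. The toy data are hypothetical.

```python
import math

def classification(y, p, cutoff=0.5):
    """Sensitivity, specificity and overall % correct; predict 1 when p >= cutoff."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= cutoff)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < cutoff)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    return {"sensitivity": tp / n_pos, "specificity": tn / n_neg,
            "overall": (tp + tn) / len(y)}

def auc(y, p):
    """Area under the ROC curve; ties count as one half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def cutoff_for_specificity(y, p, target=0.70):
    """Smallest observed p that, used as cutoff, gives specificity >= target.
    The smallest such cutoff keeps sensitivity as high as possible."""
    for c in sorted(set(p)):
        if classification(y, p, c)["specificity"] >= target:
            return c
    return None

def null_model(y, cutoff=0.5):
    """Intercept-only model: every case gets p = proportion of ones."""
    p_hat = sum(y) / len(y)
    return classification(y, [p_hat] * len(y), cutoff)

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and its df (number of non-empty groups - 2)."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat, used = 0.0, 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        used += 1
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, used - 2

def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for a positive even df, via the closed-form Poisson sum."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

# Hypothetical observed outcomes and predicted probabilities
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.45, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1]
print(classification(y, p))  # default cutoff 0.5
print("AUC:", auc(y, p))
print("cutoff for specificity >= 70%:", cutoff_for_specificity(y, p))
print("no explanatory variables:", null_model(y))
stat, df = hosmer_lemeshow(y, p, groups=4)
print("Hosmer-Lemeshow:", round(stat, 3), "df =", df, "p =", round(chi2_sf_even_df(stat, df), 3))
```

In this toy example the default cutoff gives a specificity of only 4/6, and raising the cutoff to 0.6 lifts it to 5/6 at the cost of no lost sensitivity here (3 of 4 good payers stay correctly classified). The intercept-only model classifies everyone in the majority category, which is the baseline the fitted model has to beat.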
Exercise 4: death penalty.sav

There are 147 murder cases in New Jersey for which the public prosecutor recommends the death penalty.
In all the cases, the suspect was convicted of first-degree murder with a recommendation by the
prosecutor that a death sentence be imposed. Then a penalty trial was conducted to determine whether
the suspect would receive a death sentence or life imprisonment.

Logistic regression is applied to the following variables:

Dependent variable:

- death penalty: 1 = death penalty, 0 = life imprisonment

Possible explanatory variables:

- race_suspect: race of the suspect (1 = black, 0 = other race)

- race_victim: race of the victim (1 = black, 0 = other race)

- culpability: culpability on a scale from 1 to 5 for which aggravating and extenuating circumstances are
taken into account (1 is the lowest culpability)

- serious: a rating of the seriousness of the crime, as evaluated by a panel of judges

(1) Research question: which characteristics of suspects, victims and crimes have an impact on the
penalty that the suspect receives? The analysis was performed for the non-black suspects only. Can you
conclude from the output below whether the variables ‘culpability’ (entered as dummy variables) and
‘serious’ influence whether the non-black suspects receive the death penalty?

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      43.599a             .466                   .662

a. Estimation terminated at iteration number 20 because maximum iterations has been reached.
Final solution cannot be found.
(2) Explain the problem for this model.

(3) A possible solution for this problem is to exclude the cases with the lowest culpability from the
analysis. 44 observations remain. The results can be found in the output below. Discuss these results.

(4) Another solution for the problem is to join two groups. The groups culpability1 and culpability2 can
be joined. This leads to the results in the following output. Discuss.
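The footnote in the model summary (no final solution after the maximum number of iterations) is what (quasi-)complete separation typically looks like: if one category of a predictor contains only one outcome value, the likelihood keeps improving as that category's coefficient grows without bound, so the iterations never settle. The plain-Python sketch below illustrates this on made-up data (not the New Jersey cases), shows how the two remedies in (3) and (4) remove the empty cell of the cross-tabulation, and includes the standard formulas behind the two pseudo R² values, which depend on the -2 log-likelihood of the model, that of the intercept-only model, and the sample size n.

```python
import math
from collections import Counter

def pseudo_r2(d_model, d_null, n):
    """Cox & Snell and Nagelkerke R^2 from the two -2 log-likelihoods and n."""
    cox_snell = 1 - math.exp((d_model - d_null) / n)
    return cox_snell, cox_snell / (1 - math.exp(-d_null / n))

def newton_slopes(x, y, iterations=20):
    """Newton-Raphson for P(y = 1) = 1 / (1 + exp(-(a + b*x))); returns b after each step."""
    a = b = 0.0
    slopes = []
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p          # score for a
            g_b += (yi - p) * xi   # score for b
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det   # step = inverse information x score
        b += (h_aa * g_b - h_ab * g_a) / det
        slopes.append(b)
    return slopes

def crosstab(x, y):
    """(count of y = 0, count of y = 1) for every category of x."""
    counts = Counter(zip(x, y))
    return {lvl: (counts[(lvl, 0)], counts[(lvl, 1)]) for lvl in sorted(set(x))}

# Separation: every case with x = 1 has y = 1, so b keeps growing with each iteration
x = [0, 0, 0, 0, 1, 1, 1]
y = [0, 0, 1, 1, 1, 1, 1]
slopes = newton_slopes(x, y)
print([round(s, 2) for s in slopes[:5]], "...", round(slopes[-1], 2))

# Hypothetical culpability scores: level 1 contains no death sentences
culpability = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5]
death = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
print(crosstab(culpability, death))
# Remedy (3): drop the cases with the lowest culpability
kept = [i for i, c in enumerate(culpability) if c != 1]
print(crosstab([culpability[i] for i in kept], [death[i] for i in kept]))
# Remedy (4): join culpability groups 1 and 2
merged = [1 if c == 2 else c for c in culpability]
print(crosstab(merged, death))

# Pseudo R^2 with hypothetical -2LL values (not taken from the output above)
print(pseudo_r2(d_model=80.0, d_null=120.0, n=100))
```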
Exercise 5: The black economy: ESS (European Social Survey), wave 2

For this exercise we use data from the second wave of the European Social Survey. Estimate a logistic
regression model to determine which characteristics influence whether people have, in the past 5 years,
paid something in cash with no receipt so as to avoid paying VAT or other taxes. We work with a limited
dataset: only data for Belgium, the Netherlands and Luxembourg are included.

Dependent variable:
- Transform the variable “payavtx” (variable 234 in the dataset) into a dummy variable (informalDummy).
This dummy variable has the value 1 if the respondent admits having, in the past 5 years, paid
something in cash with no receipt so as to avoid paying VAT or other taxes.
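A sketch of this recoding in plain Python. It ASSUMES, and this must be checked in the ESS codebook, that payavtx uses 1 = yes and 2 = no, with every other code (refusal, don't know, no answer) treated as missing; the example codes are hypothetical.

```python
# ASSUMPTION (check the ESS codebook): payavtx is coded 1 = yes, 2 = no,
# and every other code (refusal, don't know, no answer) counts as missing.
def informal_dummy(payavtx):
    """informalDummy: 1 = paid cash without receipt to avoid taxes, 0 = did not."""
    if payavtx == 1:
        return 1
    if payavtx == 2:
        return 0
    return None  # missing value

# Hypothetical raw codes
codes = [1, 2, 2, 8, 1, 7]
print([informal_dummy(c) for c in codes])  # [1, 0, 0, None, 1, None]
```

Treating refusals and "don't know" as missing, rather than as 0, avoids counting people who did not answer among those who never paid informally.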

Explanatory variables:
- Age, divided into 3 categories (at most 20, from 21 to 60, older than 60). The middle category is
the reference category (dummies in the model: Age61up and Age20down; 21 to 60 is the reference
category)
- Gender (female is the reference category)
- Job status, divided into 3 categories (not working; employee; employer or self-employed, labeled “not
employee”)
- Highest degree in education, divided into 5 categories (edulvla)
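The age coding above comes down to two indicator variables with 21 to 60 as the reference group. A minimal Python sketch, assuming age is recorded in whole years:

```python
def age_dummies(age):
    """Return (Age20down, Age61up); ages 21 to 60 form the reference category (0, 0)."""
    age20down = 1 if age <= 20 else 0
    age61up = 1 if age > 60 else 0
    return age20down, age61up

for age in (18, 20, 21, 60, 61, 75):
    print(age, age_dummies(age))
```

The boundary ages 20, 21, 60 and 61 are worth checking explicitly, since "at most 20" and "older than 60" are easy to get off by one.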

Estimate the model, discuss the significant variables. Check the quality and the assumptions of the
model.
Extra: Exercise 6: Pokémon Go (student survey 2016-2017.sav)

For this exercise we use the data gathered through the student survey held two years ago. We use
these data to build a profile of a Pokémon Go player through logistic regression analysis.

The dependent variable of our model is whether someone has installed Pokémon Go on his or her
smartphone (pok_go). The candidate explanatory variables we have selected are sex, age, whether the
respondent is a student, where he or she lives, membership of a youth movement or sports club, whether
he or she watched Pokémon cartoons, the store in which he or she buys apps, and whether he or she has
installed Snapchat.

(1) Take a look at the variables that you want to include in the model. Do you need to create or
adjust variables? Prepare the dataset and variables so that they are ready to use for logistic
regression.

(2) Estimate the logistic regression model. Is it a useful model? Which variables have a significant
impact? Interpret the significant variables in terms of odds.

(3) Discuss the quality of the model.

(4) Check whether the assumptions hold for this model.
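For (1), several candidate predictors are categorical with possibly sparse categories (place of residence, app store), and several are yes/no questions. One possible preparation, sketched in plain Python with hypothetical category labels and a hypothetical minimum count: merge rare categories into an "other" group, so that no dummy rests on a handful of respondents, and turn yes/no answers into 0/1 with other answers treated as missing.

```python
from collections import Counter

def collapse_rare(values, min_count, other="other"):
    """Merge categories with fewer than min_count respondents into one category."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def yes_no(values):
    """Map 'yes'/'no' to 1/0; any other answer becomes None (missing)."""
    return [1 if v == "yes" else 0 if v == "no" else None for v in values]

# Hypothetical answers
store = ["Google Play", "App Store", "Google Play", "Windows Store", "App Store", "Google Play"]
snapchat = ["yes", "no", "yes", "yes", "", "no"]
print(collapse_rare(store, min_count=2))
print(yes_no(snapchat))
```

Sparse categories are also a common source of the separation problems discussed in Exercise 4, so collapsing them before estimation can prevent convergence failures.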
