Logistic Regression - Exercises
EXERCISE 1
Dependent variable
Explanatory variables
X1 0.5793
X2 0.6235
X3 1.5112
(1) Calculate the probability that a person continues the family company (FC) if he has an FC in the near
family, often talks about financial affairs with his family, and his family does not expect him to
continue the FC.
(2) Compare the odds of continuing an FC for someone who has an FC in the near family with the odds for
someone who does not have an FC in the near family.
(3) Is the difference that you found in (2) significant? Use the following table in which you find the
confidence intervals for the odds ratios.
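Parts (1) to (3) rest on three small calculations: turning a linear predictor into a probability, turning a coefficient into an odds ratio, and checking whether a confidence interval for an odds ratio contains 1. The following is a minimal Python sketch, not SPSS output. The function names, the constant and the coefficient values are hypothetical placeholders; substitute the constant and coefficients from the exercise output.

```python
import math

def predicted_probability(constant, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = constant + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(coefficient):
    """Odds ratio for a one-unit increase in a variable: exp(b)."""
    return math.exp(coefficient)

def differs_from_one(ci_lower, ci_upper):
    """An odds ratio differs significantly from 1 when its confidence
    interval does not contain 1."""
    return ci_lower > 1.0 or ci_upper < 1.0

# Hypothetical constant and coefficients, NOT the values of the exercise.
constant = -1.0
coefficients = [0.5, 0.25, 1.0]
values = [1, 1, 0]  # first two dummies equal 1, third equals 0

p = predicted_probability(constant, coefficients, values)
print(round(p, 4))
print(round(odds_ratio(coefficients[0]), 4))
```

The odds ratio compares the odds of two people who differ only in the variable concerned, which is why all other variables drop out of the comparison in (2).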
(1) Split the variable ‘acid’ into 3 groups in SPSS. Give the new variable “acidcat” the value 1 when acid
is lower than or equal to 50 and the value 3 when acid is greater than or equal to 75. Otherwise acidcat is 2.
(2) Build a logistic regression model to estimate the probability that the lymph nodes of the patient are
cancerous. Use age, xray, grade, stage and acidcat as explanatory variables, with acidcat=1 as the reference
category. Estimate this model.
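The recoding rule in (1) and the reference category in (2) can be sketched as follows. This is an illustrative Python version of the logic, not SPSS syntax; the function names and the dummy names acidcat_2 and acidcat_3 are assumptions made for the example.

```python
def acid_to_acidcat(acid):
    """Recode acid into 3 groups: 1 if acid <= 50, 3 if acid >= 75, else 2."""
    if acid <= 50:
        return 1
    if acid >= 75:
        return 3
    return 2

def acidcat_dummies(acidcat, reference=1):
    """One dummy per non-reference category; the reference gets all zeros."""
    return {
        f"acidcat_{level}": int(acidcat == level)
        for level in (1, 2, 3)
        if level != reference
    }

for acid in (40, 50, 60, 75, 90):
    category = acid_to_acidcat(acid)
    print(acid, category, acidcat_dummies(category))
```

With acidcat=1 as the reference, each dummy coefficient compares its category with the lowest acid group.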
Dependent variable:
• Good: payment behavior: 1 if the customer pays off the leasing contract as stated in the
contract; 0 otherwise.
Explanatory variables:
(1) Research question: which characteristics of customers have an impact on their payment behavior?
Build a logistic regression model.
Use the appropriate statistical techniques to determine whether these variables significantly
influence the dependent variable. Is your final model a useful model? (you can let SPSS create
dummies for categorical variables OR you can make the dummies yourself – try both options)
(2) Give a clear interpretation of the coefficients of your model in (1) (in terms of odds).
(3) (a) Does your model in (1) lead to good classifications? Discuss the classification table and the ROC
curve.
(b) Adjust the cut-off value so that the specificity becomes at least 70%. The classification plot and
boxplots of the predicted probabilities for good=0 and good=1 might help you.
(c) Also compare your results with the classifications you would obtain if there were no
explanatory variables in your model. What can you conclude from this?
(d) Perform the Hosmer and Lemeshow test to verify whether there is a difference between the
observed and predicted probabilities.
(4) Discuss the assumptions for your model in (1) (outliers, QMC, QCS)
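The quantities asked for in (3)(a) to (3)(c) can be sketched in Python as follows. The observed outcomes and predicted probabilities are made-up toy values, not the leasing data, and the function names are assumptions for this example. A case is classified as good=1 when its predicted probability is at least the cut-off.

```python
def classification_table(observed, predicted_prob, cutoff=0.5):
    """Count true/false positives and negatives at a given cut-off."""
    table = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(observed, predicted_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1:
            table["tp" if y == 1 else "fp"] += 1
        else:
            table["fn" if y == 1 else "tn"] += 1
    return table

def sensitivity(table):
    """Share of observed 1s that are classified as 1."""
    return table["tp"] / (table["tp"] + table["fn"])

def specificity(table):
    """Share of observed 0s that are classified as 0."""
    return table["tn"] / (table["tn"] + table["fp"])

def lowest_cutoff_for_specificity(observed, predicted_prob, target=0.70):
    """Scan cut-offs 0.00, 0.01, ..., 1.00; return the first reaching the target."""
    for k in range(101):
        cutoff = k / 100
        table = classification_table(observed, predicted_prob, cutoff)
        if specificity(table) >= target:
            return cutoff
    return None

def null_model_accuracy(observed):
    """Without explanatory variables every case gets the most frequent outcome."""
    share_ones = sum(observed) / len(observed)
    return max(share_ones, 1 - share_ones)

# Toy data (hypothetical): 4 cases with good=0 and 6 cases with good=1.
observed = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted_prob = [0.2, 0.4, 0.6, 0.7, 0.3, 0.55, 0.65, 0.8, 0.9, 0.95]

default_table = classification_table(observed, predicted_prob)
print(default_table, sensitivity(default_table), specificity(default_table))
print(lowest_cutoff_for_specificity(observed, predicted_prob))
print(null_model_accuracy(observed))
```

Raising the cut-off increases specificity at the expense of sensitivity, so the adjusted classification table should be read together with the original one.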
EXERCISE 4: death penalty.sav
There are 147 murder cases in New Jersey for which the public prosecutor recommends the death penalty.
In all the cases, the suspect was convicted of first-degree murder with a recommendation by the
prosecutor that a death sentence be imposed. Then a penalty trial was conducted to determine whether
the suspect would receive a death sentence or life imprisonment.
Dependent variable:
- culpability: culpability on a scale from 1 to 5 for which aggravating and extenuating circumstances are
taken into account (1 is the lowest culpability)
(1) Research question: which characteristics of suspects, victims and crimes have an impact on the
penalty that the suspect receives? The analysis was performed for the non-black suspects only. Can you
conclude from the output below whether the variables ‘culpability’ (entered as dummy variables) and
‘serious’ influence whether the non-black suspects receive the death penalty?
Model Summary
(3) A possible solution for this problem is to exclude the cases with the lowest culpability from the
analysis. 44 observations remain. The results can be found in the output below. Discuss these results.
(4) Another solution for the problem is to join two groups. The groups culpability1 and culpability2 can
be joined. This leads to the results in the following output. Discuss.
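The sheet does not name the problem that (3) and (4) try to solve. Assuming it is a culpability category in which every case has the same outcome (an empty cell in the cross-table of culpability by penalty), the check and the joining of groups in (4) could be sketched in Python as below. The data are made-up toy values, not the death penalty data, and all function names are hypothetical.

```python
from collections import Counter

def empty_cells(categories, outcomes):
    """Return (category, outcome) combinations that never occur in the data."""
    counts = Counter(zip(categories, outcomes))
    return sorted(
        (category, outcome)
        for category in set(categories)
        for outcome in (0, 1)
        if counts[(category, outcome)] == 0
    )

def join_categories(categories, mapping):
    """Recode categories, e.g. {2: 1} joins group 2 with group 1."""
    return [mapping.get(category, category) for category in categories]

# Toy data (hypothetical): nobody in culpability group 1 has outcome 1.
culpability = [1, 1, 1, 2, 2, 3, 3]
death = [0, 0, 0, 0, 1, 0, 1]

print(empty_cells(culpability, death))

joined = join_categories(culpability, {2: 1})
print(empty_cells(joined, death))
```

An empty cell is only a warning sign under this assumption; the standard errors and confidence intervals in the SPSS output remain the evidence to discuss.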
EXERCISE 5: The black economy: ESS (European Social Survey), wave 2
For this exercise we use data from the second wave of the European Social Survey. Estimate a logistic
regression model to determine which characteristics influence whether people have, in
the past 5 years, paid something in cash with no receipt so as to avoid paying VAT or other taxes. We
work with a limited dataset: only data for Belgium, the Netherlands and Luxembourg are included.
Dependent variable:
- Transform the variable “payavtx” (variable 234 in the dataset) into a dummy variable. This
dummy variable has the value 1 if the respondent admits he has, in the past 5 years, paid
something in cash with no receipt so as to avoid paying VAT or other taxes (informalDummy).
Explanatory variables:
- Age, divided into 3 categories (at most 20, from 21 to 60, older than 60). The middle category is
the reference category (dummies in model: Age61up and Age20down, 21 to 60 is reference
point)
- Gender (female is the reference category)
- Job status, divided into 3 categories (not-working, employee, employer or self-employed = not
employee)
- Highest degree in education, divided into 5 categories (edulvla)
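The age coding described in the list above can be sketched as follows. This is an illustrative Python version rather than SPSS syntax; the dummy names Age20down and Age61up come from the exercise, the function name is an assumption.

```python
def age_dummies(age):
    """Two dummies for three age groups; 21 to 60 is the reference (both 0)."""
    return {
        "Age20down": int(age <= 20),
        "Age61up": int(age > 60),
    }

for age in (18, 20, 21, 60, 61, 75):
    print(age, age_dummies(age))
```

Respondents aged 21 to 60 score 0 on both dummies, so each coefficient compares its age group with that middle group.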
Estimate the model, discuss the significant variables. Check the quality and the assumptions of the
model.
Extra: Exercise 6: Pokémon Go (student survey 2016-2017.sav)
For this exercise we use the data that were gathered through the student survey that was held two years
ago. We use the data to build a profile of a Pokémon Go player through logistic regression analysis.
The dependent variable of our model is whether someone has installed Pokémon Go on his
or her smartphone (pok_go). The possible explanatory variables that we have selected are sex,
age, whether the respondent is a student, where the respondent lives, membership of a youth movement or
sports club, whether the respondent watched Pokémon cartoons, the store in which the respondent buys
apps, and whether the respondent has installed Snapchat.
(1) Take a look at the variables that you want to include in the model. Do you need to create or
adjust variables? Prepare the dataset and variables so that they are ready to use for logistic
regression.
(2) Estimate the logistic regression model. Is it a useful model? Which variables have a significant
impact? Interpret the significant variables in terms of odds.