MULTICATEGORY
LOGIT MODELS
Del Rosario, RP | Perez, JJ
Nominal Responses
One response variable Y with J levels
One or more explanatory or predictor variables
quantitative, qualitative or both
Logistic Regression
Forming Logits
When J = 2, Y is dichotomous, and we model the log odds that an event occurs rather than does not occur:
logit(π) = log[π / (1 − π)]
When J > 2, Y is a
Multicategory or Polytomous response variable
There are J(J − 1)/2 logits that can be formed, but only
(J − 1) are non-redundant
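As a quick numeric check of the counting argument above, a minimal sketch (the value J = 4 is an arbitrary illustration):

```python
from math import comb

# For a response with J categories, one logit can be formed for every
# unordered pair of categories: J*(J-1)/2 in total.
J = 4
n_pairs = comb(J, 2)          # 6 possible pairwise logits when J = 4

# Only (J - 1) of them are non-redundant: after fixing a baseline
# category, every other pairwise logit is a difference of two
# baseline-category logits.
n_nonredundant = J - 1        # 3 when J = 4

print(n_pairs, n_nonredundant)
```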
Categorical Logit Models
Nominal response
Multinomial logistic regression/Baseline Logits
Ordinal response
Ordinal logistic regression
Cumulative Logits/Proportional Odds Model
Adjacent Categories
Continuation Ratio
Multicategory Logits
Model simultaneously all relationships among the probabilities for pairs of categories (in contrast with fitting separate binary logistic regressions)
Optimal efficiency
Estimates of the model parameters have smaller SEs than the estimates obtained by fitting the equations separately.
With simultaneous fitting, the same parameter estimates occur for a pair of categories no matter which category is the baseline.
They describe the odds of response in one category rather than another.
Baseline Category Logits
Logit models for nominal responses pair each response
category with a baseline category.
The choice of baseline category is arbitrary.
If the last category (J) is the baseline, the baseline-category logits are:
log(π_j / π_J),  j = 1, …, J − 1
Given that the response falls in category j or J, this is the log odds that the response is j.
For J = 3, for instance, the logit model uses log(π_1 / π_3) and log(π_2 / π_3)
Baseline Category Logits
The logit model using the baseline-category logits with a predictor x has the form
log(π_j / π_J) = α_j + β_j x,  j = 1, …, J − 1
Parameters in the (J − 1) equations determine parameters for logits using all other pairs of response categories.
For instance, for an arbitrary pair of categories a and b,
log(π_a / π_b) = log[(π_a / π_J) / (π_b / π_J)]
             = log(π_a / π_J) − log(π_b / π_J)
             = (α_a + β_a x) − (α_b + β_b x)
             = (α_a − α_b) + (β_a − β_b) x
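The relationship between baseline-category logits and any other pairwise logit can be verified numerically. A minimal sketch — the α's, β's, and x below are made-up illustrative values, not from any fitted model:

```python
# Hypothetical baseline-category logit parameters for categories a and b
# versus baseline J: log(pi_a/pi_J) = alpha_a + beta_a*x, and similarly for b.
alpha_a, beta_a = 0.8, 1.5
alpha_b, beta_b = -0.3, 0.6
x = 2.0

# Logits of a and b versus the baseline
logit_aJ = alpha_a + beta_a * x
logit_bJ = alpha_b + beta_b * x

# The a-versus-b logit is their difference:
# log(pi_a/pi_b) = (alpha_a - alpha_b) + (beta_a - beta_b) * x
logit_ab = logit_aJ - logit_bJ
direct = (alpha_a - alpha_b) + (beta_a - beta_b) * x
print(logit_ab, direct)
```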
Example 1: Alligator Food Choice
A study looking into factors influencing the primary food choice of alligators in the wild
59 alligators were sampled, and the data show the alligator length (in meters) and the primary food type, by volume, found in each alligator's stomach
Food type has three categories: Fish (1), Invertebrate (2), and Other (3)
Example 1: Alligator Food Choice
Table 1. Alligator size (meter) and Primary food choice
Using STATA

. mlogit food size, b(3)

Iteration 0:  log likelihood = -57.570928
Iteration 1:  log likelihood =  -49.97414
Iteration 2:  log likelihood = -49.186349
Iteration 3:  log likelihood = -49.170647
Iteration 4:  log likelihood = -49.170622
Iteration 5:  log likelihood = -49.170622

Multinomial logistic regression                 Number of obs =         59
                                                LR chi2(2)    =      16.80
                                                Prob > chi2   =     0.0002
Log likelihood = -49.170622                     Pseudo R2     =     0.1459

        food |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
-------------+-----------------------------------------------------------
1            |
        size |  -.110109    .517082  -0.21   0.831   -1.123571   .9033531
       _cons |  1.617731   1.307275   1.24   0.216   -.9444801   4.179943
-------------+-----------------------------------------------------------
2            |
        size | -2.465446   .8996503  -2.74   0.006   -4.228728   -.702164
       _cons |  5.697444   1.793809   3.18   0.001    2.181644   9.213244
-------------+-----------------------------------------------------------
3            |  (base outcome)

. estat ic

Model |  Obs   ll(null)  ll(model)   df       AIC       BIC
------+----------------------------------------------------
    . |   59  -57.57093  -49.17062    4  106.3412  114.6514
Using R
Example 1: Alligator Food Choice
Y = primary food choice ; X = length of alligator
Estimated log odds that the primary food choice of alligators is fish rather than other types:
log(π̂_1 / π̂_3) = 1.618 − 0.110x
Estimated log odds that the primary food choice of alligators is invertebrate rather than other types:
log(π̂_2 / π̂_3) = 5.697 − 2.465x
Example 1: Alligator Food Choice
What about the estimated log odds that the primary food choice of alligators is fish rather than invertebrate?
log(π̂_1 / π̂_2) = (1.618 − 5.697) + (−0.110 + 2.465)x
log(π̂_1 / π̂_2) = −4.080 + 2.355x
For every 1-meter increase in the length of the alligator, the odds of choosing fish rather than an invertebrate as primary food multiply by e^2.355 = 10.54.
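The arithmetic for the fish-versus-invertebrate comparison can be reproduced directly from the two fitted baseline-category equations:

```python
from math import exp

# Fitted baseline-category (baseline = other) coefficients from the
# Stata output above
a1, b1 = 1.617731, -0.110109   # fish vs other:         log(p1/p3)
a2, b2 = 5.697444, -2.465446   # invertebrate vs other: log(p2/p3)

# Fish vs invertebrate logit: the difference of the two equations
intercept = a1 - a2            # ~ -4.080
slope = b1 - b2                # ~  2.355

# Per-meter multiplicative effect on the odds of fish over invertebrate
odds_multiplier = exp(slope)   # ~ 10.54
print(round(intercept, 3), round(slope, 3), round(odds_multiplier, 2))
```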
Example 1: Alligator Food Choice
Hypothesis testing on the effect of length as predictor:
H0: β_j = 0 for j = 1, 2
LR = 16.80, p = 0.0002
Strong evidence of an effect of alligator length on food choice
Estimated Probabilities
π_j = exp(α_j + β_j x) / Σ_{h=1}^{J} exp(α_h + β_h x)
Denominator = the same for each probability
Numerators for the various j sum to the denominator, so the probabilities sum to 1
Parameters = zero for whichever category is the baseline in the logit expressions
Estimated Probabilities
π̂_1 = exp(1.62 − 0.11x) / [1 + exp(1.62 − 0.11x) + exp(5.70 − 2.47x)]
π̂_2 = exp(5.70 − 2.47x) / [1 + exp(1.62 − 0.11x) + exp(5.70 − 2.47x)]
π̂_3 = 1 / [1 + exp(1.62 − 0.11x) + exp(5.70 − 2.47x)]
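These three expressions can be evaluated at any given length; a minimal sketch, where x = 2.0 m is an arbitrary illustrative value:

```python
from math import exp

def alligator_probs(x):
    """Estimated response probabilities from the fitted baseline-category
    model (baseline = other), using the rounded coefficients above."""
    num_fish = exp(1.62 - 0.11 * x)     # fish vs other
    num_invert = exp(5.70 - 2.47 * x)   # invertebrate vs other
    denom = 1 + num_fish + num_invert   # baseline contributes exp(0) = 1
    return num_fish / denom, num_invert / denom, 1 / denom

p_fish, p_invert, p_other = alligator_probs(2.0)
print(p_fish, p_invert, p_other)
```

Note that the three probabilities always sum to 1, since the numerators sum to the denominator.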
Example 2: Job Satisfaction and Income
The researchers seek to find the relationship between Y = job satisfaction and X1 = income, stratified by X2 = gender (1 = F, 2 = M), for black Americans
. mlogit satisfaction income gender [weight=count], b(1)
(frequency weights assumed)

Iteration 0:  log likelihood = -107.39082
Iteration 1:  log likelihood = -103.35145
Iteration 2:  log likelihood = -102.92608
Iteration 3:  log likelihood = -102.91365
Iteration 4:  log likelihood = -102.91362
Iteration 5:  log likelihood = -102.91362

Multinomial logistic regression                 Number of obs =        104
                                                LR chi2(6)    =       8.95
                                                Prob > chi2   =     0.1762
Log likelihood = -102.91362                     Pseudo R2     =     0.0417

satisfaction |     Coef.  Std. Err.      z   P>|z|   [95% Conf. Interval]
-------------+-----------------------------------------------------------
1            |  (base outcome)
-------------+-----------------------------------------------------------
2            |
      income |  .9239423   .7752856   1.19   0.233   -.5955895   2.443474
      gender |  .1239678   1.317757   0.09   0.925   -2.458788   2.706724
       _cons |  -.583335   1.990687  -0.29   0.769   -4.485009   3.318339
-------------+-----------------------------------------------------------
3            |
      income |  1.157282   .7388206   1.57   0.117   -.2907792   2.605344
      gender |   .005601    1.22245   0.00   0.996   -2.390357   2.401559
       _cons |  .5385145   1.842591   0.29   0.770   -3.072897   4.149926
-------------+-----------------------------------------------------------
4            |
      income |  1.560782   .7659445   2.04   0.042    .0595581   3.062005
      gender |  .1884805   1.286052   0.15   0.883   -2.332134   2.709095
       _cons |  -1.81048   1.977129  -0.92   0.360   -5.685582   2.064621

. estat ic

Model |  Obs   ll(null)  ll(model)   df       AIC       BIC
------+----------------------------------------------------
    . |  104  -107.3908  -102.9136    9  223.8272  247.6267
Example 2: Job Satisfaction and Income
log(π_j / π_1) = α_j + β_j^I I + β_j^G G,  j = 2, 3, 4
I = Income
β_j^I is the conditional log odds ratio between income and job satisfaction categories j & 1 (2&1, 3&1, 4&1), given gender
G = Gender
β_j^G is the conditional log odds ratio between gender and job satisfaction categories j & 1 (2&1, 3&1, 4&1), given income
Models for Ordinal Responses
Cumulative Logit Models
Logits can utilize the ordering of the categories
This results in models with simpler interpretations and potentially greater power than baseline-category logit models.
A cumulative probability for Y is the probability that Y is less than or equal to a certain value. In notation, for j = 1, 2, …, J,
P(Y ≤ j) = P(Y = 1) + P(Y = 2) + … + P(Y = j)
         = π_1 + π_2 + … + π_j
The cumulative probabilities reflect the ordering:
P(Y ≤ 1) ≤ P(Y ≤ 2) ≤ … ≤ P(Y ≤ J) = 1
Models for the cumulative probabilities do not use P(Y ≤ J), because it necessarily equals 1.
Cumulative Logit Models
The logits of the first J − 1 cumulative probabilities are
logit[P(Y ≤ j)] = log[ P(Y ≤ j) / (1 − P(Y ≤ j)) ] = log[ (π_1 + … + π_j) / (π_{j+1} + … + π_J) ],  j = 1, …, J − 1
These are called cumulative logits.
logit[P(Y ≤ j)] is like an ordinary logit model with a binary response: categories 1 to j combine to form the first outcome, and categories j + 1 to J form the second.
Each cumulative logit uses all the response categories.
For J = 3, both logit[P(Y ≤ 1)] = log[π_1 / (π_2 + π_3)] and logit[P(Y ≤ 2)] = log[(π_1 + π_2) / π_3] are used.
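For a J = 3 case, the cumulative logits can be computed directly from the category probabilities; a minimal sketch (the probabilities below are illustrative, not from any example in these notes):

```python
from math import log

# Illustrative category probabilities for J = 3 (must sum to 1)
p = [0.2, 0.5, 0.3]

def cumulative_logits(p):
    """Cumulative logits for the first J - 1 cumulative probabilities:
    logit[P(Y <= j)] = log[(p_1 + ... + p_j) / (p_{j+1} + ... + p_J)]."""
    logits = []
    for j in range(1, len(p)):
        below = sum(p[:j])          # P(Y <= j)
        above = sum(p[j:])          # P(Y > j)
        logits.append(log(below / above))
    return logits

print(cumulative_logits(p))
```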
Proportional Odds Property
For a predictor x, the cumulative logit model is given by:
logit[P(Y ≤ j)] = α_j + βx,  j = 1, …, J − 1
Notice that β does not have a subscript j, which implies that the value of β is constant for all J − 1 cumulative logits.
When the model fits well, a single parameter instead of J − 1 parameters is enough to describe the effect of x.
The curves of the cumulative probabilities have the same shape/slope/rate of change but different locations depending on α_j.
Proportional Odds Property
At any fixed x value, the ordering is retained, with P(Y ≤ 1) being the lowest curve.
The curves are increasing in x when β > 0.
When β < 0, the curves are descending.
When β = 0, the graph has a horizontal line for each cumulative probability.
This implies that X and Y are statistically independent.
Proportional Odds Property
P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1)
These category probabilities are graphed in the figure on the right, for the case β > 0.
As x increases, the probability of falling in a lower category increases as well.
This is against the usual interpretation that a positive slope implies a positive association.
When β < 0, the labels on the figure on the right are reversed.
Proportional Odds Property
Consider the ratio of the cumulative odds at two values x_1 and x_2 of the predictor. Take the logarithm of both sides and simplify:
log[ P(Y ≤ j | X = x_2) / P(Y > j | X = x_2) ] − log[ P(Y ≤ j | X = x_1) / P(Y > j | X = x_1) ]
  = (α_j + βx_2) − (α_j + βx_1) = β(x_2 − x_1)
Thus, the log OR is the difference between the cumulative logits at those two values of x, and is equal to β(x_2 − x_1).
This is the proportional odds assumption.
The log OR is proportional to the distance between any two x values.
For x_2 − x_1 = 1, the odds of response below any given category multiply by e^β for every unit increase in x.
Estimated Probabilities
The model expression for the cumulative probabilities is:
P(Y ≤ j) = exp(α_j + βx) / [1 + exp(α_j + βx)]
To estimate the category probabilities, use
P(Y = j) = P(Y ≤ j) − P(Y ≤ j − 1)
For example, P(Y = 1) = P(Y ≤ 1) and P(Y = J) = 1 − P(Y ≤ J − 1).
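Putting the two formulas together, the category probabilities under a proportional-odds fit can be sketched as follows — the cutpoints α_j and slope β below are illustrative values, not from a fitted model:

```python
from math import exp

def category_probs(alphas, beta, x):
    """P(Y = j) from a cumulative logit model:
    P(Y <= j) = exp(a_j + b*x) / (1 + exp(a_j + b*x)),
    then difference successive cumulative probabilities."""
    cum = [exp(a + beta * x) / (1 + exp(a + beta * x)) for a in alphas]
    cum.append(1.0)                        # P(Y <= J) = 1
    probs = [cum[0]]                       # P(Y = 1) = P(Y <= 1)
    for j in range(1, len(cum)):
        probs.append(cum[j] - cum[j - 1])  # P(Y=j) = P(Y<=j) - P(Y<=j-1)
    return probs

# Illustrative increasing cutpoints for J = 4 and a positive slope
probs = category_probs(alphas=[-2.0, 0.0, 1.5], beta=0.5, x=1.0)
print(probs, sum(probs))
```

The cutpoints must be increasing (α_1 < α_2 < α_3) so that the cumulative probabilities, and hence the category probabilities, are valid.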
Example: All explanatory variables are
categorical
A study looks at factors that influence the decision of whether
college juniors will apply to graduate school. The response is
ordinal with VL at the highest end of the scale.
Because all variables are categorical, data can be entered in a
contingency table.
                                    Apply to Grad School
Parental     Undergrad      Very        Somewhat     Very
Education    institution    Unlikely    Unlikely     Likely
Low          Private          175          98          20
Low          Public            25          12           7
High         Private           14          26          10
High         Public             6           4           3
Example: Cont.
Ensure that the dataset is in case (expanded) form before using polr.
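Expanding a contingency table into case form means repeating each covariate pattern once per observation. A minimal sketch using the counts from the table above:

```python
# Each row: (parental education, institution, response, count)
counts = [
    ("low", "private", "very unlikely", 175),
    ("low", "private", "somewhat unlikely", 98),
    ("low", "private", "very likely", 20),
    ("low", "public", "very unlikely", 25),
    ("low", "public", "somewhat unlikely", 12),
    ("low", "public", "very likely", 7),
    ("high", "private", "very unlikely", 14),
    ("high", "private", "somewhat unlikely", 26),
    ("high", "private", "very likely", 10),
    ("high", "public", "very unlikely", 6),
    ("high", "public", "somewhat unlikely", 4),
    ("high", "public", "very likely", 3),
]

# Case (expanded) form: one record per sampled student
cases = [(pared, inst, resp)
         for pared, inst, resp, n in counts
         for _ in range(n)]
print(len(cases))  # total number of students in the study
```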
Example: Cont. (Using polr)
The R command is polr (proportional odds logistic regression) from the MASS package. The dataset should be in case form.
Example: Cont. (using polr)
The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 3.07 times greater among students with high parental education than among those with low parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 3.07 times greater among students with high parental education, given that all of the other variables in the model are held constant: by the proportional odds property, the same odds ratio applies to every split of the response.
Example: Cont. (using vglm from VGAM package)
Example: Cont. (in Stata)
Example: w/ continuous predictor (Using polr)
Example: Cont.
The exponentiated coefficients of the last output are called proportional odds ratios.
For pared, the odds of "very likely" applying versus "somewhat likely" or "unlikely" applying combined are 2.85 times greater among students with high parental education, given that all the other variables in the model are held constant.
Likewise, the odds of "very likely" or "somewhat likely" applying versus "unlikely" applying are 2.85 times greater among students with high parental education, given that all of the other variables in the model are held constant.
For gpa, when a student's GPA increases by 1 unit, the odds of moving from "unlikely" applying to "somewhat likely" or "very likely" applying (or from the lower and middle categories to the high category) are multiplied by 1.85.
Example: Cont. (using vglm)
Example: Cont. (using stata)
Inference on Model Parameters
Testing for independence (H0: β = 0)
The test statistic is the difference between the deviance for the independence model and for the model allowing an explanatory variable.
If the p-value < level of significance, H0 is rejected and we conclude that an association exists.
Tests of independence on an ordinal scale take the ordering of the response categories into account.
When the model fits, this test is more powerful than tests of independence for nominal data, because
it focuses on a restricted alternative
it has only a single degree of freedom (recall that β is the same for all J − 1 cumulative logits)
Inference on Model Parameters
Testing H0: β = 0
Inference on Model Parameters
Testing the assumption of proportional odds
Our model, in which β is constant, holds only if the proportional odds assumption is not violated. If it is violated, it would be better to obtain individual estimates β_j for each j.
Agresti suggests an LR test between the vglm model with parallel=TRUE, i.e. simultaneous fitting with only one β, and the model with parallel=FALSE, i.e. a separate estimate of β for each cumulative logit.
The proportional odds assumption fails, for example, when the cumulative probability curves intersect (recall the earlier graph).
This occurs, for instance, when males tend toward the moderate responses of the ordinal scale, whereas females tend toward both extremes of the ordinal scale.
Inference on Model Parameters
H0: The model without the additional parameters β_j is sufficient
If the p-value does not reject the null hypothesis, there is no need to estimate individual β_j's; the single β is enough.
Alternatives if the proportional odds assumption is violated:
Fit the model with individual β_j's. (Issues: increase in SEs, decrease in power)
Fit the model using baseline-category logits and use the ordinality in an informal way to interpret the association. (Issues: increase in the number of parameters, less parsimonious)
Collapse the multicategory response to binary. (Issues: loss of efficiency, loss of data)
Invariance
Invariance to the choice of response categories
Situation: Researcher A used a 5-point Likert scale (SD, D, N, A, SA). Researcher B conducted a similar study but used a 3-point Likert scale (D, N, A). If the proportional odds assumption is not violated, the estimated effect of a predictor is roughly the same in both studies.
This feature of the model makes it possible to compare estimates from studies using different response scales.
Paired-Category Ordinal Logits
ADJACENT-CATEGORIES LOGITS
The adjacent-categories logits are:
log(π_{j+1} / π_j),  j = 1, …, J − 1
For J = 3, the logits are log(π_2 / π_1) and log(π_3 / π_2)
The corresponding model is
log(π_{j+1} / π_j) = α_j + β_j x,  j = 1, …, J − 1
Paired-Category Ordinal Logits
A simpler proportional odds version of the model is
log(π_{j+1} / π_j) = α_j + βx,  j = 1, …, J − 1
For it, the effects (β = β_j) of x on the odds of making the higher instead of the lower response are identical for each pair of adjacent response categories.
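Adjacent-categories logits relate simply to baseline-category logits: each is a difference of two baseline logits. A numeric sketch with illustrative probabilities:

```python
from math import log

# Illustrative category probabilities for J = 3
p = [0.2, 0.5, 0.3]

# Adjacent-categories logits: log(p_{j+1} / p_j), j = 1, ..., J-1
adjacent = [log(p[j + 1] / p[j]) for j in range(len(p) - 1)]

# Each is a difference of baseline-category logits (baseline = category J):
# log(p_{j+1}/p_j) = log(p_{j+1}/p_J) - log(p_j/p_J)
baseline = [log(pj / p[-1]) for pj in p]
adjacent_from_baseline = [baseline[j + 1] - baseline[j]
                          for j in range(len(p) - 1)]
print(adjacent, adjacent_from_baseline)
```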
Example
Stem Cell Research and Religious Fundamentalism
Example: Cont.
Paired-Category Ordinal Logits
CONTINUATION-RATIO LOGITS
Another approach forms logits for the ordered response categories in a sequential manner. The models apply simultaneously to:
log[π_{j+1} / (π_1 + … + π_j)],  j = 1, …, J − 1
These are called continuation-ratio logits.
They refer to a binary response that contrasts each category with a grouping of categories from lower levels of the response scale.
Example
Tonsil Size and Streptococcus
Example: Cont.