Discriminant Analysis Chapter- Seven
Q-1. Define discriminant analysis. What are the objectives of discriminant analysis?
Answer-
Discriminant Analysis
Discriminant analysis is a technique for analyzing business research data when the
criterion or dependent variable is categorical and the predictor or independent variables
are metric, i.e. measured on at least an interval scale. Example: the dependent variable may
be the choice of a brand of personal computer (brand A, B or C) and the independent
variables may be ratings of attributes of PCs on a 7-point Likert scale.
Objectives of Discriminant Analysis
The objectives of discriminant analysis are as follows-
1) Development of discriminant functions, or linear combinations of the predictor or
independent variables, which will best discriminate between the categories of the
criterion or dependent variable (groups).
2) Examination of whether significant differences exist among the groups, in terms of
the predictor variables.
3) Determination of which predictor variables contribute to most of the intergroup
differences.
4) Classification of cases to one of the groups based on the values of the predictor
variables.
5) Evaluation of the accuracy of classification.
Discriminant analysis techniques are described by the number of categories possessed by
the criterion variable.
Q-2. What is the main difference between two-group and multiple discriminant analysis?
Answer-
Two-Group Discriminant Analysis
(1) When the criterion or dependent variable has two categories, the technique is known as two-group discriminant analysis.
(2) In the two-group case, it is possible to derive only one discriminant function.
Multiple Discriminant Analysis
(1) Discriminant analysis technique where the criterion or dependent variable involves three or more categories.
(2) In multiple discriminant analysis, more than one function can be computed.
Khayrul Alam (Lecturer-MIST) Page 1
Q-3. Describe the relationship of discriminant analysis to regression and ANOVA.
Answer-
Relationship of Discriminant, ANOVA and Regression Analysis
The relationship among discriminant analysis, analysis of variance (ANOVA) and
regression analysis is shown in the following table-
                                          ANOVA         Regression    Discriminant Analysis
(1) Number of dependent variables         One           One           One
(2) Number of independent variables       Multiple      Multiple      Multiple
(3) Nature of the dependent variable      Metric        Metric        Categorical/binary
(4) Nature of the independent variables   Categorical   Metric        Metric
We explain this relationship with an example in which the researcher is attempting
to explain the amount of life insurance purchased in terms of age and income. All
three procedures involve a single criterion or dependent variable and multiple
predictor or independent variables.
The nature of these variables differs. In analysis of variance and regression analysis,
the dependent variable is metric or interval scaled (amount of life insurance
purchased in currency), whereas in discriminant analysis it is categorical (amount of
life insurance purchased classified as high, medium or low). The independent variables are
categorical in the case of analysis of variance (age and income are each classified as
high, medium and low) but metric in the case of regression and discriminant
analysis (age in years and income in currency, i.e. both measured on a ratio scale).
Q-4. What are the steps involved in conducting discriminant analysis?
Answer-
Conducting Discriminant Analysis
The steps involved in conducting discriminant analysis consist of formulation, estimation,
determination of significance, interpretation and validation. These steps are given in the
context of two-group discriminant analysis.
Figure: Conducting Discriminant Analysis (flowchart of five steps in sequence: formulate the problem; estimate the discriminant function coefficients; determine the significance of the discriminant function; interpret the results; assess the validity of discriminant analysis)
1) Formulate the Problem: The first step in discriminant analysis is to formulate the
problem by identifying the objectives, the criterion variable and the independent
variables. The criterion variable must consist of two or more mutually exclusive and
collectively exhaustive categories. When the dependent variable is interval or ratio
scaled, it must first be converted into categories. Example: attitude toward the brand,
measured on a 7-point scale, could be categorized as unfavorable (1, 2, 3), neutral (4)
and favorable (5, 6, 7).
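The conversion in the example above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample ratings are hypothetical, and the cut points are the ones given in the text.

```python
def categorize_attitude(rating: int) -> str:
    """Map a 7-point attitude rating to the three groups used in the text."""
    if not 1 <= rating <= 7:
        raise ValueError("rating must be between 1 and 7")
    if rating <= 3:
        return "unfavorable"
    if rating == 4:
        return "neutral"
    return "favorable"


# Hypothetical responses from five respondents
ratings = [1, 4, 7, 3, 5]
groups = [categorize_attitude(r) for r in ratings]
```

Every rating falls into exactly one group, so the resulting categories are mutually exclusive and collectively exhaustive, as the criterion variable requires.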
2) Estimate the Discriminant Function Coefficients: The discriminant function
coefficients are then estimated. Two broad approaches are available:
a) Direct Method: An approach to discriminant analysis that involves
estimating the discriminant function so that all the predictors are included
simultaneously. This method is appropriate when, based on previous
research or a theoretical model, the researcher wants the discrimination to
be based on all the predictors.
b) Stepwise Discriminant Analysis: Discriminant analysis in which the
predictors are entered sequentially based on their ability to discriminate
between the groups. This method is appropriate when the researcher wants
to select a subset of the predictors or independent variables for inclusion in
the discriminant function.
3) Determine the Significance of the Discriminant Function: The null hypothesis that, in
the population, the means of all discriminant functions in all groups are equal can be
statistically tested. This test is based on Wilks' λ. If several functions are tested
simultaneously, the Wilks' λ statistic is the product of the univariate λ for each function.
The significance level is estimated based on a chi-square transformation of the statistic. If
the null hypothesis is rejected, indicating significant discrimination, one can proceed to
interpret the results.
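The test in step 3 can be sketched as follows. The text does not name the chi-square transformation, so this sketch assumes Bartlett's approximation, chi-square = -[n - 1 - (k + g)/2] ln(λ) with k(g - 1) degrees of freedom, where n is the number of cases, k the number of predictors and g the number of groups. It also assumes each function's λ is computed as 1 / (1 + eigenvalue). The function names and all input numbers are hypothetical.

```python
import math


def wilks_lambda(eigenvalues):
    """Overall Wilks' lambda: product of 1 / (1 + eigenvalue) over the functions."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


def bartlett_chi_square(lam, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation (an assumption) and its degrees of freedom."""
    chi_sq = -(n_cases - 1 - (n_predictors + n_groups) / 2.0) * math.log(lam)
    df = n_predictors * (n_groups - 1)
    return chi_sq, df


# Hypothetical two-group case: one function with eigenvalue 1.5,
# estimated from 30 cases and 5 predictors.
lam = wilks_lambda([1.5])
chi_sq, df = bartlett_chi_square(lam, n_cases=30, n_predictors=5, n_groups=2)
```

With these made-up inputs λ is 0.4 and the chi-square value is about 23.4 on 5 degrees of freedom. That exceeds the 0.05 critical value of 11.07, so the null hypothesis of equal group means would be rejected.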
4) Interpret the Results: The interpretation of the discriminant weights, or coefficients,
is similar to that in multiple regression analysis. The value of the coefficient for a
particular predictor depends on the other predictors included in the discriminant
function. The signs of the coefficients are arbitrary, but they indicate which variable
values result in large and small function values and associate them with particular
groups.
5) Assess Validity of Discriminant Analysis: One option is leave-one-out cross-validation.
In this option, the discriminant model is re-estimated as many times as there are
respondents in the sample. Each re-estimated model leaves out one respondent, and the
model is used to predict the group for that respondent. Alternatively, a validation sample
is used for developing the classification matrix. The percentage of cases correctly
classified by discriminant analysis is referred to as the hit ratio.
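The leave-one-out idea can be sketched with a deliberately simplified setup: two groups and a single hypothetical predictor x. Under the added assumption of equal prior probabilities, a case is then assigned to the group whose mean is nearest. The function names and the data are illustrative only and do not describe what any particular statistics package does.

```python
def nearest_group(value, group_means):
    """Assign a value to the group whose mean is closest."""
    return min(group_means, key=lambda g: abs(value - group_means[g]))


def leave_one_out_hit_ratio(values, labels):
    """Hold out each case in turn, re-estimate the group means, classify the held-out case."""
    hits = 0
    for i in range(len(values)):
        sums, counts = {}, {}
        for j, (v, g) in enumerate(zip(values, labels)):
            if j == i:
                continue  # case i takes no part in this re-estimation
            sums[g] = sums.get(g, 0.0) + v
            counts[g] = counts.get(g, 0) + 1
        means = {g: sums[g] / counts[g] for g in sums}
        if nearest_group(values[i], means) == labels[i]:
            hits += 1
    return hits / len(values)


# Hypothetical values of one predictor for eight respondents in two groups
x = [-2.1, -1.4, -0.9, 0.3, -0.2, 1.1, 1.8, 2.4]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
loo_hit_ratio = leave_one_out_hit_ratio(x, labels)
```

Here six of the eight held-out respondents are classified correctly, giving a hit ratio of 0.75. Because each respondent is predicted by a model that never saw that respondent, the figure is less optimistic than one computed on the cases used for estimation.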
Statistics Associated with Discriminant Analysis
Q-5. What is Wilks' λ? For what purpose is it used?
Answer-
Wilks' λ: Sometimes also called the U statistic, Wilks' λ for each predictor is the ratio of
the within-group sum of squares to the total sum of squares. Its value varies between 0
and 1. Large values of λ (near 1) indicate that the group means do not seem to be different.
Small values of λ (near 0) indicate that the group means seem to be different.
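As a rough illustration of this definition, the ratio can be computed for one predictor at a time. The function name and the two small data sets are hypothetical, chosen so that one predictor separates the groups well and the other does not.

```python
def univariate_wilks_lambda(values, labels):
    """Within-group sum of squares divided by total sum of squares for one predictor."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_within += sum((v - mean) ** 2 for v in members)
    return ss_within / ss_total


labels = [1, 1, 1, 2, 2, 2]
income = [10, 12, 14, 30, 32, 34]  # hypothetical: group means far apart
age = [25, 40, 55, 26, 41, 54]     # hypothetical: group means nearly equal
lam_income = univariate_wilks_lambda(income, labels)
lam_age = univariate_wilks_lambda(age, labels)
```

For the income data λ is close to 0 (about 0.03), signalling different group means. For the age data it is close to 1, signalling that the groups do not differ on that predictor.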
Q-6. What is a classification matrix?
Answer-
Classification Matrix
Sometimes also called confusion or prediction matrix, the classification matrix contains the
number of correctly classified and misclassified cases. The correctly classified cases appear
on the diagonal, because the predicted and actual groups are the same. The off-diagonal
elements represent cases that have been incorrectly classified. The sum of the diagonal
elements divided by the total number of cases represents the hit ratio.
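A minimal sketch of building the classification matrix and the hit ratio from actual and predicted group memberships is given below. The function names and the ten hypothetical cases are assumptions made for illustration.

```python
def classification_matrix(actual, predicted, groups):
    """Count cases; rows are actual groups, columns are predicted groups."""
    index = {g: i for i, g in enumerate(groups)}
    matrix = [[0] * len(groups) for _ in groups]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def hit_ratio(matrix):
    """Sum of the diagonal elements divided by the total number of cases."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical actual and predicted group memberships for ten cases
actual = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
predicted = [1, 1, 1, 2, 2, 2, 2, 2, 1, 2]
matrix = classification_matrix(actual, predicted, groups=[1, 2])
ratio = hit_ratio(matrix)
```

The matrix here is [[3, 1], [1, 5]]: eight cases lie on the diagonal and two off it, so the hit ratio is 0.8.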
Q-7. Define the following statistical terms of discriminant analysis.
Answer-
a) Discriminant Score: The unstandardized coefficients are multiplied by the values of
the variables. These products are summed and added to the constant term to obtain
the discriminant scores.
b) Eigenvalue: For each discriminant function, the eigenvalue is the ratio of between-group
to within-group sums of squares. Large eigenvalues imply superior functions.
c) Structure Correlation: Also referred to as discriminant loadings, the structure
correlations represent the simple correlations between the predictors or
independent variables and discriminant function.
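The first two definitions can be illustrated with a small sketch. The constant, the coefficients and the four cases are hypothetical values, not estimates from any real data set.

```python
def discriminant_score(constant, coefficients, values):
    """Constant plus the sum of unstandardized coefficient times variable value."""
    return constant + sum(b * x for b, x in zip(coefficients, values))


def eigenvalue(scores, labels):
    """Between-group over within-group sum of squares of the discriminant scores."""
    grand_mean = sum(scores) / len(scores)
    groups = {}
    for s, g in zip(scores, labels):
        groups.setdefault(g, []).append(s)
    ss_between = ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((s - mean) ** 2 for s in members)
    return ss_between / ss_within


# Hypothetical constant, coefficients and cases (two predictors, two groups)
b0, b = -4.0, [0.5, 0.25]
cases = [[2, 4], [4, 4], [6, 8], [8, 8]]
labels = [1, 1, 2, 2]
scores = [discriminant_score(b0, b, x) for x in cases]
ev = eigenvalue(scores, labels)
```

The scores come out as -2, -1, 1 and 2. The between-group sum of squares is 9 and the within-group sum of squares is 1, so the eigenvalue is 9, which indicates a function that separates the two groups well.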
Q-8. Explain Discriminant Analysis Model.
Answer-
Discriminant Analysis Model
The discriminant analysis model is the statistical model on which discriminant analysis
is based. It involves linear combinations of the following form:
D = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk
Where,
D = discriminant score
b's = discriminant coefficients or weights
X's = predictors or independent variables
The coefficients or weights (b), are estimated so that the groups differ as much as possible
on the values of the discriminant function. This occurs when the ratio of between-group
sum of squares to within-group sum of squares for the discriminant scores is at a
maximum. Any other linear combination of the predictors will result in a smaller ratio.
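The maximization described above can be sketched for two groups and two predictors. The sketch assumes Fisher's classical solution, in which the weights are proportional to the inverse of the pooled within-group sums of squares and cross-products matrix multiplied by the difference in group means. The constant b0 is omitted because it shifts every score equally and leaves the ratio unchanged. All names and data are hypothetical.

```python
def mean_vector(rows):
    """Mean of X1 and of X2 over the rows of one group."""
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def within_sscp(rows, mean):
    """Sums of squares and cross-products of (X1, X2) around the group mean."""
    s11 = sum((r[0] - mean[0]) ** 2 for r in rows)
    s22 = sum((r[1] - mean[1]) ** 2 for r in rows)
    s12 = sum((r[0] - mean[0]) * (r[1] - mean[1]) for r in rows)
    return s11, s12, s22


def fisher_weights(group1, group2):
    """Weights (b1, b2) = W^-1 (mean2 - mean1), with W the pooled within-group SSCP."""
    m1, m2 = mean_vector(group1), mean_vector(group2)
    a11, a12, a22 = within_sscp(group1, m1)
    c11, c12, c22 = within_sscp(group2, m2)
    w11, w12, w22 = a11 + c11, a12 + c12, a22 + c22
    det = w11 * w22 - w12 * w12
    d1, d2 = m2[0] - m1[0], m2[1] - m1[1]
    # Explicit inverse of the symmetric 2x2 matrix W applied to the mean difference
    b1 = (w22 * d1 - w12 * d2) / det
    b2 = (-w12 * d1 + w11 * d2) / det
    return b1, b2


def ss_ratio(weights, group1, group2):
    """Between-group over within-group sum of squares of the scores b1*X1 + b2*X2."""
    s1 = [weights[0] * r[0] + weights[1] * r[1] for r in group1]
    s2 = [weights[0] * r[0] + weights[1] * r[1] for r in group2]
    grand = (sum(s1) + sum(s2)) / (len(s1) + len(s2))
    between = within = 0.0
    for s in (s1, s2):
        m = sum(s) / len(s)
        between += len(s) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in s)
    return between / within


# Hypothetical (X1, X2) measurements for the members of G1 and G2
g1 = [(1, 2), (2, 1), (2, 3), (3, 2)]
g2 = [(4, 5), (5, 4), (5, 6), (6, 5)]
b = fisher_weights(g1, g2)
best = ss_ratio(b, g1, g2)
other = ss_ratio((1.0, 0.0), g1, g2)  # a competing combination: X1 alone
```

For these data the estimated weights are (0.75, 0.75) and the ratio is 9, whereas using X1 alone gives a ratio of 4.5. This is consistent with the statement that any other linear combination yields a smaller ratio.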
Figure: A Geometric Interpretation of Two-Group Discriminant Analysis (scatter diagram of the members of G1 and G2 plotted on the axes X1 and X2)
We give a brief geometrical exposition of two-group discriminant analysis. Suppose
we had two groups, G1 and G2, and each member of these groups was measured on
two variables, X1 and X2. A scatter diagram of the two groups is shown in the above
figure, where X1 and X2 are the two axes. Members of G1 are denoted by 1 and members
of G2 by 2. The resultant ellipses encompass some specified percentage of the points
in each group. A straight line is drawn through the two points where the ellipses
intersect and is then projected to a new axis, D. The overlap between the univariate
distributions G1' and G2', represented by the shaded area, is smaller than would be
obtained by any other line drawn through the ellipses representing the scatter plots.
Thus the groups differ as much as possible on the D axis.