Factor Analysis
- Prof. Chitvan Mehrotra
Factor analysis
Factor analysis is a class of procedures used for data reduction and
summarization.
It is an interdependence technique: no distinction is made between
dependent and independent variables.
Factor analysis is a statistical technique that reduces a large number of variables
to a small number of underlying factors that are more manageable and understandable.
It is an efficient way to simplify complex data sets with many variables.
Types of factor analysis
Exploratory Factor Analysis: Researchers do not know in advance how many
underlying dimensions (factors) can be found among the variables under study.
Depending on the correlations among the variables, a limited number of factors
is derived.
Confirmatory Factor Analysis: Researchers test the hypothesis that the variables
under study, chosen on the basis of theoretical support, actually conform to a
hypothesized factor structure.
Steps to run factor analysis in SPSS
Analyze/Dimension Reduction/Factor
Select the variables (in our example, vehicle type … fuel efficiency)
Descriptives (initial solution, coefficients, KMO & Bartlett Test) CONTINUE
Extraction (method: principal components, correlation matrix, unrotated factor
solution, Scree Plot) CONTINUE
Rotation (method: Varimax, display rotated solution) CONTINUE
Scores (Save As Variables, regression) CONTINUE
OK
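The same analysis can also be scripted outside SPSS. The sketch below is a rough Python equivalent using the open-source factor_analyzer package; the file name cars.csv and the tidy numeric columns are assumptions for illustration, and factor_analyzer's "principal" extraction only approximates SPSS's principal-components method.

```python
# Sketch only: assumes `pip install factor_analyzer pandas` and a hypothetical
# cars.csv holding the vehicle variables (vehicle type ... fuel efficiency).
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

df = pd.read_csv("cars.csv").dropna()          # hypothetical data file

# "Descriptives" step: KMO and Bartlett's test
chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_variable, kmo_total = calculate_kmo(df)
print(f"Bartlett chi-square = {chi_square:.2f}, p = {p_value:.4f}, KMO = {kmo_total:.3f}")

# "Extraction" and "Rotation" steps: three factors, Varimax rotation
fa = FactorAnalyzer(n_factors=3, rotation="varimax", method="principal")
fa.fit(df)

print("Communalities:", fa.get_communalities())
print("Eigenvalues:", fa.get_eigenvalues()[0])
print("Rotated loadings:\n", fa.loadings_)

# "Scores" step: factor scores saved as new variables
scores = fa.transform(df)
```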
Interpretation of the factor analysis output in SPSS
• The next output from the analysis is the correlation matrix: a square array of numbers giving the correlation coefficient between each variable and every other variable in the investigation.
• The correlation coefficient between a variable and itself is always 1, hence the principal diagonal of the correlation matrix contains 1s. The correlation coefficients above and below the principal diagonal are the same. If many of the correlations are sizable, the data are suitable for factor analysis.
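As a quick check outside SPSS, the correlation matrix can be reproduced in a few lines of Python (the file name is the same hypothetical cars.csv used above):

```python
import pandas as pd

df = pd.read_csv("cars.csv")        # hypothetical data file
corr = df.corr()                    # square, symmetric, with 1s on the principal diagonal
print(corr.round(3))
# Many sizable off-diagonal correlations suggest the data are suitable for factor analysis.
```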
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy – an index used to examine the
appropriateness of factor analysis. This measure varies between 0 and 1, and values closer to
1 are better.
Values greater than 0.5 indicate that factor analysis is appropriate.
Bartlett’s Test of Sphericity – This tests the null hypothesis that the correlation matrix is an
identity matrix.
(H0) ( An identity matrix is matrix in which all of the diagonal elements are 1 and all off
diagonal elements are 0 )that is the variables are uncorrelated in the population .
H1:Alternate hypothesis: The variable are corelated in the population
You want to reject this null hypothesis.
the significance value should be is less than 0.05 to reject null hypothesis and conclude
that the variables are correlated in the population
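For reference, Bartlett's statistic can be computed directly from the determinant of the correlation matrix using the standard formula chi-square = -[(n - 1) - (2p + 5)/6] * ln|R|. The sketch below is an illustrative NumPy/SciPy implementation, not SPSS's own code.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's test of sphericity for an (n cases x p variables) array."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)                      # sample correlation matrix
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2.0
    p_value = chi2.sf(statistic, dof)                        # upper-tail chi-square probability
    return statistic, p_value

# If p_value < 0.05, reject H0 (identity correlation matrix) and proceed with factor analysis.
```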
Communalities
The values in the extraction column indicate the proportion of each variable’s variance that can be explained by
the retained factors. Variables with high values (>0.5) are well represented in the common factor space, while
variables with low values are not well represented. Here we can see that all extraction values are high. Variables
with small extraction values are eliminated.
Total variance explained
Eigenvalue: The eigenvalue represents the total variance explained by each factor. Factors with
eigenvalues over 1 are selected for further study.
This table shows all the factors extractable from the analysis along with their eigenvalues, the percent of variance
attributable to each factor, and the cumulative variance of the factors.
Here three factors are selected because they have eigenvalues greater than one (> 1): 5.994, 1.654, and
1.123 (see the second column, Total).
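The same table can be reconstructed by taking the eigenvalues of the correlation matrix. The sketch below (again assuming the hypothetical cars.csv) shows the eigenvalues, the percent of variance each explains, and the Kaiser rule of keeping eigenvalues greater than 1.

```python
import numpy as np
import pandas as pd

df = pd.read_csv("cars.csv").dropna()                   # hypothetical data file
R = np.corrcoef(df.to_numpy(dtype=float), rowvar=False)

eigenvalues = np.linalg.eigvalsh(R)[::-1]               # sorted largest first
pct_variance = 100 * eigenvalues / eigenvalues.sum()    # eigenvalues sum to the number of variables
cum_variance = pct_variance.cumsum()

n_factors = int((eigenvalues > 1).sum())                # Kaiser criterion: eigenvalue > 1
print("Eigenvalues:", eigenvalues.round(3))
print("Cumulative % of variance:", cum_variance.round(1))
print("Factors retained:", n_factors)
```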
Scree plot
The scree plot is a graph of the eigenvalues against all the factors. The graph is useful for determining how
many factors to retain.
One rule is to consider only those points with eigenvalues over 1.
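A scree plot can be drawn from the eigenvalues computed in the previous sketch (matplotlib assumed):

```python
import matplotlib.pyplot as plt

factors = range(1, len(eigenvalues) + 1)     # eigenvalues from the previous sketch
plt.plot(factors, eigenvalues, "o-")
plt.axhline(1.0, linestyle="--")             # Kaiser cut-off: eigenvalue = 1
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```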
Component matrix
The elements of the Component Matrix are the correlations of each item with each component (factor).
• Summing the squared component loadings (correlations) across the components (i.e., along each item's row) gives the communality estimate for that item.
• Summing the squared loadings down the items (i.e., along each component's column) gives the eigenvalue for that component.
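These two identities are easy to verify on a toy loading matrix; the numbers below are made up purely for illustration.

```python
import numpy as np

# Hypothetical unrotated loading matrix: 3 items (rows) x 2 components (columns).
L = np.array([[0.80, 0.30],
              [0.75, 0.40],
              [0.20, 0.90]])

communalities = (L ** 2).sum(axis=1)   # sum of squared loadings across components (each row)
eigenvalues   = (L ** 2).sum(axis=0)   # sum of squared loadings down the items (each column)

print("Communality per item:", communalities)
print("Eigenvalue per component:", eigenvalues)
```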
Rotated component matrix
Why do we rotate?
The idea of rotation is to reduce the number of factors on which each variable has high loadings, so that every variable loads strongly on as few factors as possible.
(A factor loading is the correlation between a variable and a factor.)
The maximum of each row (ignoring the sign) shows which component the variable belongs to (the sketch after this list automates this rule).
Example: vehicle type has its maximum value, 0.954, under factor 3, so it belongs to factor three.
Factor 1: engine size, horsepower
Factor 2: wheelbase, length, width
Factor 3: vehicle type, curb weight, fuel capacity, fuel efficiency
We can name the factors according to our choice.
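Picking the largest absolute loading in each row can also be automated. In the sketch below only the 0.954 loading for vehicle type comes from the example above; the remaining numbers and the truncated variable list are hypothetical.

```python
import numpy as np

variables = ["vehicle_type", "engine_size", "horsepower"]      # truncated, illustrative list
rotated = np.array([[0.10, -0.05, 0.954],                      # hypothetical rotated loadings
                    [0.91,  0.20, 0.150],                      # (rows = variables, columns = factors)
                    [0.88,  0.25, 0.100]])

assigned = np.abs(rotated).argmax(axis=1) + 1   # factor with the largest absolute loading per row
for name, factor in zip(variables, assigned):
    print(f"{name} -> factor {factor}")
```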
Try it yourself:
Example 2 (toothpaste data file)
To determine the benefits consumers seek from toothpaste
Responses were obtained on 6 variables:
V1: It is important to buy a toothpaste that prevents cavities
V2: I like a toothpaste that gives shiny teeth
V3: A toothpaste should strengthen your gums
V4: I prefer a toothpaste that freshens breath
V5: Prevention of tooth decay is not important
V6: The most important consideration is attractive teeth
Responses on a 7-pt scale (1=strongly disagree; 7=strongly agree)
Thank you…