Factor Analysis: Factors Influencing The Academic Performance of Students
1.1 Introduction
Bangladesh has a large number of higher learning institutions that are governed and supervised by the Ministry of Education (MOE). To date, Bangladesh has 26 public universities (http://www.moedu.gov.bd), and the University of Dhaka (DU) is the largest of them, with a student population of over 115,000. Students are the main assets of universities. Students' performance (academic achievement) plays an important role in producing high-quality graduates who will become leaders and manpower for the country and thus be responsible for the country's economic and social development. The performance of students in universities should be a concern not only to administrators and educators, but also to corporations in the labor market. Academic achievement is one of the main factors considered by employers in recruiting workers, especially fresh graduates. Thus, students have to put great effort into their studies to obtain good grades and meet employers' demands. Students' academic achievement is measured by the Cumulative Grade Point Average (CGPA). CGPA shows a student's overall academic performance, since it averages the grades of all examinations over all semesters/years of the student's time at university. Many factors can act as barriers or catalysts to students achieving a high CGPA. There are several ways to measure student academic performance, including CGPA, grade point average (GPA), test results and others. In Bangladesh, researchers evaluate student academic performance based on CGPA. A study in the United States by Nonis and Wright (2003) also evaluated student performance based on CGPA. Most of the research done in other countries used GPA as the measure of academic performance, because those studies examined student performance for a particular semester/year. Some other researchers used test results, since they were studying performance in a specific subject.
1.2 Review of literature
Many studies have examined the factors that influence students' performance, such as demographics, active learning, student attendance, extracurricular activities, peer influence and course assessment. Studies have shown that demographic characteristics can influence academic excellence. Among these characteristics are parents' income, parents' education and English results.
Hossain (1994), in his work "A study of Factor Analysis and its Application", discussed its background, advantages, limitations, uses, the factor model, methods of analysis, and the use of SPSS for factor analysis. He also gave an example on motivation measures and students' attitudes on the basis of his study.
Nasri and Ahmed (2007), in their study of business students (national and non-national students) in the United Arab Emirates, indicate that non-national students had higher grade point averages and were more competent in English, which is reflected in their higher average for high school English.
Shamima Syeda Sultana (2003), in her work "Factor Analysis: An application to gross domestic product data", discussed the factors which affected the domestic product from 1995-96 to 1999-2000 for the 64 districts of Bangladesh. She also discussed the division-wise factors and compared the districts on these factors for that period.
1.3
The objectives of this study are:
1. To collect primary data for the factor analysis.
2. To reduce the list of variables to a few factors for modeling purposes.
3. To fit a model with these factors and check the significance of the model.
4. To find out the factors which influence the academic performance of the students.
5. To find out the association between these variables.
1.4 Sources of Data
The data used in this study were collected from students who live in Shahidullah Hall (a residential hall) of the University of Dhaka.
1.5 Data processing
After the data were collected, the following computer application packages were used to process them:
1. SPSS 16
2. Microsoft Excel
1.6
The limitations of this study are:
1. Because primary data were used for the analysis, the data collection procedure was not 100% accurate.
2. In this report we consider only the 12 major variables. Other, less influential variables have been ignored.
3. Non-residential students were ignored in our data collection procedure.
4. The data collection procedure took a long time, so a more extensive analysis could not be done.
2.1
Principal component analysis (PCA) is a classical statistical method. It was first derived by Karl Pearson (1901) and rediscovered by Hotelling (1933). PCA is a multivariate procedure which rotates the data such that maximum variability is projected onto the axes. Essentially, a set of correlated variables is transformed into a set of uncorrelated variables which are ordered by decreasing variability. The uncorrelated variables are linear combinations of the original variables, and the last of these variables can be removed with minimum loss of real data. The main use of PCA is to reduce the dimensionality of a data set while retaining as much information as possible. It computes a compact and optimal description of the data set. In communication theory, it is known as the Karhunen-Loève transform. A principal component analysis is concerned with explaining the variance-covariance structure of a high-dimensional random vector through a few linear combinations of the original component variables. Consider a p-dimensional random vector $X = (X_1, X_2, \ldots, X_p)'$. The k principal components ($k \le p$) of X are k (univariate) random variables $Y_1, Y_2, \ldots, Y_k$, which are defined by the following formulae:
\[
\begin{aligned}
Y_1 &= l_1'X = l_{11}X_1 + l_{12}X_2 + \cdots + l_{1p}X_p \\
Y_2 &= l_2'X = l_{21}X_1 + l_{22}X_2 + \cdots + l_{2p}X_p \\
&\;\;\vdots \\
Y_k &= l_k'X = l_{k1}X_1 + l_{k2}X_2 + \cdots + l_{kp}X_p
\end{aligned}
\]
where the coefficient vectors $l_1, l_2, \ldots$ are chosen such that they satisfy the following conditions:
First principal component = the linear combination $l_1'X$ that maximizes $\mathrm{Var}(l_1'X)$ subject to $\|l_1\| = 1$.
Second principal component = the linear combination $l_2'X$ that maximizes $\mathrm{Var}(l_2'X)$ subject to $\|l_2\| = 1$ and $\mathrm{Cov}(l_1'X, l_2'X) = 0$.
jth principal component = the linear combination $l_j'X$ that maximizes $\mathrm{Var}(l_j'X)$ subject to $\|l_j\| = 1$ and $\mathrm{Cov}(l_k'X, l_j'X) = 0$ for all $k < j$.
This says that the principal components are those linear combinations of the original variables which maximize the variance of the linear combination and which have zero covariance (and hence zero correlation) with the previous principal components.
It can be proved that there are exactly p such linear combinations. Typically, however, the first few of them explain most of the variance in the original data. So instead of working with all the original variables $X_1, X_2, \ldots, X_p$, one would typically first perform PCA and then use only the first two or three principal components, say $Y_1$ and $Y_2$, in subsequent analysis.
2.2
1. To discover or to reduce the dimensionality of the data set.
2. To identify new meaningful underlying variables.
3. To derive a small number of linear combinations (principal components) of a set of variables that retain as much of the information in the original variables as possible.
4. To reveal relationships that were not previously suspected and thereby allow interpretations that would not ordinarily result.
2.3
1. Transformation from z to y: The equation y = V'z represents a transformation, where y is the transformed variable, z is the original standardized variable and V is the premultiplier to go from z to y.
2. Orthogonal transformations simplify things: To produce a transformation vector for y for which the elements are uncorrelated is the same as saying that we want V such that Dy is a diagonal matrix. That is, all the off-diagonal elements of Dy must be zero. This is called an orthogonalizing transformation.
3. Infinite number of values for V: There are an infinite number of values for V that will produce a diagonal Dy for any correlation matrix R. Thus the mathematical problem "find a unique V such that Dy is diagonal" cannot be solved as it stands. A number of famous statisticians such as Karl Pearson and Harold Hotelling pondered this problem and suggested a "variance maximizing" solution.
4. Principal components maximize variance of the transformed elements one by one: Hotelling (1933) derived the "principal components" solution. It proceeds as follows: for the first principal component, which will be the first element of y and be defined by the coefficients in the first column of V (denoted by $v_1$), we want a solution such that the variance of $y_1$ will be maximized.
5. Constrain v to generate a unique solution: The constraint on the numbers in $v_1$ is that the sum of the squares of the coefficients equals 1. Expressed mathematically, we wish to maximize
\[
\frac{1}{N}\sum_{i=1}^{N} y_{1i}^{2}
\]
where $y_{1i} = v_1' z_i$ and $v_1'v_1 = 1$ (this is called "normalizing" $v_1$).
6. Computation of the first principal component from R and $v_1$: Substituting the middle equation in the first yields
\[
\frac{1}{N}\sum_{i=1}^{N} y_{1i}^{2} = v_1' R v_1
\]
where R is the correlation matrix of Z, which, in turn, is the standardized matrix of X, the original data matrix. Therefore, we want to maximize $v_1' R v_1$ subject to $v_1'v_1 = 1$.
2.4 Procedure for Principal Component Analysis
Principal components are particular linear combinations of the p random variables $X_1, X_2, \ldots, X_p$. The first principal component is the linear combination
\[
Z_1 = l_{11}X_1 + l_{12}X_2 + \cdots + l_{1p}X_p = l_1'X
\]
that varies as much as possible for the individuals, subject to the condition
\[
l_1'l_1 = l_{11}^2 + l_{12}^2 + \cdots + l_{1p}^2 = 1 .
\]
Thus the variance of $Z_1$, $V(Z_1)$, is as large as possible given this constraint on the constants $l_{1j}$. The second principal component,
\[
Z_2 = l_{21}X_1 + l_{22}X_2 + \cdots + l_{2p}X_p = l_2'X ,
\]
is such that $V(Z_2)$ is as large as possible subject to $l_2'l_2 = l_{21}^2 + l_{22}^2 + \cdots + l_{2p}^2 = 1$ and also to the condition that $Z_1$ and $Z_2$ are uncorrelated, i.e. $\mathrm{Cov}(Z_1, Z_2) = \mathrm{Cov}(l_1'X, l_2'X) = 0$. The other principal components are defined in the same way. If there are p variables, there can be up to p principal components. The variance-covariance matrix is
\[
C = \begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1p} \\
c_{21} & c_{22} & \cdots & c_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
c_{p1} & c_{p2} & \cdots & c_{pp}
\end{pmatrix}
\]
where the diagonal elements $c_{ii}$ are the variances of the $X_i$ and the off-diagonal elements $c_{ij}$ are the covariances. The variances of the principal components are the eigenvalues of the matrix C.
Assuming that the eigenvalues are ordered as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$, $\lambda_i$ corresponds to the ith principal component
\[
Z_i = l_{i1}X_1 + l_{i2}X_2 + \cdots + l_{ip}X_p = l_i'X .
\]
Now $V(Z_i) = \lambda_i$, and the constants $l_{i1}, l_{i2}, \ldots, l_{ip}$ are the elements of the corresponding eigenvector. An important property of the eigenvalues is
\[
\lambda_1 + \lambda_2 + \cdots + \lambda_p = c_{11} + c_{22} + \cdots + c_{pp} .
\]
This means that the sum of the variances of the principal components is equal to the sum of the variances of the original variables.
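Why the variances of the principal components are eigenvalues of C can be sketched with a Lagrange multiplier argument. This is a standard derivation rather than one given in this report; the multiplier symbol $\lambda$ and the function $\mathcal{L}$ are introduced here only for illustration.

```latex
\begin{aligned}
&\text{maximize } V(Z_1) = l_1' C\, l_1 \quad \text{subject to } l_1' l_1 = 1, \\
&\mathcal{L}(l_1, \lambda) = l_1' C\, l_1 - \lambda\,(l_1' l_1 - 1), \\
&\frac{\partial \mathcal{L}}{\partial l_1} = 2\,C\,l_1 - 2\,\lambda\, l_1 = 0
  \;\Longrightarrow\; C\, l_1 = \lambda\, l_1, \\
&V(Z_1) = l_1' C\, l_1 = \lambda\, l_1' l_1 = \lambda .
\end{aligned}
```

So any stationary point is an eigenvector of C, and the attained variance is its eigenvalue; the maximum is therefore reached at the eigenvector belonging to the largest eigenvalue.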
2.5
1. First code the variables $X_1, X_2, \ldots, X_p$ to have zero means and unit variances.
2. Then calculate the n x p data matrix and the covariance matrix S or correlation matrix R.
3. Get the eigenvalues ($\lambda_1, \lambda_2, \ldots, \lambda_p$) and eigenvectors ($a_1, a_2, \ldots, a_p$). The proportion of total variation explained by the jth principal component is $\lambda_j / \mathrm{tr}(S)$ when S is used and $\lambda_j / p$ when R is used.
1/ 2
5. Choose the number of principal components: select a percentage of the total variation that should be explained (70%-90%), and exclude principal components whose eigenvalues are less than the average eigenvalue, $\mathrm{tr}(S)/p$ (which equals 1 for R).
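The steps above can be sketched in Python with NumPy. This is a minimal illustration and not the SPSS procedure used in this study: the function name `pca_from_correlation`, the 80% target and the simulated data are all assumptions made for the example.

```python
import numpy as np

def pca_from_correlation(X, min_explained=0.80):
    """PCA via eigen-decomposition of the correlation matrix R (illustrative)."""
    X = np.asarray(X, dtype=float)
    # Step 1: code each variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step 2: correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # Step 3: eigenvalues and eigenvectors, sorted in decreasing order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # For R, the proportion explained by component j is lambda_j / p.
    proportion = eigvals / eigvals.sum()
    # Step 5: keep the fewest components whose cumulative share reaches the target.
    k = int(np.searchsorted(np.cumsum(proportion), min_explained) + 1)
    k = min(k, len(eigvals))
    scores = Z @ eigvecs[:, :k]
    return eigvals, proportion, k, scores

# Simulated data (hypothetical): two pairs of strongly correlated variables.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1],
    base[:, 1] + 0.1 * rng.normal(size=200),
])
eigvals, proportion, k, scores = pca_from_correlation(X)
```

With a correlation matrix the eigenvalues sum to the number of variables, so the rule "drop components with eigenvalue below 1" is the same as dropping those below the average eigenvalue.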
2.6 Factor Analysis
Factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). Thousands of variables have been proposed to explain or describe the complex variety and interconnections of social and international relations. Perhaps an equal number of hypotheses and theories linking these variables have been suggested. Cureton and D'Agostino (1983) described factor analysis as "a collection of procedures for analyzing the relations among a set of random variables observed or counted or measured for each individual of a group". Bryman and Cramer (1990) broadly defined factor analysis as "a number of related statistical techniques which help us to determine the characteristics which go together".
Hair et al. (1992) described factor analysis as "The statistical approach involving finding a way of condensing the information contained in a number of original variables into a smaller set of dimensions (factors) with a minimum loss of information".
2.7 Types of factor analysis
There are two main types:
1. Principal component analysis: This method provides a unique solution, so that the original data can be reconstructed from the results. It looks at the total variance among the variables, so the solution generated will include as many factors as there are variables, although it is unlikely that they will all meet the criteria for retention. There is only one method for completing a principal components analysis; this is not true of any of the other multidimensional methods described here.
2. Common factor analysis: This is what people generally mean when they say "factor analysis." This family of techniques uses an estimate of the common variance among the original variables to generate the factor solution. Because of this, the number of factors will always be less than the number of original variables. Choosing the number of factors to keep for further analysis is therefore more problematic in common factor analysis than in principal components.
2.8
1. To reduce a large number of variables to a smaller number of factors for modeling purposes, where the large number of variables precludes modeling all the measures individually. As such, factor analysis is integrated in structural equation modeling (SEM), helping create the latent variables modeled by SEM. However, factor analysis can be and is often used on a stand-alone basis for similar purposes.
2. To select a subset of variables from a larger set, based on which original variables have the highest correlations with the principal component factors.
3. To create a set of factors to be treated as uncorrelated variables, as one approach to handling multicollinearity in such procedures as multiple regression.
4. To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.
5. To establish that multiple tests measure the same factor, thereby giving justification for administering fewer tests.
6. To identify clusters of cases and/or outliers.
7. To determine network groups by determining which sets of people cluster together.
2.9
1. A sample large enough to yield reliable estimates of the correlations among the variables.
2. Statistical inference is improved if the variables are multivariate normal.
3. Relationships among the pairs of variables are linear.
4. Absence of outliers among the cases.
5. Some degree of collinearity among the variables, but not an extreme degree or singularity.
6. A large ratio of N/k.
2.10
Factor analysis has a similar aim to principal component analysis. Here also, we reduce a set of p variables to a small number of indices or factors and hence elucidate the relationship between the variables. Spearman proposed the idea that the test scores are all of the form
\[
X_i = l_i F + e_i ,
\]
where $X_i$ is the ith standardized test score with mean 0 and standard deviation 1, $l_i$ is a constant, F is a factor value having mean 0 and standard deviation 1 for all the individuals as a whole, and $e_i$ is the part of $X_i$ that is specific to the ith test. Also $V(X_i) = l_i^2 + V(e_i)$, since $l_i$ is a constant, F and $e_i$ are independent, and $V(F)$ is assumed to be unity. But $V(X_i)$ is also unity, so that $l_i^2 + V(e_i) = 1$. Hence the constant $l_i$, also called the factor loading, is such that its square is the proportion of the variance of $X_i$ that is accounted for by the factor. In the same way, the generalized factor analysis model is
\[
X_i = l_{i1}F_1 + l_{i2}F_2 + \cdots + l_{im}F_m + e_i , \qquad i = 1, 2, \ldots, p,
\]
where $F_1, \ldots, F_m$ are the common factors and $V(e_i)$ is the specific variance or uniqueness, i.e., the part of the variance that is unrelated to the common factors.
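The variance decomposition used for the single-factor model can be written out step by step. The loading value $l_i = 0.8$ in the last line is a hypothetical number chosen only to illustrate the arithmetic.

```latex
\begin{aligned}
V(X_i) &= V(l_i F + e_i) \\
       &= l_i^2\, V(F) + V(e_i) + 2\, l_i\, \mathrm{Cov}(F, e_i) \\
       &= l_i^2 + V(e_i) \qquad \bigl(V(F) = 1,\ \mathrm{Cov}(F, e_i) = 0\bigr) \\
l_i = 0.8 \;&\Longrightarrow\; l_i^2 = 0.64, \quad V(e_i) = 1 - 0.64 = 0.36 .
\end{aligned}
```

In this illustration 64% of the variance of the standardized score is attributed to the factor and 36% is specific to the test.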
2.11
1. Collect data and compute an intercorrelation matrix. Assess the factorability of the matrix.
2. Extract an initial solution.
3. From the initial solution, determine the appropriate number of factors to be extracted in the final solution.
4. If necessary, rotate the factors to clarify the factor pattern in order to better interpret the nature of the factors.
5. Depending upon subsequent applications, compute a factor score for each subject on each factor.
2.12
Let us assume that our Y variables are related to a number of functions operating linearly. That is, Equation 1:
\[
\begin{aligned}
Y_1 &= x_{11}F_1 + x_{12}F_2 + \cdots + x_{1m}F_m , \\
Y_2 &= x_{21}F_1 + x_{22}F_2 + \cdots + x_{2m}F_m ,
\end{aligned}
\]
where Y is a variable with known data, x is a constant, and F is a function, f( ), of some unknown variables.
By application to the known data on the Y variables, factor analysis defines the unknown F functions. The loadings emerging from a factor analysis are the constants. The factors are the F functions. The size of each loading for each factor measures how much that specific function is related to Y. For any of the Y variables of Equation 1 we may write Equation 2:
\[
Y = x_1F_1 + x_2F_2 + x_3F_3 + \cdots + x_mF_m ,
\]
with the F's representing factors and the x's representing loadings.
2.13 Methods of Estimation
A variety of methods have been developed to extract factors from an intercorrelation matrix. SPSS offers the following methods:
1. Principal components method
2. Maximum likelihood method (a commonly used method)
3. Principal axis method, also known as common factor analysis
4. Unweighted least-squares method
5. Generalized least-squares method
6. Alpha method
7. Image factoring
The most popular methods of estimating the parameters of factor analysis are the principal component method and the maximum likelihood method. The principal component method transforms the correlation matrix into new, smaller sets of linear combinations of independent (i.e., uncorrelated) principal components (Zillmer and Vuz, 1995). Principal component analysis is a separate technique from the ML method because it partitions the variance of the correlation matrix into new principal components (Zillmer and Vuz, 1995).
2.13.1
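Let the observable random vector X have covariance matrix $\Sigma$ with eigenvalue-eigenvector pairs $(\lambda_i, e_i)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. Then
\[
\Sigma = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \cdots + \lambda_p e_p e_p' .
\]
We can write
\[
\Sigma = LL' , \qquad L = \bigl[\sqrt{\lambda_1}\, e_1,\ \sqrt{\lambda_2}\, e_2,\ \ldots,\ \sqrt{\lambda_p}\, e_p\bigr],
\]
and, when specific variances are included, the factor model takes the form
\[
\Sigma = LL' + \Psi , \qquad (2)
\]
where $\Psi$ is the diagonal matrix of specific variances.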
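The principal component method of extracting loadings can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name `pc_factor_solution` and the 3 x 3 correlation matrix are hypothetical and are not taken from the study's data. When fewer than p factors are retained, the fit reproduces the diagonal of the matrix exactly and the off-diagonal elements only approximately.

```python
import numpy as np

def pc_factor_solution(R, m):
    """Principal component estimate of loadings L and specific variances psi."""
    eigvals, eigvecs = np.linalg.eigh(R)
    # Indices of the m largest eigenvalues, in decreasing order.
    order = np.argsort(eigvals)[::-1][:m]
    # Column j of L is sqrt(lambda_j) * e_j.
    L = eigvecs[:, order] * np.sqrt(eigvals[order])
    # Specific variances: the diagonal of R - LL'.
    psi = np.diag(R - L @ L.T)
    return L, psi

# Hypothetical correlation matrix, for illustration only.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

L1, psi1 = pc_factor_solution(R, m=1)      # one retained factor
residual = R - (L1 @ L1.T + np.diag(psi1)) # zero on the diagonal by construction

L3, psi3 = pc_factor_solution(R, m=3)      # all factors: R = LL' exactly
```

Retaining all three factors leaves no specific variance, which mirrors the statement that the full decomposition gives $\Sigma = LL'$.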
2.14
A factor loading is the correlation between a variable and a factor that has been extracted from the data; loadings are therefore also known as factor-variable correlations. They show how closely the variables are related to each of the factors discovered, and they are the basis for imputing a label to the different factors. In fact, factor loadings are the key to understanding what the factors mean. It is the absolute size (rather than the sign, plus or minus) of a loading that is important in the interpretation of a factor. The component matrix indicates the correlation of each variable with each factor.
2.14.2 Communality
The sum of the squared factor loadings for all factors for a given variable (row) is the variance in that variable accounted for by all the factors, and this is called the communality. Communality shows how much of each variable is accounted for by the underlying factors taken together. A high value of communality means that not much of the variable is left over after whatever the factors represent is taken into consideration. It is worked out for each variable as follows: Communality of the ith variable = (ith factor loading of factor A)^2 + (ith factor loading of factor B)^2 + ...
2.14.3
The eigenvalue is the amount of variance in the variable set explained by a factor. When we take the sum of the squared factor loadings relating to a factor, that sum is referred to as the eigenvalue or latent root. The eigenvalue indicates the relative importance of each factor in accounting for the particular set of variables being analyzed.
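The row-wise and column-wise sums described above can be illustrated with a small table of loadings. The loadings below are hypothetical values for four variables on two factors (A and B); they are not results from this study.

```python
# Hypothetical loadings: one row per variable, one column per factor (A, B).
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
]
n_factors = 2

# Communality of a variable: sum of its squared loadings across the factors (row sum).
communalities = [sum(l * l for l in row) for row in loadings]

# Eigenvalue of a factor: sum of squared loadings down its column.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

# Percent of total variance: eigenvalue divided by the number of variables, times 100.
percent_total = [100 * ev / len(loadings) for ev in eigenvalues]
```

Because both quantities add up the same squared loadings, the communalities and the eigenvalues have the same grand total.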
2.14.4 Correlation Matrix
The most often employed techniques of factor analysis are applied to a matrix of correlation coefficients among all the variables. The full correlation matrix involved in the factor analysis is usually shown if the number of variables analyzed is not overly large. Often, however, the matrix is presented without comment. Specifically, the correlation matrix has the following features.
The coefficients of correlation express the degree of linear relationship between the row and column variables of the matrix. The closer to zero the coefficient, the less the relationship; the closer to one, the greater the relationship. A negative sign indicates that the variables are inversely related.
To interpret the coefficient, square it and multiply by 100. This will give the percent variation in common, for the data on the two variables.
The correlation coefficient between two variables is the cosine of the angle between the variables when they are plotted as vectors, with the cases as the coordinate axes.
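The cosine interpretation and the "square it and multiply by 100" rule can be checked numerically. The sketch below uses two short hypothetical data series; note that the cosine identity holds for the mean-centered vectors, which is an assumption made explicit in the code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from the covariance and the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

def cosine_of_centered(x, y):
    """Cosine of the angle between the mean-centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    return dot / (norm_x * norm_y)

# Hypothetical observations on two variables for five cases.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

r = pearson_r(x, y)
cos_angle = cosine_of_centered(x, y)
percent_common = 100 * r ** 2                  # percent variation in common
angle_degrees = math.degrees(math.acos(cos_angle))
```

For these values the correlation is 0.8, so the two variables share 64% of their variation.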
2.14.5
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity are reported together. The KMO statistic varies between 0 and 1. A value greater than 0.5 indicates that the sample is adequate, and a value close to 1 indicates that factor analysis is appropriate. Bartlett's test examines the null hypothesis that the original correlation matrix is an identity matrix. For factor analysis to work we need some relationships between the variables, and if the R-matrix were an identity matrix then all correlation coefficients would be zero. Therefore, we want this test to be significant (i.e. to have a significance value less than .05). A significant test tells us that the R-matrix is not an identity matrix; therefore there are some relationships between the variables we hope to include in the analysis.
2.14.5 Rotation
There are various methods that can be used in factor rotation:
1. Varimax Rotation: Varimax rotation attempts to achieve loadings of ones and zeros in the columns of the component matrix (1.0 and 0.0).
2. Quartimax Rotation: - Quartimax Rotation attempts to achieve loadings of ones and zeros in the rows of the component matrix (1.0 & 0.0).
3. Equimax Rotation: - Equimax Rotation combines the objectives of both varimax and quartimax rotations
4. Orthogonal Rotation: Orthogonal rotation preserves the independence of the factors; geometrically they remain 90° apart.
5. Oblique Rotation: Oblique rotation produces factors that are not independent; geometrically they are not 90° apart.
Rotation, in the context of factor analysis, is something like staining a microscope slide. Just as different stains reveal different structures in the tissue, different rotations reveal different structures in the data. Although different rotations give results that appear to be entirely different, from a statistical point of view all results are taken as equal, none superior or inferior to the others. However, from the standpoint of making sense of the results of factor analysis, one must select the right rotation. If the factors are independent, an orthogonal rotation is done, and if the factors are correlated, an oblique rotation is made. The communality of each variable remains undisturbed regardless of rotation, but the eigenvalues change as a result of rotation.
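The last point can be illustrated with a rigid rotation of a two-factor loading matrix. The loadings and the 30-degree angle are hypothetical, and the helper names are introduced only for this sketch; a real varimax rotation would choose the angle by optimizing a criterion rather than fixing it.

```python
import math

# Hypothetical unrotated loadings of four variables on two factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.2, 0.9],
    [0.1, 0.6],
]

def rotate(loadings, degrees):
    """Post-multiply the loadings by the orthogonal matrix [[c, -s], [s, c]]."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Row sums of squared loadings."""
    return [sum(l * l for l in row) for row in loadings]

def column_sums_of_squares(loadings):
    """Column sums of squared loadings (the variance attributed to each factor)."""
    return [sum(row[j] ** 2 for row in loadings) for j in range(2)]

rotated = rotate(loadings, 30)

h2_before, h2_after = communalities(loadings), communalities(rotated)
ss_before, ss_after = column_sums_of_squares(loadings), column_sums_of_squares(rotated)
```

The communalities are identical before and after the rotation, while the variance attributed to each individual factor is redistributed; only its total across the factors stays the same.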
2.14.6 Factor Rotation
All factor loadings obtained from the initial loadings by an orthogonal transformation have the same ability to reproduce the covariance matrix. From matrix algebra we know that an orthogonal transformation corresponds to a rigid rotation of the coordinate axes. For this reason an orthogonal transformation of the factor loadings and the implied orthogonal transformation of the factors are called factor rotation.
2.14.7 Unrotated Factor Matrix
Two different factor matrices are often displayed in a report on a factor analysis. The first is the unrotated factor matrix; it is usually given without comment. The features of the matrix which are useful for interpretation are as follows.
The number of factors (columns) is the number of substantively meaningful independent (uncorrelated) patterns of relationship among the variables.
The loadings measure which variables are involved in which factor pattern and to what degree. The square of a loading multiplied by 100 equals the percent variation that a variable has in common with an unrotated pattern.
The first unrotated factor pattern delineates the largest pattern of relationships in the data; the second delineates the next largest pattern that is independent of (uncorrelated with) the first; the third delineates the third largest pattern that is independent of the first and second; and so on. Thus the amount of variation in the data described by each pattern decreases successively with each factor; the first pattern defines the greatest amount of variation, the last pattern the least.
The column headed "h2" displays the communality of each variable. This is the proportion of a variable's total variation that is involved in the patterns. The coefficient (communality) shown in this column, multiplied by 100, gives the percent of variation of a variable in common with the patterns.
The percent of common variance figures indicate how whatever regularity exists in the data is divided among the factor patterns. The percent of total variance figures measure how much of the data variation is involved in a pattern; the percent of common variance figures measure how much of the variation accounted for by all the patterns is involved in each pattern.
The eigenvalues equal the sum of the column of squared loadings for each factor. They measure the amount of variation accounted for by a pattern. Dividing the eigenvalues either by the number of variables or by the sum of the h2 values and multiplying by 100 determines the percent of either total or common variance, respectively.
2.14.8 Rotated Factor Matrix
The rotated factor matrix should not differ in format from the unrotated factor matrix, except that the h2 values may not be given and eigenvalues are inappropriate. The following features characterize the rotated matrix.
If the rotation is orthogonal, several features of the unrotated matrix are preserved by the orthogonally rotated matrix. In the unrotated matrix, factor patterns are ordered by the amount of data variation they account for, with the first defining the greatest degree of relationship in the data. In the orthogonally rotated matrix, no significance is attached to factor order.
If the rotation is oblique rather than orthogonal, it takes place in one of two coordinate systems: either a system of primary axes or a system of reference axes. The primary factor pattern matrix and the reference factor structure matrix delineate the oblique patterns or clusters of interrelationship among the variables. Their loadings define the separate patterns and the degree of involvement in the patterns for each variable.
2.14.9 Factor Scores
A useful by-product of factor analysis is factor scores. Factor scores are composite measures that can be computed for each subject on each factor. They are standardized measures with a mean of 0.0 and a standard deviation of 1.0, computed from the factor score coefficient matrix. A factor score represents the degree to which each respondent gets high scores on the group of items that load highly on each factor. Factor scores can help explain what the factors mean, and with such scores several other multivariate analyses can be performed.
2.15 Advantages of Factor Analysis
The advantages of factor analysis are discussed below:
1. Factor analysis can simultaneously manage over a hundred variables, compensate for random error and invalidity, and disentangle complex interrelationships into their major and distinct regularities.
2. The technique of factor analysis is quite useful when we want to condense and simplify multivariate data.
3. The technique is useful for verifying the conceptualization of a construct of interest.
4. The technique is helpful in pointing out important and interesting relationships among observed data that were there all the time, but not easy to see from the data alone.
5. The technique can reveal the latent factors (i.e., underlying factors not directly observed) that determine relationships among several variables concerning a research study.
6. The technique may be used in the context of empirical clustering of products, media or people, i.e., for providing a classification scheme when data scored on various rating scales have to be grouped together.
2.16
The disadvantages of factor analysis are discussed below:
1. It is mathematically complicated and entails diverse and numerous considerations in application.
2. Its technical vocabulary includes unfamiliar terms such as eigenvalues, rotation, simple structure, orthogonal, loadings, and communality.
3. The problem of communicating factor analysis is especially crucial for peace research. Scholars in this field are drawn from many disciplines and professions, and few of them are acquainted with the method.
4. It involves laborious computations and a heavy cost burden. With the computer facilities available these days, factor analyses have become relatively faster and easier, but the cost factor remains, i.e., large factor analyses are still bound to be quite expensive.
5. The results of a single factor analysis are generally considered less reliable and dependable, for very often a factor analysis starts with a set of imperfect data.
6. Factor analysis is a complicated decision tool that can be used only when one has thorough knowledge and enough experience of handling it. Even then, at times it may not work well and may even disappoint the user.
2.17
The uses of factor analysis are discussed below:
1. Interdependency and pattern delineation: If a scientist has a table of data (say, UN votes, personality characteristics, or answers to a questionnaire) and suspects that these data are interrelated in a complex fashion, then factor analysis may be used to untangle the linear relationships into their separate patterns.
2. Parsimony or data reduction: Factor analysis can be useful for reducing a mass of information to an economical description. For example, data on fifty characteristics for 300 nations are unwieldy to handle, descriptively or analytically. The management, analysis, and understanding of such data are facilitated by reducing them to their common factor patterns.
3. Structure: Factor analysis may be employed to discover the basic structure of a domain. As a case in point, a scientist may want to uncover the primary independent lines or dimensions (such as size, leadership, and age) of variation in group characteristics and behavior. Data collected on a large sample of groups and factor analyzed can help disclose this structure.
4. Classification or description: Factor analysis is a tool for developing an empirical typology. It can be used to group interdependent variables into descriptive categories, such as ideology, revolution, liberal voting, and authoritarianism. It can be used to classify nation profiles into types with similar characteristics or behavior.
5. Scaling: A scientist often wishes to develop a scale on which individuals, groups, or nations can be rated and compared. The scale may refer to such phenomena as political participation, voting behavior, or conflict. A problem in developing a scale is to weight the characteristics being combined. Factor analysis offers a solution by dividing the characteristics into independent sources of variation (factors).
6. Hypothesis testing: Hypotheses abound regarding dimensions of attitude, personality, group, social behavior, voting, and conflict. Since the meaning usually associated with "dimension" is that of a cluster or group of highly intercorrelated characteristics or behavior, factor analysis may be used to test for their empirical existence.
7. Data transformation: Factor analysis can be used to transform data to meet the assumptions of other techniques. If the predictor variables are correlated in violation of an independence assumption, factor analysis can be employed to reduce them to a smaller set of uncorrelated factor scores.
8. Exploration: In a new domain of scientific interest like peace research, the complex interrelations of phenomena have undergone little systematic investigation. The unknown domain may be explored through factor analysis. It can reduce complex interrelationships to a relatively simple linear expression and it can uncover unsuspected, perhaps startling, relationships.
9. Mapping: Besides facilitating exploration, factor analysis also enables a scientist to map the social terrain. This means the systematic attempt to chart major empirical concepts and sources of variation. These concepts may then be used to describe a domain or to serve as inputs to further research.
3.1 Target population
The intended target population for this study is the 3rd-year, 4th-year and M.Sc. students of Shahidullah Hall, University of Dhaka. Each student is considered a sampling unit. Shahidullah Hall is one of the biggest halls of the University of Dhaka, and students of every kind live there, so this hall was chosen for data collection. The 3rd-year, 4th-year and M.Sc. students were considered because they have spent at least three years in the university, so their academic performance can be considered adequate for this kind of analysis. Cluster sampling is used because not enough information was available to construct the sampling frame required by other probability sampling techniques. For data collection only the main building of Shahidullah Hall is considered, because most of the 3rd-year, 4th-year and M.Sc. students live in that building. The hall rooms of that building were considered as clusters. There are 178 clusters in the sampling frame, of which 32 were selected randomly. From these selected clusters 56 sampling units were taken.
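The random selection of clusters described above can be sketched in a few lines. This is only an illustration: the room labels 1 to 178 and the seed are hypothetical, and the text does not say how the random draw was actually carried out.

```python
import random

N_CLUSTERS = 178   # hall rooms in the sampling frame (from the text)
N_SELECTED = 32    # rooms drawn at random (from the text)


def select_clusters(n_clusters, n_selected, seed=None):
    """Draw a simple random sample of cluster (room) labels without replacement."""
    rng = random.Random(seed)
    # Rooms are labelled 1..n_clusters; this labelling is hypothetical.
    return sorted(rng.sample(range(1, n_clusters + 1), n_selected))


selected_rooms = select_clusters(N_CLUSTERS, N_SELECTED, seed=2024)
print(len(selected_rooms), "rooms selected:", selected_rooms)
```

Every student in a selected room would then be approached as a sampling unit.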
3.2 Data
The questionnaire, which is given in the appendix, was distributed to the selected hall students. A total of 56 questionnaires were completed. Of the 56 sample units, 16.07% are 3rd-year students, 64.28% are 4th-year students and the rest are M.Sc. students.
3.3 Variables
We consider 11 variables that influence students' academic performance. Qualitative variables are ignored in this study to avoid complications. The 11 variables are:
1. Attendance in class.
2. Study hours per week after class.
3. Family income.
4. Involvement in political activities.
5. Involvement in extracurricular activities.
6. Past academic performances (SSC and HSC results).
7. Entertainment.
8. Involvement in financial (income earning) activities.
9. How long it took to get a seat in the hall.
10. Number of roommate.
11. Sleeping hours.
The variables are defined below. In each case the students concerned are the 3rd-year, 4th-year and M.Sc. students of Shahidullah Hall, University of Dhaka.
1. Attendance in class: the attendance of the students in their classes.
2. Study hours per week after class: the time spent on study in a week after class.
3. Family income: the monthly income of the earning members of the student's family.
4. Involvement in political activities: the time spent on political work, discussions, meetings and other political activities.
5. Involvement in extracurricular activities: the time spent on extracurricular activities such as playing different types of indoor and outdoor games, participating in debate competitions, etc.
6. Past academic performances (SSC and HSC results): the SSC and HSC results of the students.
7. Entertainment: the time spent watching TV, listening to music, reading novels, etc.
8. Involvement in financial (income earning) activities: the time spent on income earning activities such as tutoring students, part time jobs, business, etc.
9. How long it took to get a seat in the hall: the time taken to get a seat in the hall.
10. Number of roommate: the number of roommates of the student.
11. Sleeping hours: the average number of hours the student sleeps per day.
Analysis of Data
Factor analysis is a statistical approach that can be used to analyze interrelationships among a large number of variables and to explain these variables in terms of their common underlying dimensions (factors). Thousands of variables have been proposed to explain or describe the complex variety and interconnections of social and international relations, and perhaps an equal number of hypotheses and theories linking these variables have been suggested. Because factor analysis can summarize many interrelated variables in a few dimensions, it is chosen for this analysis.
The first step in a factor analysis is to look at the inter-correlations between variables. We expect our variables to correlate with each other; any variable that does not correlate with any other variable should be considered for exclusion before the factor analysis is run. The correlations can be checked using the correlate procedure to create a correlation matrix of all variables; this matrix can also be created as part of the main factor analysis. The KMO and Bartlett's test option produces the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett's test of sphericity. The KMO statistic varies between 0 and 1; it should be greater than 0.5 for the sample to be adequate, and a value close to 1 indicates that factor analysis is appropriate. Bartlett's test examines the null hypothesis that the original correlation matrix (the R-matrix) is an identity matrix. For factor analysis to work we need some relationships between variables, and if the R-matrix were an identity matrix all correlation coefficients would be zero. We therefore want this test to be significant (i.e. to have a significance value less than .05). A significant test tells us that the R-matrix is not an identity matrix, so there are some relationships between the variables we hope to include in the analysis.
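The two diagnostics described above can be sketched with NumPy. This is a minimal illustration using the standard textbook formulas, not the SPSS implementation used in the study; the 3 x 3 matrix `R_demo` is a made-up example, not the study data.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for a
    p x p correlation matrix R computed from n observations."""
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)            # log|R|, numerically stable
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi_square, df


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)      # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)   # mask for i != j
    r2 = np.sum(R[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Hypothetical correlation matrix: three variables, all correlations 0.5.
R_demo = np.array([[1.0, 0.5, 0.5],
                   [0.5, 1.0, 0.5],
                   [0.5, 0.5, 1.0]])

chi_square, df = bartlett_sphericity(R_demo, n=56)
kmo_value = kmo(R_demo)
print(f"Bartlett chi-square = {chi_square:.3f} on {df} df, KMO = {kmo_value:.3f}")
```

The chi-square statistic is then compared with a chi-square distribution on `df` degrees of freedom. With 11 variables the formula gives 11 x 10 / 2 = 55 degrees of freedom.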
4.1 Interpretation
SPSS software is used to analyze the data.
Table 4.1: The correlation matrix involved in the factor analysis for academic performance
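Variable                               1      2      3      4      5      6      7      8
1 Previous result                  1.000  -.197   .106   .655   .505   .182  -.368   .175
2 Number of roommate                      1.000   .084  -.218  -.352  -.162   .066  -.010
3 Time taken to get a hall seat                  1.000   .025   .022   .203  -.084  -.149
4 average attendance                                    1.000   .734  -.025  -.319   .130
5 average study hour per week                                  1.000   .094  -.391   .127
6 Family income                                                       1.000  -.608   .210
7 average financial activity                                                 1.000  -.366
8 average extracurricular activity                                                  1.000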
From Table 4.1 we can see that average attendance and average study hours per week after class have high positive correlations with previous result, and a strong negative correlation with average political activity. That is, students who attend class regularly and study a good amount of time after class tend to have good previous results. Average attendance is positively correlated with average study hours per week after class and negatively correlated with political activity: a student who is regular in class spends a good amount of time on study and cannot spend much time on political activities. Family income is negatively correlated with financial activity and positively correlated with extracurricular activity: students whose parents have a high income do not need to engage much in income earning activities and are therefore more involved in extracurricular activities.
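Table 4.2: KMO and Bartlett's test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .616
Bartlett's Test of Sphericity, Approx. Chi-Square: 194.031
Bartlett's Test of Sphericity, df: 55
Bartlett's Test of Sphericity, Sig.: < .05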
The KMO value is .616 (values between 0.5 and 0.7 are considered mediocre), which indicates that the patterns of correlations are relatively compact, so factor analysis should yield distinct and reliable factors. For these data Bartlett's test is significant (p < .05): the approximate chi-square statistic is 194.031 with 55 degrees of freedom, which is significant even at the .01 level. We therefore reject the null hypothesis that the correlation matrix of the 11 variables is an identity matrix, conclude that factor analysis is appropriate here, and move forward to extract the factors.
4.1.1
An eigenvalue of the correlation matrix is the amount of variance in the variables that is explained by the corresponding factor. There are eleven eigenvalues, the same as the number of variables. In Table 4.3 the first column shows the calculated eigenvalue of each component, the second column shows the percentage of variance explained by each factor and the third column shows the cumulative percentage of variance.
Table 4.3: Initial eigenvalues
Component   Total   % of Variance   Cumulative %
1           3.199   29.078          29.078
2           1.648   14.981          44.059
3           1.510   13.729          57.788
4           1.169   10.623          68.411
5            .882    8.018          76.429
6            .800    7.271          83.700
7            .520    4.729          88.429
8            .490    4.458          92.887
9            .404    3.677          96.564
10           .253    2.296          98.860
11           .125    1.140         100.000
The largest eigenvalue is 3.199, which is greater than unity, and the first factor explains 29.078% of the total sample variance. The second factor has an eigenvalue of 1.648. Since this is greater than 1.0, it explains more variance than a single variable, in fact 1.648 times as much; the percentage of variance explained is 14.981%. The third factor has an eigenvalue of 1.510 and explains 13.729% of the variance. The fourth factor has an eigenvalue of 1.169 and explains 10.623% of the variance. Factors 5 through 11 have eigenvalues less than 1 and therefore each explains less variance than a single variable. We have thus found four eigenvalues greater than unity, and in this study these four factors explain 68.411% of the total variation.
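The arithmetic behind Table 4.3 can be reproduced as a short sketch. The eigenvalues are the rounded values printed in the table, so the computed percentages may differ from the SPSS output in the last decimal place.

```python
# Eigenvalues as printed in Table 4.3 (rounded to three decimals).
eigenvalues = [3.199, 1.648, 1.510, 1.169, 0.882, 0.800,
               0.520, 0.490, 0.404, 0.253, 0.125]

# For a correlation matrix the eigenvalues sum to the number of variables.
total = sum(eigenvalues)

percent_variance = [100.0 * ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for pct in percent_variance:
    running += pct
    cumulative.append(running)

# Kaiser criterion: retain components whose eigenvalue exceeds 1.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]

for i, (ev, pct, cum) in enumerate(zip(eigenvalues, percent_variance, cumulative), 1):
    print(f"{i:2d}  {ev:6.3f}  {pct:7.3f}  {cum:8.3f}")
print("Retained components:", retained)
```

An eigenvalue of 1 corresponds to the variance of one standardized variable, which is why components below that threshold are dropped.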
4.1.2 Scree Plot
A scree plot is a graph of the eigenvalues against the component numbers. Components that fall below the line at eigenvalue 1 are not retained, because a component with an eigenvalue less than 1 explains less variance than a single variable.
4.1.3 Extracted factors
Here the factors with eigenvalues greater than 1 are extracted.
Table 4.4: Extraction Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %
1           3.199   29.078          29.078
2           1.648   14.981          44.059
3           1.510   13.729          57.788
4           1.169   10.623          68.411
The four extracted factors, in agreement with the scree plot, describe almost 70% (68.411%) of the total variance (Table 4.4). That is, factor analysis by principal component analysis (PCA) is effective.
4.1.4
After rotation the extracted factors explain the same 68.411% of the total variation. The variance described by the first factor decreases after rotation, while the variance described by the other factors increases (see Table 4.5). The explained variance is thus spread more evenly across the factors, which makes them easier to interpret.
Table 4.5: Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %
1           2.869   26.078          26.078
2           1.856   16.871          42.949
3           1.595   14.501          57.450
4           1.206   10.962          68.411
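Why rotation changes the variance of each factor but not the total can be illustrated with a small Varimax sketch. This uses the common SVD-based iteration without Kaiser normalization, so it is not necessarily identical to the SPSS procedure, and the loading matrix below is randomly generated (hypothetical), not the study data.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a (variables x factors) loading matrix.
    Returns the rotated loadings and the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient-like target of the varimax criterion.
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                              # no further improvement
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical loadings: 11 variables on 4 factors.
rng = np.random.default_rng(0)
demo_loadings = rng.uniform(-0.6, 0.6, size=(11, 4))

rotated_loadings, rotation_matrix = varimax(demo_loadings)

print("Variance per factor before:", (demo_loadings ** 2).sum(axis=0).round(3))
print("Variance per factor after: ", (rotated_loadings ** 2).sum(axis=0).round(3))
print("Total before and after:    ",
      round((demo_loadings ** 2).sum(), 3), round((rotated_loadings ** 2).sum(), 3))
```

Because the rotation matrix is orthogonal, each variable's communality and the total explained variance stay the same; only the split between factors changes.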
4.1.5
Table 4.6: Component matrix
Variable                            Component 1   Component 2   Component 3   Component 4
Previous result                         .763         -.164         -.008          .119
Number of roommate                     -.342          .385          .229          .254
Time taken to get a hall seat           .097          .147         -.277          .865
average attendance                      .852         -.199          .348          .123
average study hour per week             .788         -.351          .010         -.017
Family income                           .334          .435         -.719         -.031
average financial activity             -.637         -.400          .430          .126
average extracurricular activity        .388          .451         -.015         -.484
average entertainment per day          -.085          .718          .279          .161
average political activity             -.672         -.174         -.414         -.189
average sleep per day                   .130          .437          .555         -.121
4.1.6 Communality
The communality of a variable is the proportion of its variance that is explained by the extracted factors. It is the sum of the squared factor loadings of that variable. In PCA the initial communality of every variable is 1, that is, all of its variance is assumed to be common variance.
Table 4.7: Communalities
Variable                            Extraction
Previous result                        .623
Number of roommate                     .382
Time taken to get a hall seat          .856
average attendance                     .902
average study hour per week            .744
Family income                          .818
average financial activity             .766
average extracurricular activity       .588
average entertainment per day          .626
average political activity             .689
average sleep per day                  .531
From Table 4.7 we see that almost every variable has a high communality (common or shared variance explained by the factors), except number of roommate.
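The link between loadings, communalities and eigenvalues can be checked with a short sketch. It assumes the loadings of Table 4.6 are read with one row per variable and one column per component; that reading of the table is an assumption. Because the printed loadings are rounded, the results agree with Tables 4.4 and 4.7 only to about three decimals.

```python
# Loadings from Table 4.6, one tuple per variable (components 1 to 4).
# The row/column assignment is an assumed reading of the table.
loadings = {
    "Previous result":                  (0.763, -0.164, -0.008,  0.119),
    "Number of roommate":               (-0.342, 0.385,  0.229,  0.254),
    "Time taken to get a hall seat":    (0.097,  0.147, -0.277,  0.865),
    "average attendance":               (0.852, -0.199,  0.348,  0.123),
    "average study hour per week":      (0.788, -0.351,  0.010, -0.017),
    "Family income":                    (0.334,  0.435, -0.719, -0.031),
    "average financial activity":       (-0.637, -0.400, 0.430,  0.126),
    "average extracurricular activity": (0.388,  0.451, -0.015, -0.484),
    "average entertainment per day":    (-0.085, 0.718,  0.279,  0.161),
    "average political activity":       (-0.672, -0.174, -0.414, -0.189),
    "average sleep per day":            (0.130,  0.437,  0.555, -0.121),
}

# Communality of a variable: sum of its squared loadings across components.
communalities = {name: sum(value ** 2 for value in row)
                 for name, row in loadings.items()}

# Variance explained by a component: sum of squared loadings down its column.
component_variance = [sum(row[j] ** 2 for row in loadings.values())
                      for j in range(4)]

for name, h2 in communalities.items():
    print(f"{name:34s} {h2:.3f}")
print("Variance per component:", [round(v, 3) for v in component_variance])
```

Number of roommate has the smallest row sum, which is why it stands out as the variable least well explained by the four factors.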
4.1.7
For a clearer description of the factors, we omit factor loadings smaller than .40 in absolute value.
Table 4.8: Suppressed component matrix
Variable                            Component 1   Component 2   Component 3   Component 4
Previous result                         .763
Number of roommate
Time taken to get a hall seat                                                     .865
average attendance                      .852
average study hour per week             .788
Family income                                         .435         -.719
average financial activity             -.637         -.400          .430
average extracurricular activity                      .451                       -.484
average entertainment per day                         .718
average political activity             -.672                       -.414
average sleep per day                                 .437          .555
Here we see that Factor-1 holds all variables except Number of roommate, Time taken to get a hall seat, Family income, average extracurricular activity, average entertainment per day and average sleep per day. Factor-2 holds Family income, average financial activity, average extracurricular activity, average entertainment per day and average sleep per day. Factor-3 holds Family income, average financial activity, average political activity and average sleep per day. Factor-4 holds Time taken to get a hall seat and average extracurricular activity. These unrotated factors do not give a precise picture, so we check the factor loadings after Varimax rotation.
4.1.8 Suppressed Rotated Factor Loadings
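Table 4.9: Suppressed rotated component matrix
Variable                            Component 1   Component 2   Component 3   Component 4
Previous result                         .737
Number of roommate                                                  .502
Time taken to get a hall seat                                                     .905
average attendance                      .949
average study hour per week             .784
Family income                                         .880
average financial activity                           -.816
average extracurricular activity                      .532                       -.464
average entertainment per day                                       .766
average political activity             -.740
average sleep per day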
Here we see:
Factor-1 holds Previous result, Average attendance, Average study hour per week and Average political activity.
Factor-2 holds Family income, Average financial activity and Average extracurricular activity.
Factor-3 holds Average entertainment per day, Average sleep per day and Number of roommate.
Factor-4 holds Time taken to get a hall seat and Average extracurricular activity.
4.1.9 Factor naming
Factor 1: Academic effort factor
Factor 2: Financial factor
Factor 3: Leisure and entertainment factor
Factor 4: Hall seat factor
Table 4.10: Factor score
4.2 Regression Model
Here we consider:
CGPA = dependent variable
Factor 1 = independent variable
Factor 2 = independent variable
Factor 3 = independent variable
Factor 4 = independent variable
Table 4.11: Correlations
Pearson Correlation
            CGPA     factor1   factor2   factor3   factor4
CGPA        1.000     .925      .042     -.124     -.011
factor1      .925    1.000      .000      .000      .000
factor2      .042     .000     1.000      .000      .000
factor3     -.124     .000      .000     1.000      .000
factor4     -.011     .000      .000      .000     1.000
Sig. (1-tailed)
            CGPA     factor1   factor2   factor3   factor4
CGPA         .        .000      .380      .181      .469
factor1      .000     .         .500      .500      .500
factor2      .380     .500      .         .500      .500
factor3      .181     .500      .500      .         .500
factor4      .469     .500      .500      .500      .
N = 56 for every pair of variables.
This table gives the correlation between each pair of variables. We want the predictor variables to be correlated with the criterion but not strongly correlated with one another. Here the factor scores are uncorrelated with each other (all correlations between factors are .000). From the table we can see that factor 1 has a strong positive correlation with CGPA (.925), factor 3 and factor 4 have negative correlations with CGPA (-.124 and -.011), and factor 2 has a very small correlation with CGPA (.042).
Table 4.12: Model summary
R Square: .872
Adjusted R Square: .862
This table summarizes the fit of the model. The Adjusted R Square value tells us that our model accounts for 86.2% of the variance in the CGPA scores, which indicates a very good model.
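The reported fit statistics can be cross-checked by hand. The sketch below uses the standard formulas for adjusted R Square and the overall F statistic; it also uses the fact that, when the predictors are mutually uncorrelated (as the factor scores are), R Square equals the sum of the squared correlations between each predictor and the criterion. Inputs are the rounded values printed in the tables, so small differences from the SPSS output are expected.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R Square for n observations and k predictors."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)


def f_statistic(r_square, n, k):
    """Overall F statistic of the regression, on (k, n - k - 1) df."""
    return (r_square / k) / ((1.0 - r_square) / (n - k - 1))


n_obs = 56          # completed questionnaires
n_predictors = 4    # factor 1 to factor 4

# Correlations of CGPA with the four factor scores (Table 4.11).
correlations_with_cgpa = [0.925, 0.042, -0.124, -0.011]

# Valid here because the factor scores are uncorrelated with each other.
r_square_from_correlations = sum(r ** 2 for r in correlations_with_cgpa)

reported_r_square = 0.872
adj = adjusted_r_square(reported_r_square, n_obs, n_predictors)
f_value = f_statistic(reported_r_square, n_obs, n_predictors)

print(f"R Square from correlations: {r_square_from_correlations:.3f}")
print(f"Adjusted R Square: {adj:.3f}")
print(f"F on ({n_predictors}, {n_obs - n_predictors - 1}) df: {f_value:.2f}")
```

The squared correlation of factor 1 alone (.925 squared, about .856) accounts for almost all of R Square, which matches the dominant role of the academic effort factor in the discussion.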
Table 4.13: ANOVA
Model        df    F        Sig.
Regression    4    87.066   .000
Residual     51
Total        55
This table reports an ANOVA, which assesses the overall significance of our model. As p < 0.05, our model is significant.
Table 4.14: Coefficients and collinearity diagnostics
Columns: Model; Unstandardized Coefficients; Standardized Coefficients (Beta); t; Sig.; Collinearity Statistics (Tolerance)
The standardized beta coefficients give a measure of the contribution of each variable to the model. A large value indicates that a unit change in this predictor variable has a large effect on the criterion variable. The t and Sig. (p) values give a rough indication of the impact of each predictor variable: a large absolute t value and a small p value suggest that a predictor variable has a large impact on the criterion variable. From the table we can see that factor 1 is highly significant, as expected. Academic effort has a positive effect on the outcome, meaning that larger values of this factor result in better academic performance. Factor 3 is also significant and has a negative effect on the outcome, meaning that higher values of this factor score result in poorer academic performance. Average time spent on entertainment, number of hours of sleep per day and, to some extent, number of roommates have high loadings on this factor. Thus, these variables are correlated and they have a negative impact on academic performance.
Conclusion
This study is conducted to identify the factors influencing the performance of students living at Shahidullah Hall in University of Dhaka. The factor analysis is done with eleven variables: attendance in class, study hours per week after class, family income, political influences, extracurricular activities, past academic performances (SSC and HSC results), entertainment, involvement in financial activities, how long it took to get a seat in the hall, number of roommate and sleeping hours. From the analysis we reduce the variables to four factors: the academic effort factor, the financial factor, the leisure and entertainment factor and the hall seat factor. A regression model is then fitted with these factors as independent variables and current CGPA as the dependent variable. We test the significance of the model, and the result is that the model is significant with these factors. That means these four factors mainly influence the academic performance of the students living at Shahidullah Hall in University of Dhaka. The findings of the study are summarized and discussed in the following paragraphs.
The result of the analysis indicates that the academic effort factor has a very strong positive effect on CGPA: students who have good previous results, attend class regularly and study a good amount of time after class have a better CGPA than other students. We found that the financial factor has very little influence on academic performance; that is, being busy with income earning activities has little effect on a student's CGPA. We also found that the leisure and entertainment factor (factor 3) is significant and has a negative effect on the outcome, meaning that higher values of this factor score result in poorer academic performance: a student who is too busy with entertainment will see a bad effect on his academic performance. Average time spent on entertainment, number of hours of sleep per day and, to some extent, number of roommates have high loadings on this factor. Thus, these variables are correlated and they have a negative impact on academic performance. The hall seat factor has a negligible correlation with CGPA (-.011).
The analysis also indicates that previous result, attendance and study hours after class are positively correlated with CGPA, while political activity is negatively correlated with it. That is, students who have good previous results, attend class regularly and study a good amount of time after class have a better CGPA than other students, whereas a student who spends a lot of time on political activities has a lower CGPA than those who are not involved in them. We found that students who were actively engaged in extracurricular activities obtained a higher CGPA; this is supported by the positive relationship between involvement in extracurricular activities and CGPA. We found that family income is negatively correlated with financial activity and positively correlated with extracurricular activity: students whose parents have a high income do not need to engage much in income earning activities and are therefore more involved in extracurricular activities. We also found that time taken to get a hall seat is negatively correlated with average extracurricular activity. A student who cannot get a seat in the hall in proper time has to face many problems, and so does not get enough time for extracurricular activities. Finally, entertainment is positively correlated with sleep per day and negatively correlated with political activity. We can interpret this as follows: a student who is busy with entertainment gets tired, sleeps more than usual and cannot be much involved in political activities.
After discussing all this, we can comment that students' performance (academic achievement) plays an important role in producing the best quality graduates, who will become great leaders and manpower for the country and will thus be responsible for the country's economic and social development. The performance of students in universities should be a concern not only to administrators and educators, but also to corporations in the labour market. Academic achievement is one of the main factors considered by employers in recruiting workers, especially fresh graduates. Thus, students have to put the greatest effort into their studies, not only to obtain a good grade but also to develop a set of moral and ethical values, social competency and consistent attendance.
Appendix Questionnaire
A survey to determine the factors influencing academic performance of students living in Shahidullah Hall of University of Dhaka
Questionnaire
Department:
Current year:
Division:
1st year
2nd year
3rd year
4th year
2.2. If yes, how long did it take for you to get a seat?
1st year
2nd year
3rd year
4th year
4.1. Average number of study hours after class per week during 1st year 2nd year 3rd year 4th year
4.2. At which times of the day do you usually study after class? (e.g., 6 p.m. to 10 p.m.)
a) No formal education
b) Primary (class 1 to 5)
e) Undergraduate
a) No formal education
6. What is your birth order (i.e. eldest, second child, youngest, etc)?
7. Have you been involved in money earning activities (for example, tutoring students, part time jobs, business, etc) during
a) 1st year a. no b. yes If yes, average amount of time (in hrs) spent in a day:
b. yes If yes, average amount of time (in hrs) spent in a day:
b. yes If yes, average amount of time (in hrs) spent in a day:
b. yes If yes, average amount of time (in hrs) spent in a day:
8.1. Have you ever been involved in extracurricular activities (say, playing football or cricket, participating in debate competitions, etc.)? a) yes b) no
8.2. If yes, how many hours did you spend on average per day during the 1st year 2nd year 3rd year 4th year
9. What is the average amount of time (in hrs) you spend each day for entertainment (say, watching TV, listening to music, reading novels, etc.)?
10. Have you ever been involved in political activities?
a) 1st year a. no b. yes If yes, average amount of time spent per week: hrs
b. yes If yes, average amount of time spent per week: hrs
b. yes If yes, average amount of time spent per week: hrs
b. yes If yes, average amount of time spent per week: hrs
11.1. On average, how many hours do you sleep in a day (i.e. over a 24hr period)?
11.2. Have you ever been seriously ill during the academic year? a) yes b) no
11.3. At present, what is the condition of your physical health? a) good b) somewhat good c) bad
14 How many hours do you spend with friends outside class each day?
hrs
15.1 Are classes conducted regularly and in a timely manner for most of the courses? a) yes b) no.
16.4. Where do you usually study? a) library b) own room c) class room d) other places e) reading room
17. Do you rely on financial support from your family? a) yes b) no
18. What is your marital status? a) single b) engaged c) married
19. Are you satisfied with your current academic performance? a. yes, I am satisfied. b. yes, but I want to improve it. c. no, I really need to improve it.
20. Comment on the quality of food served in the dining halls. a. good b. average c. poor quality
Bibliography
Richard A. Johnson and Dean W. Wichern (1982). Applied Multivariate Statistical Analysis. Prentice-Hall, Inc., Englewood Cliffs, New Jersey 07632.
McDonald, R. (1985). Factor Analysis and Related Methods. Hillsdale, NJ: Erlbaum.
Economic Trends, July 2007, July 2008, July 2009, Statistics Department, Bangladesh Bank.
Syeda Shamima Sultana (2003) Factor Analysis: An application to Gross Domestic Product Data. Institute of Statistical Research and Training. University of Dhaka.
Md. Omar Faruque (2008) An application of factor analysis to the Agricultural Production in Bangladesh. Institute of Statistical Research and Training. University of Dhaka.
Websites:
http://www.hawaii.edu
http://www.wikiedia.com
http://www.cscanada.org
http://www.cscanada.net
http://www.moedu.gov.bd