2012-408 Understanding Correlation Matrices
The correlation matrix is a positive semi-definite matrix that describes the dependency between different
data sets. In the multivariate case, or when dealing with many secondary variables whose spatial
distribution and inter-variable dependency are hard to predict, the correlation matrix is the key element
for describing this dependency. The principal directions of variance in a data set are defined by its
principal components. Principal Component Analysis (PCA) is a statistical procedure that calculates the
eigenvalues and eigenvectors of the correlation matrix, which define the principal components of the data
set and allow its dimension to be reduced.
Introduction
Dependency refers to any statistical relationship between two random variables or two sets of data.
Correlation refers to any of a broad class of statistical relationships involving dependence. The correlation
between two data sets X and Y is defined as:
$$\rho_{XY} = \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \, \sigma_Y} \qquad (1)$$

$$\mathrm{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_X)(Y_i - \mu_Y) \qquad (2)$$

$$\sigma_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \mu_X)^2}, \qquad \sigma_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (Y_i - \mu_Y)^2} \qquad (3)$$
where ρ, E and Cov are the correlation, expected value and covariance operators, respectively, μ is the
mean, σ is the standard deviation, and n is the number of data. The correlation is +1 in the case of a perfect
positive linear relationship, −1 in the case of a perfect negative linear relationship, and some value
between −1 and +1 in all other cases, indicating the degree of linear dependence between the variables. As
it approaches zero there is less of a relationship (closer to uncorrelated). The closer the coefficient is to
either −1 or +1, the stronger the correlation between the variables; the correlation cannot exceed 1
in absolute value. When correlations among several variables are computed, they are typically
summarized in the form of a correlation matrix. Correlation matrices are built to describe the dependency
between different data sets and are symmetric since Corr(X, Y) = Corr(Y, X). Correlation matrices must also
be positive semi-definite, meaning that Z^T ρ Z ≥ 0 for every column vector Z (Z^T is the transpose of Z and
ρ is the correlation matrix).
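As a quick illustration of Equations (1)-(3) and the positive semi-definite property, the following Python/NumPy sketch (not part of the original Fortran program; the synthetic variables are made up for illustration) computes a correlation coefficient and matrix and checks that Z^T ρ Z ≥ 0 by inspecting the eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic, partially dependent variables (illustrative data only)
x = rng.normal(size=1000)
y = 0.7 * x + 0.3 * rng.normal(size=1000)

# Equation (1): correlation = covariance / (sigma_X * sigma_Y)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho_xy = cov_xy / (x.std() * y.std())
print("rho_XY =", rho_xy)

# Correlation matrix of the two variables; np.corrcoef applies the same formula
rho = np.corrcoef(np.vstack([x, y]))

# Positive semi-definiteness: all eigenvalues of the correlation matrix are >= 0,
# which is equivalent to z^T rho z >= 0 for every vector z
eigvals = np.linalg.eigvalsh(rho)
print("eigenvalues:", eigvals, "PSD:", np.all(eigvals >= -1e-12))
```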
The subject of multivariate analysis deals with the statistical analysis of data collected on
more than one variable. These variables may be correlated with each other, and their statistical
dependence is often taken into account when analyzing such data. The correlation matrix is a key element
for describing and applying this dependency in the multivariate case. In reservoir estimation, the primary
well data, which are expensive to obtain by drilling, are predicted using secondary seismic data that are
easier and cheaper to acquire. The correlation matrix can be useful for spatial prediction and dimension
reduction when dealing with many secondary variables (Kumar and Deutsch, 2009).
For example, for a diagonal n × n matrix A the characteristic equation becomes:

$$\det(A - \lambda I) = \det\begin{pmatrix} a_{11}-\lambda & 0 & \cdots & 0 \\ 0 & a_{22}-\lambda & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn}-\lambda \end{pmatrix} = (a_{11}-\lambda)(a_{22}-\lambda)\cdots(a_{nn}-\lambda) = 0 \qquad (4)$$
The solutions of this equation are the eigenvalues λ_i = a_ii (i = 1, 2, …, n). When dealing with large
matrices, obtaining the eigenvalues and eigenvectors from the characteristic equation is not easy. There are
two classes of numerical methods for calculating eigenvalues and eigenvectors (Panhuis, 2005): (1)
partial methods, which compute extreme eigenvalues, such as the power method; and (2) global methods,
which approximate the whole spectrum, such as Principal Component Analysis (PCA), Multi-Dimensional
Scaling (MDS), and factorization. This report provides an introduction to global methods and specifically
focuses on the PCA method; MDS and factorization will be covered in future work.
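As an example of the partial class of methods mentioned above, the following sketch (illustrative only; the test matrix is made up) implements the basic power method, which repeatedly applies A to a vector and normalizes to approximate the largest eigenvalue and its eigenvector:

```python
import numpy as np

def power_method(A, n_iter=1000, tol=1e-10):
    """Approximate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    x = np.ones(A.shape[0])
    lam = 0.0
    for _ in range(n_iter):
        x_new = A @ x
        x_new /= np.linalg.norm(x_new)
        lam_new = x_new @ A @ x_new          # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            lam, x = lam_new, x_new
            break
        lam, x = lam_new, x_new
    return lam, x

# Small symmetric test matrix (hypothetical values)
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.2],
              [0.5, 0.2, 2.0]])
lam, vec = power_method(A)
print("dominant eigenvalue:", lam)
print("check against numpy:", np.linalg.eigvalsh(A).max())
```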
Figure 1: Normally distributed data bounded by a hyper-ellipse. Figure 2: Two nearly linearly related variables.
If the variation in a data set is caused by some natural property, or by random experimental
error, then we may expect it to be normally distributed. In this case we show the nominal extent of the
normal distribution by a hyper-ellipse (the two-dimensional ellipse in Figure 1). The hyper-ellipse encloses
data points that are thought of as belonging to a class. It is drawn at a distance beyond which the
probability of a point belonging to the class is low, and can be thought of as a class boundary.
If the variation in the data is caused by some other relationship, then PCA gives us a way of
reducing the dimensionality of the data set. Consider two variables that are nearly linearly related, as
shown in Figure 2. As in Figure 1, the principal direction in which the data vary is shown by the U axis and
the secondary direction by the V axis. However, in this case all the V coordinates are very close to zero. We
may assume, for example, that they are only non-zero because of experimental noise or measurement
error. In the U-V axis system we can then represent the data set by the single variable U and discard V,
reducing the dimensionality of the problem by one.
In computational terms the principal components are found by calculating the eigenvectors and
eigenvalues of the data covariance matrix. This process is equivalent to finding the axis system in which
the covariance matrix is diagonal. The eigenvector with the largest eigenvalue is the direction of greatest
variation, the one with the second largest eigenvalue is the (orthogonal) direction with the next highest
variation, and so on. In other words, the eigenvalues are the variances of the principal components: the
first eigenvalue is the variance of the first principal component, the second eigenvalue is the variance of
the second principal component, and so on (Gillies, 2005).
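The idea in the last two paragraphs can be sketched numerically. The following Python/NumPy example (synthetic data, not the paper's data set) builds two nearly linearly related variables, computes the covariance matrix and its eigendecomposition, and shows that almost all of the variance lies along the first principal direction (the U axis of Figure 2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two nearly linearly related variables, as in Figure 2 (synthetic)
x = rng.normal(size=500)
y = 2.0 * x + 0.05 * rng.normal(size=500)    # small "measurement noise" on V
data = np.vstack([x, y])                     # shape (2, n)

# Eigendecomposition of the covariance matrix gives the principal directions;
# the eigenvalues are the variances of the principal components
cov = np.cov(data)
eigvals, eigvecs = np.linalg.eigh(cov)       # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance along U (1st PC):", eigvals[0])
print("variance along V (2nd PC):", eigvals[1])

# Project onto the first principal component only: dimensionality reduced from 2 to 1
centered = data - data.mean(axis=1, keepdims=True)
u = eigvecs[:, 0] @ centered                 # 1-D representation of the data
print("retained variance fraction:", eigvals[0] / eigvals.sum())
```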
The eigenvalues of an n × n matrix A are defined as the roots of:
det(A − λI) = |A − λI| = 0 (5)
Let λ be an eigenvalue of A. Then there exists a vector x such that:
Ax = λx (6)
The vector x is called an eigenvector of A, associated with the eigenvalue λ. Notice that there is no
unique solution for x in the above equation. It is a direction vector only and can be scaled to any
magnitude. To find a numerical solution for x we can set one of its elements to an arbitrary value, say
1, which gives a set of simultaneous equations to solve for the other elements. If there is no solution
we repeat the process with another element. Ordinarily we normalize the final values so that x has unit
length, that is x^T x = 1.
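The procedure of fixing one element to 1 and solving for the others can be sketched as follows (a minimal illustration with a made-up 2 × 2 matrix and a known eigenvalue; in practice a routine such as numpy.linalg.eig returns the eigenvectors directly):

```python
import numpy as np

def eigenvector_for(A, lam, fixed=0):
    """Solve (A - lam*I)x = 0 by fixing x[fixed] = 1, then normalizing to unit length."""
    n = A.shape[0]
    M = A - lam * np.eye(n)
    free = [j for j in range(n) if j != fixed]
    # Move the fixed column to the right-hand side and solve for the free elements
    rhs = -M[:, fixed]
    sol, *_ = np.linalg.lstsq(M[:, free], rhs, rcond=None)
    x = np.zeros(n)
    x[fixed] = 1.0
    x[free] = sol
    return x / np.linalg.norm(x)              # normalize so x^T x = 1

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = 3.0                                      # a known eigenvalue of this matrix
print(eigenvector_for(A, lam))                 # approximately [0.707, 0.707]
```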
Suppose we have a 3 × 3 matrix A with eigenvectors 𝑥1 , 𝑥2 , 𝑥3 and eigenvalues λ1 , λ2 , λ3 so:
𝐴𝑥1 = λ1 𝑥1 𝐴𝑥2 = 𝜆2 𝑥2 𝐴𝑥3 = 𝜆3 𝑥3 (7)
Putting the eigenvectors as the columns of a matrix gives:

$$A\,[x_1 \; x_2 \; x_3] = [x_1 \; x_2 \; x_3]\begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} \qquad (8)$$

Writing:

$$\varphi = [x_1 \; x_2 \; x_3], \qquad \Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} \qquad (9)$$
gives us the matrix equation:
Aφ = φΛ (10)
Since we normalized the eigenvectors to unit magnitude and they are orthogonal:
φφ^T = φ^T φ = I (11)
This means that:
φ^T Aφ = Λ (12)
and:
A = φΛφ^T (13)
Now let us consider how this applies to the covariance matrix in the PCA process. Let Σ be an n × n
covariance matrix. There is an orthogonal n × n matrix φ whose columns are eigenvectors of Σ, and a
diagonal matrix Λ whose diagonal elements are the eigenvalues of Σ, such that:
φ^T Σφ = Λ (14)
We can look on the matrix of eigenvectors φ as a linear transformation which, in the example of Figure 1,
transforms data points from the [X, Y] axis system into the [U, V] axis system. In the general case the linear
transformation given by φ transforms the data points into a data set where the variables are
uncorrelated. The covariance matrix of the data in the new coordinate system is Λ, which has zeros in all
the off-diagonal elements.
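A short numerical check of Equation (14) and of the decorrelating transformation is sketched below (again with synthetic data rather than the paper's data set):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated synthetic data, shape (n_samples, 2)
x = rng.normal(size=(1000, 2)) @ np.array([[1.0, 0.8],
                                           [0.0, 0.6]])
cov = np.cov(x, rowvar=False)

# Columns of phi are eigenvectors of the covariance matrix
eigvals, phi = np.linalg.eigh(cov)

# Equation (14): phi^T Sigma phi = Lambda (diagonal)
print(np.round(phi.T @ cov @ phi, 6))

# Transforming the data with phi gives uncorrelated variables whose covariance
# matrix is Lambda (off-diagonal terms vanish, up to sampling noise)
u = (x - x.mean(axis=0)) @ phi
print(np.round(np.cov(u, rowvar=False), 6))
```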
Principal component analysis is appropriate when measurements have been obtained on a number of
observed variables and a smaller number of artificial variables (called principal components) is wanted
that will account for most of the variance in the observed variables. Some limitations of PCA
(Izenman, 2008) are: (1) The directions with largest variance are assumed to be of most interest. (2) Only
orthogonal transformations (rotations) of the original variables are considered (kernel PCA is an extension
of PCA that allows non-linear mappings). (3) PCA is based only on the mean vector and the covariance
matrix of the data; some distributions (e.g. the multivariate normal) are completely characterized by these, but
others are not. (4) Dimension reduction can only be achieved if the original variables are correlated; if the
original variables are uncorrelated, PCA does nothing except order them according to their variance.
(5) PCA is not scale invariant (see the sketch below).
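To illustrate point (5), the following sketch (synthetic data) shows that rescaling one variable changes the principal directions obtained from the covariance matrix; this is one reason PCA is often applied to the correlation matrix of standardized variables instead:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(1000, 2)) @ np.array([[1.0, 0.5],
                                           [0.0, 1.0]])

def first_pc(data):
    """Direction of greatest variance from the covariance matrix."""
    vals, vecs = np.linalg.eigh(np.cov(data, rowvar=False))
    return vecs[:, np.argmax(vals)]

scaled = x.copy()
scaled[:, 1] *= 100.0                    # e.g. a change of units on the second variable

print("first PC (original scale):", first_pc(x))
print("first PC (rescaled)      :", first_pc(scaled))   # a different direction
```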
Implementation
In this study, a program written in fortran90 code used to calculate correlation matrix, eigenvalues and
eigenvectors for a data set with six variables. This code reads data set in ASCII format, deletes null values
(use full valued subset to delete -999) and it calculates Mean, Standard deviation, Covariance and
Correlation matrix for data set. Then calculate eigenvalues and eigenvectors for the correlation matrix.
Standard GSLIB convention (corrmat_plot) is used to plot correlation matrix.
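The original program is written in Fortran 90; the following Python/NumPy sketch only mirrors the described workflow (the file name "data.dat" is hypothetical, and the exact input layout of the original code is assumed to be a whitespace-delimited table with -999 as the null flag):

```python
import numpy as np

# Read a whitespace-delimited ASCII table of six variables (file name is hypothetical)
data = np.loadtxt("data.dat")                     # shape (n_samples, 6)

# Keep only fully valued records: drop any row containing the -999 null flag
data = data[~np.any(data == -999, axis=1)]

# Mean, standard deviation, covariance and correlation matrix
mean = data.mean(axis=0)
std = data.std(axis=0)
cov = np.cov(data, rowvar=False)
corr = np.corrcoef(data, rowvar=False)

# Eigenvalues and eigenvectors of the correlation matrix (columns of 'vecs')
eigvals, vecs = np.linalg.eigh(corr)
print("eigenvalues:", eigvals)
```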
The calculated correlation matrix for the six variables is displayed in Figure 3. This is an
appropriate opportunity to review how a correlation matrix is interpreted. The rows and columns of
Figure 3 correspond to the six variables included in the analysis: row 1 (and column 1) represents variable
1, row 2 (and column 2) represents variable 2, and so forth. Where a given row and column intersect, you
will find the correlation between the two corresponding variables. For example, where the row for
variable 2 intersects the column for variable 1, you find a correlation of 0.14; this means that the
correlation between variables 1 and 2 is 0.14.
Based on Figure 3, variables 2, 5 and 6 show relatively strong positive correlation with one another
(ρ25 = 0.76, ρ26 = 0.66, ρ56 = 0.56). Variable 4 shows relatively strong negative correlation with variables 2, 5 and 6
(ρ42 = −0.73, ρ45 = −0.59, ρ46 = −0.46). Variable 3 also shows negative correlation with variables 2, 5 and 6
(ρ32 = −0.53, ρ35 = −0.38, ρ36 = −0.49). However, variable 1 shows no significant correlation with the other
variables; when the correlation between two variables is less than 0.2 in absolute value (|ρ| < 0.2) we
assume they are uncorrelated (Babak and Deutsch, 2008).
Let us reorder the variables based on their correlations with each other to obtain a new correlation matrix
for the reordered variables. Variables 2, 5 and 6, which have strong positive correlation with each other,
are placed in the first, second and third positions, respectively. Variables 4 and 3, which have negative
correlation with variables 2, 5 and 6, are placed in the fourth and fifth positions. Variable 1, which has no
correlation with the other variables, is placed last.
Order   Original variable   Reordered variable
1       Ni(V1)              Fe(V2)
2       Fe(V2)              Co(V5)
3       SiO2(V3)            Al2O3(V6)
4       MgO(V4)             MgO(V4)
5       Co(V5)              SiO2(V3)
6       Al2O3(V6)           Ni(V1)
Table 1: Reordering of variables based on their correlations
Figure 3: Correlation matrix for six different variables Figure 4: Correlation matrix based on reordered variables
The new correlation matrix (Figure 4) shows that the six variables hang together in three distinct
groups. The first group, variables 2, 5 and 6, have strong positive correlation with each other. The second
group, variables 3 and 4, have negative correlations with the first group. The last group is variable 1, which
has no correlation with the rest of the variables (Figure 5). In essence, the six variables are reduced to
three groups, and this is what the correlation matrix accomplishes. In multivariate analysis, or when
dealing with many secondary variables, the correlation matrix allows a set of observed variables to be
reduced to a smaller set of artificial variables, which is called dimension reduction.
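The reordering of Table 1 and Figure 4 amounts to permuting the rows and columns of the correlation matrix with the same index vector. A minimal NumPy sketch (the placeholder matrix is illustrative; in practice corr would hold the computed 6 × 6 correlation matrix with variables in their original order):

```python
import numpy as np

# Placeholder 6x6 correlation matrix with variables in original order V1..V6
corr = np.eye(6)                                   # illustrative values only

# New order from Table 1: Fe(V2), Co(V5), Al2O3(V6), MgO(V4), SiO2(V3), Ni(V1)
order = np.array([2, 5, 6, 4, 3, 1]) - 1           # zero-based indices

# Apply the same permutation to both rows and columns
corr_reordered = corr[np.ix_(order, order)]
```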
Figure 5: Variable groups in the reordered correlation matrix (first group: V2, V5, V6; second group: V3, V4)
Eigenvalues and eigenvectors are calculated for this correlation matrix and displayed in Table 2. They are
then reordered by eigenvalue magnitude, so that the first eigenvector, with the largest eigenvalue,
corresponds to the direction of the largest principal component, and so on (Table 3).
Eigenvalues
0.009    3.1376   0.8583   1.2835   0.2816   0.43
Eigenvectors (one column per eigenvalue above)
Ni(V1)     -0.0553   0.1469   0.8668   0.4732   0.0095   0.0025
Fe(V2)     -0.6259   0.5325  -0.083   -0.0748  -0.5228  -0.1971
SiO2(V3)   -0.4957  -0.3081   0.3746  -0.6546   0.2518   0.1645
MgO(V4)    -0.5912  -0.3942  -0.2838   0.5675   0.2962  -0.0691
Co(V5)      0.0025   0.4836  -0.0306  -0.1053   0.7021  -0.511
Al2O3(V6)  -0.0994   0.4589  -0.1411   0.0944   0.2872   0.8174
Table 2: Eigenvalues and eigenvectors of the correlation matrix
Reordered eigenvalues
3.1376   1.2835   0.8583   0.43     0.2816   0.009
Cumulative proportion of eigenvalues
0.52293  0.73685  0.8799   0.95157  0.9985   1
Eigenvectors (one column per eigenvalue above)
Ni(V1)      0.1469   0.4732   0.8668   0.0025   0.0095  -0.0553
Fe(V2)      0.5325  -0.0748  -0.083   -0.1971  -0.5228  -0.6259
SiO2(V3)   -0.3081  -0.6546   0.3746   0.1645   0.2518  -0.4957
MgO(V4)    -0.3942   0.5675  -0.2838  -0.0691   0.2962  -0.5912
Co(V5)      0.4836  -0.1053  -0.0306  -0.511    0.7021   0.0025
Al2O3(V6)   0.4589   0.0944  -0.1411   0.8174   0.2872  -0.0994
Table 3: Reordered eigenvalues and eigenvectors of the correlation matrix
The cumulative eigenvalue curve C(x) is defined to be:

$$C(x) = \frac{\sum_{i=1}^{x} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \times 100 \qquad (15)$$
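Using the reordered eigenvalues of Table 3, C(x) can be computed directly; a short sketch (the eigenvalues are copied from Table 3):

```python
import numpy as np

# Reordered eigenvalues from Table 3
eigvals = np.array([3.1376, 1.2835, 0.8583, 0.43, 0.2816, 0.009])

# Equation (15): cumulative percentage of total variance retained
# by the top x eigenvalues (cf. the cumulative values in Table 3)
c = 100.0 * np.cumsum(eigvals) / eigvals.sum()
print(c)    # approximately 52%, 74%, 88%, 95%, 100%, 100%
```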
Figure 6: Cumulative eigenvalue C(x) versus order of eigenvalues
The interpretation of this curve (Figure 6) is that C(x) represents the amount of information retained
if the input vectors are projected onto the subspace spanned by the top x eigenvectors. For instance, a
feature transformation that retains about 50 percent of the original information (variance) of the input
data can be obtained from the first eigenvalue alone (C(1) ≈ 52), and more than 95 percent of the original
information is retained by the first four eigenvalues (C(4) ≈ 95). If the variables were independent, the
cumulative curve C(x) would follow the red (linear) line. Consider the following classification for the
eigenvector loadings:
if  0.2 ≤ ρ ≤ 1        positive correlation → Green
if  |ρ| < 0.2          no correlation → Gray
if  −1 ≤ ρ ≤ −0.2      negative correlation → Orange
Then it is easier to visualize and interpret the eigenvectors based on the variable correlations (Table 4).
The first eigenvector (λ = 3.1376) shows strong positive loadings for variables 2, 5 and 6, negative loadings
for variables 3 and 4, and essentially no loading for variable 1, as expected.
Reordered eigenvalues
3.1376   1.2835   0.8583   0.43     0.2816   0.009
Cumulative proportion of eigenvalues
0.52293  0.73685  0.8799   0.95157  0.9985   1
Eigenvectors (one column per eigenvalue above)
Fe(V2)      0.5325  -0.0748  -0.083   -0.1971  -0.5228  -0.6259
Co(V5)      0.4836  -0.1053  -0.0306  -0.511    0.7021   0.0025
Al2O3(V6)   0.4589   0.0944  -0.1411   0.8174   0.2872  -0.0994
MgO(V4)    -0.3942   0.5675  -0.2838  -0.0691   0.2962  -0.5912
SiO2(V3)   -0.3081  -0.6546   0.3746   0.1645   0.2518  -0.4957
Ni(V1)      0.1469   0.4732   0.8668   0.0025   0.0095  -0.0553
Table 4: Eigenvectors with rows reordered according to variable correlations (Table 1)
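The three-way classification used to color Table 4 can be expressed as a small helper function; a sketch using the 0.2 threshold from the text and the first-eigenvector loadings of Table 4:

```python
def classify(loading, threshold=0.2):
    """Classify an eigenvector loading as positive, negative, or no correlation."""
    if loading >= threshold:
        return "positive (green)"
    if loading <= -threshold:
        return "negative (orange)"
    return "none (gray)"

# First eigenvector loadings in the reordered variable order of Table 4
first_pc = {"Fe(V2)": 0.5325, "Co(V5)": 0.4836, "Al2O3(V6)": 0.4589,
            "MgO(V4)": -0.3942, "SiO2(V3)": -0.3081, "Ni(V1)": 0.1469}
for var, val in first_pc.items():
    print(var, classify(val))
```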
Summary and Future work
The correlation matrix summarizes the correlations among several variables and describes the statistical
dependency between them. When dealing with large matrices, it is difficult to calculate eigenvalues and
eigenvectors directly from the characteristic equation, and several numerical methods are available
instead. Principal component analysis (PCA) uses the eigenvalues and eigenvectors of the correlation
matrix to define principal components and achieve dimension reduction. In future work, other methods
such as multi-dimensional scaling and factorization will be described.
References
Babak, O., and Deutsch, C.,V., 2008, CCG annual report: Testing for the Multivariate Gaussian Distribution of Spatially
Correlated Data.
Gillies, D., 2005, Lectures and course materials: Principal Component Analysis.
Izenman, A.J., 2008, Modern Multivariate Statistical Techniques, pp 597-606
Kumar, A., and Deutsch,C.V., 2009, CCG annual report :Optimal Correlation of Indefinite Correlation Matrices.
Panhuis, P.H.M.W., 2005, Iterative Techniques For Solving Eigenvalue problems.
Wickelmaier, F.,2003, An Introduction to MDS.