Section 2 - Descriptive Multivariate Statistics
By
Dr. Richard Tuyiragize
School of Statistics and Planning
Makerere University
The sample mean vector is
$$\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}$$
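As a quick numerical sketch of the mean vector (using an assumed illustrative data matrix, not one from the notes), the sample mean vector is simply the vector of column means:

```python
import numpy as np

# Assumed illustrative data: n = 3 observations on p = 2 variables
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Sample mean vector x̄ = (x̄_1, ..., x̄_p)': column-wise means
x_bar = X.mean(axis=0)
print(x_bar)  # [3. 4.]
```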
STA3120 1 Email:[email protected]
For any variable $x_j$ in the multivariate data set, the sample variance is
$$S_{jj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2, \qquad \text{where } \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$
Equivalently,
$$S_{jj} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ij}-\bar{x}_j) = \frac{SS(x_j)}{n-1}$$
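A minimal numerical check of this formula, with an assumed toy sample:

```python
import numpy as np

# Assumed illustrative sample on one variable x_j
xj = np.array([2.0, 4.0, 4.0, 6.0])
n = len(xj)

# S_jj = SS(x_j) / (n - 1): sum of squared deviations over n - 1
Sjj = np.sum((xj - xj.mean()) ** 2) / (n - 1)
print(Sjj)  # agrees with np.var(xj, ddof=1)
```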
As an extension, take any two variables in the multivariate data set, say $x_j$ and $x_k$. A measure of their joint dispersion is the sample covariance, denoted $S_{jk}$, such that
$$S_{jk} = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k) = \frac{SCP(x_j, x_k)}{n-1}$$
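The sample covariance can be checked the same way, again on an assumed toy sample:

```python
import numpy as np

# Assumed illustrative samples on two variables x_j and x_k
xj = np.array([1.0, 2.0, 3.0, 4.0])
xk = np.array([2.0, 1.0, 4.0, 3.0])
n = len(xj)

# S_jk = SCP(x_j, x_k) / (n - 1): sum of cross-products of deviations
Sjk = np.sum((xj - xj.mean()) * (xk - xk.mean())) / (n - 1)
print(Sjk)  # agrees with np.cov(xj, xk)[0, 1]
```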
The measure of dispersion for a multivariate data set is a square matrix of order $p \times p$:
$$S = \begin{pmatrix}
Var(x_1) & Cov(x_1, x_2) & \cdots & Cov(x_1, x_p) \\
Cov(x_2, x_1) & Var(x_2) & \cdots & Cov(x_2, x_p) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(x_p, x_1) & Cov(x_p, x_2) & \cdots & Var(x_p)
\end{pmatrix}$$
Generally,
$$S = \begin{pmatrix}
S_{11} & S_{12} & \cdots & S_{1p} \\
S_{21} & S_{22} & \cdots & S_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
S_{p1} & S_{p2} & \cdots & S_{pp}
\end{pmatrix}$$
The sample variance-covariance matrix can be expressed in vector terms:
$$S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})' = \frac{1}{n-1}A$$
where:
$x_i$ is the $i$th observation (the $i$th row of the data set, written as a column vector)
$\bar{x}$ is the sample mean vector
$A$ is the sample sum of squares and cross-products matrix (SSCP matrix)
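The vector form above translates directly into code; a sketch with an assumed toy data matrix, forming the SSCP matrix first and then dividing by $n-1$:

```python
import numpy as np

# Assumed illustrative data matrix: n = 4 observations on p = 2 variables
X = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [2.0, 4.0],
              [4.0, 3.0]])
n, p = X.shape

Xc = X - X.mean(axis=0)   # deviations (x_i - x̄) in each row
A = Xc.T @ Xc             # SSCP matrix A
S = A / (n - 1)           # sample variance-covariance matrix
print(S)                  # agrees with np.cov(X, rowvar=False)
```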
The determinant of the sample variance-covariance matrix summarizes the dispersion and is called the generalized sample variance of the multivariate data.
Properties of matrix S
1. Its diagonal entries are variances and its off-diagonal entries are covariances.
2. If the p variables are all pairwise uncorrelated, the off-diagonal entries will be zero.
3. Since $Cov(x_j, x_k) = Cov(x_k, x_j)$ for any two variables $x_j$ and $x_k$, the matrix is symmetric.
4. Given a sample size $N$ with $N > p$, the matrix $S$ is positive definite.
5. The distribution of $S$ is known as the Wishart distribution, denoted $W_p(n, \Sigma)$, where $n = N - 1$; hence $S$ is called the Wishart matrix. The Wishart distribution is a generalization of the $\chi^2$ distribution: in the univariate case, $(N-1)S/\sigma^2 \sim \chi^2_{N-1}$.
4 Measures of Correlation
The correlation coefficient, denoted by $r$, is a measure of the strength of the straight-line or linear relationship between two continuous variables. For any two variables $x_j$ and $x_k$ in the multivariate data set, the sample Pearson correlation coefficient (PCC) is given by
$$r_{jk} = \frac{\text{Sample } Cov(x_j, x_k)}{\text{Sample } SD(x_j)\,\text{Sample } SD(x_k)} = \frac{S_{jk}}{\sqrt{S_{jj}}\sqrt{S_{kk}}}$$
As a measure of correlation for the $n \times p$ multivariate data set, we summarize the sample PCCs in a square matrix of order $p \times p$, called the sample correlation matrix and denoted $R$:
$$R = \begin{pmatrix}
1 & r_{12} & \cdots & r_{1p} \\
r_{21} & 1 & \cdots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \cdots & 1
\end{pmatrix}$$
Properties of R
1. If the p variables in the data are pairwise uncorrelated, the off-diagonal entries will be zero. Hence $R$ reduces to $R = I_p$, the identity matrix of order $p$.
2. Given a sample covariance matrix $S$, the corresponding sample correlation matrix $R$ is computed as
$$R = D^{-1} S D^{-1}, \qquad \text{where } D = \text{diag}\left(\sqrt{S_{11}}, \ldots, \sqrt{S_{pp}}\right)$$
Example: For some bivariate data set, the sample covariance matrix has been found to be
$$S = \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix}$$
Compute the sample correlation matrix $R$.
Solution:
$$D = \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix}, \qquad D^{-1} = \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix}$$
$$R = D^{-1} S D^{-1} = \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix} \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix} \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix} = \begin{pmatrix} 1 & \frac{3}{20} \\ \frac{3}{20} & 1 \end{pmatrix}$$
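The $R = D^{-1}SD^{-1}$ computation above can be verified numerically; a minimal NumPy sketch:

```python
import numpy as np

# Sample covariance matrix from the worked example
S = np.array([[16.0, 3.0],
              [3.0, 25.0]])

# D carries the sample standard deviations sqrt(S_jj) on its diagonal,
# so D^{-1} is the diagonal matrix of their reciprocals
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv
print(R)  # off-diagonal entries are 3/20 = 0.15
```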
Question: Find the sample mean vector, covariance matrix, and correlation matrix for the following data matrix:
$$X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix}$$
5 Random Vectors and Matrices
A random vector is a vector whose components are random variables, and a random matrix is a matrix whose components are random variables.
If we have a column vector of $p$ random components and a row vector of $q$ random components, then their product is a $p \times q$ rectangular matrix $U = (u_{ij})$, for $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, q$, whose $pq$ elements $u_{ij}$ are random variables.
For this course, we shall deal with a column or row vector X with p variables.
We shall take the p variables in the multivariate data set as realizations of some p random variables $x_1, x_2, \ldots, x_p$, whose simultaneous probabilistic or stochastic behaviour we need to investigate; and we take the p random variables as forming the vector
$$x' = (x_1, x_2, \ldots, x_p)$$
$$E(x) = E\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} E(x_1) \\ E(x_2) \\ \vdots \\ E(x_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu$$
For a single random variable $x$, the population variance is $\sigma_x^2 = E(x-\mu)^2$, where $\mu = E(x)$.
For two random variables $x$ and $y$, a common measure of joint dispersion is the population covariance, $\sigma_{xy} = E\left[(x-\mu_x)(y-\mu_y)\right]$.
The measure of dispersion for a p-variate random vector $x$ is the population p-variate variance-covariance matrix, denoted $\Sigma$:
$$\Sigma = E\left[(x-\mu)(x-\mu)'\right], \qquad \text{where } \mu = E(x)$$
$$\Sigma = E\left[\begin{pmatrix} x_1-\mu_1 \\ x_2-\mu_2 \\ \vdots \\ x_p-\mu_p \end{pmatrix}\begin{pmatrix}(x_1-\mu_1) & (x_2-\mu_2) & \cdots & (x_p-\mu_p)\end{pmatrix}\right]$$
$$= E\begin{pmatrix}
(x_1-\mu_1)^2 & (x_1-\mu_1)(x_2-\mu_2) & \cdots & (x_1-\mu_1)(x_p-\mu_p) \\
(x_2-\mu_2)(x_1-\mu_1) & (x_2-\mu_2)^2 & \cdots & (x_2-\mu_2)(x_p-\mu_p) \\
\vdots & \vdots & \ddots & \vdots \\
(x_p-\mu_p)(x_1-\mu_1) & (x_p-\mu_p)(x_2-\mu_2) & \cdots & (x_p-\mu_p)^2
\end{pmatrix}$$
$$= \begin{pmatrix}
Var(x_1) & Cov(x_1, x_2) & \cdots & Cov(x_1, x_p) \\
Cov(x_2, x_1) & Var(x_2) & \cdots & Cov(x_2, x_p) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(x_p, x_1) & Cov(x_p, x_2) & \cdots & Var(x_p)
\end{pmatrix}$$
$$\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{pmatrix}$$
6 Linear Combinations of Random Vectors
1. Univariate case:
$$E(a_1 x_1) = a_1 E(x_1) = a_1\mu_1$$
$$Var(a_1 x_1) = a_1^2\, Var(x_1) = a_1^2\sigma_{11}$$
2. Bivariate case: for $a' = (a_1, a_2)$ and $X' = (x_1, x_2)$,
$$Var(a_1 x_1 + a_2 x_2) = a_1^2\sigma_{11} + 2a_1 a_2\sigma_{12} + a_2^2\sigma_{22} \implies Var(a'X) = a'\Sigma a$$
3. Multivariate case:
If $X$ is a p-dimensional random vector and $a \in \mathbb{R}^p$, then the linear combination $a'X$ is a one-dimensional random variable. That is, for
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}$$
the linear combination is
$$a_1 X_1 + a_2 X_2 + \cdots + a_p X_p = \begin{pmatrix} a_1 & a_2 & \cdots & a_p \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = a'X$$
$$E(a'X) = a'E(X) = a'\mu$$
$$Var(a'X) = a'\Sigma a$$
Here
$$\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}$$
$$Z_1 = a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p = \sum_{j=1}^{p} a_{1j}X_j = a_1'X$$
$$Z_2 = a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p = \sum_{j=1}^{p} a_{2j}X_j = a_2'X$$
$$\vdots$$
$$Z_q = a_{q1}X_1 + a_{q2}X_2 + \cdots + a_{qp}X_p = \sum_{j=1}^{p} a_{qj}X_j = a_q'X$$
In matrix form:
$$\begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_q \end{pmatrix} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{q1} & a_{q2} & \cdots & a_{qp}
\end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \iff Z = AX$$
$$Cov(Z) = Cov(AX) = A\Sigma A'$$
Example
Find the mean vector and covariance matrix for the linear combinations $Z_1 = X_1 - X_2$ and $Z_2 = X_1 + X_2$.
$$Z = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = AX$$
$$E(Z) = AE(X) = A\mu = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 + \mu_2 \end{pmatrix}$$
$$Cov(Z) = A\,Cov(X)\,A' = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22} \\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22} \end{pmatrix}$$
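The symbolic result for $Cov(Z) = A\Sigma A'$ can be checked numerically; a sketch with assumed illustrative values for the entries of $\Sigma$ (not values from the notes):

```python
import numpy as np

# Assumed illustrative population values: sigma_11 = 4, sigma_12 = sigma_21 = 1, sigma_22 = 9
Sigma = np.array([[4.0, 1.0],
                  [1.0, 9.0]])
A = np.array([[1.0, -1.0],   # Z1 = X1 - X2
              [1.0,  1.0]])  # Z2 = X1 + X2

CovZ = A @ Sigma @ A.T
print(CovZ)
# Symbolic result predicts:
#   Var(Z1) = s11 - 2*s12 + s22 = 11,  Var(Z2) = s11 + 2*s12 + s22 = 15
#   Cov(Z1, Z2) = s11 - s22 = -5
```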