MATH38061/48061/68061
Multivariate Statistical Methods
Mike Tso: email [Link]@[Link]
Webpage for notes: [Link]
1. Introduction to multivariate data
1.1 Books
Chatfield, C. and [Link], Introduction to multivariate analysis. Chapman & Hall, 1980.
Krzanowski, W.J., Principles of multivariate analysis. Oxford, 2000.
Johnson, [Link] D.W. Wichern, Applied multivariate statistical analysis. Prentice Hall, 2007 (6th Ed.).
Rencher, Alvin D., Methods of multivariate analysis [e-book]. Wiley, 2002 (2nd Ed.).
Hewson, Paul, Multivariate Statistics with R [available as Open Text Book]
[Link]
[Link]
1.2 Applications & Data Organization
The need often arises in science, medicine and social science (business, management) to analyze
data on $p > 1$ variables (note that $p = 2$ means the data are bivariate).
Suppose we have a simple random sample consisting of $n$ items. For each item of the sample,
measurements are made on $p$ variables. Let $x_{ij}$ denote the measurement of the $j$th variable on the
$i$th item. One can display the measurements as the table
           Variable 1   Variable 2   ...   Variable j   ...   Variable p
Item 1     $x_{11}$     $x_{12}$     ...   $x_{1j}$     ...   $x_{1p}$
Item 2     $x_{21}$     $x_{22}$     ...   $x_{2j}$     ...   $x_{2p}$
  ...        ...          ...                 ...                ...
Item i     $x_{i1}$     $x_{i2}$     ...   $x_{ij}$     ...   $x_{ip}$
  ...        ...          ...                 ...                ...
Item n     $x_{n1}$     $x_{n2}$     ...   $x_{nj}$     ...   $x_{np}$
or equivalently as a matrix
$$X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots &        &        & \vdots &        & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ip} \\
\vdots &        &        & \vdots &        & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{pmatrix}$$
When $p = 2$ the data matrix $X$ is $n \times 2$ and we can plot the rows in 2-dimensional space, but
in higher dimensions, $p > 2$, other techniques are needed.
The sample consists of $n$ vectors of measurements on $p$ variates, i.e. $n$ $p$-vectors (by convention
column vectors) $x_1, \ldots, x_n$, which are inserted as rows $x_1^T, \ldots, x_n^T$ into an $(n \times p)$ data matrix $X$.
[Note that some books form $X$ as a $p \times n$ data matrix from the $n$ columns $x_1, \ldots, x_n$.]
It is also sometimes useful to regard $X$ as $p$ columns, each of which is an $n$-vector. Column
$j$ represents the set of measurements on variable $j$, and a measure of the association between two
variables, which may be either a covariance or a correlation, can be computed from operations (dot
products) of these columns. Broadly, it is worth remembering that multivariate techniques take
either the items (units) or the variables as their primary focus.
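For concreteness, a minimal NumPy sketch of this layout (rows as items, columns as variables), using a hypothetical simulated data matrix: dot products of centred columns give sample covariances.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 3                       # hypothetical sample: n items, p variables
X = rng.normal(size=(n, p))       # data matrix: row i = item i, column j = variable j

Xc = X - X.mean(axis=0)           # centre each column about its variable mean

# The dot product of two centred columns, divided by n - 1,
# is the sample covariance of the corresponding variables.
s_12 = Xc[:, 0] @ Xc[:, 1] / (n - 1)
print(np.isclose(s_12, np.cov(X, rowvar=False)[0, 1]))   # True
```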
Example 1
Classification of plants (taxonomy)
Variables: ($p = 3$) leaf size ($x_1$), colour of flower ($x_2$), height of plant ($x_3$)
Sample items: $n = 4$ plants from a single species
Aims of analysis:
i) understand within-species variability
ii) classify a new plant species
An example of a $4 \times 3$ data matrix $X$ is as follows:

                     Variables
                 $x_1$   $x_2$   $x_3$
           1      6.1     2      12
Plants     2      8.2     1       8
(Items)    3      5.3     0       9
           4      6.4     2      10
                                            (1)
Example 2
Credit scoring
Variables: personal data held by bank
Items: sample of good/bad customers
Aims of analysis:
i) predict potential defaulters (CRM)
ii) risk assessment for new applicant
Example 3
Image processing for e.g. quality control
Variables: "features" extracted from an image
Items: sampled from a production line
Aims of analysis:
i) quantify "normal" variability
ii) reject faulty (off-specification) batches
1.3 Sample mean and sample covariance matrix
We shall adopt the following notation:
$x$ ($p \times 1$)        a random (column) vector of observations on $p$ variables

$X$ ($n \times p$)        a data matrix whose $n$ rows contain an independent random sample
                          $x_1^T, \ldots, x_n^T$ of $n$ observations on $x$

$\bar{x}$ ($p \times 1$)  sample mean vector  $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

$S$ ($p \times p$)        sample covariance matrix containing the sample covariances defined as
                          $s_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$

$R$ ($p \times p$)        sample correlation matrix containing the sample correlations defined as
                          $r_{jk} = \dfrac{s_{jk}}{\sqrt{s_{jj}\, s_{kk}}} = \dfrac{s_{jk}}{s_j s_k}$, say
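These three summaries are easy to compute; for illustration, a short NumPy sketch using the plant data of Example 1 (np.cov and np.corrcoef use the same $n - 1$ divisor as above):

```python
import numpy as np

# Plant data from Example 1: n = 4 items, p = 3 variables.
X = np.array([[6.1, 2.0, 12.0],
              [8.2, 1.0,  8.0],
              [5.3, 0.0,  9.0],
              [6.4, 2.0, 10.0]])

xbar = X.mean(axis=0)              # sample mean vector
S = np.cov(X, rowvar=False)        # sample covariance matrix (divisor n - 1)
R = np.corrcoef(X, rowvar=False)   # sample correlation matrix

print(xbar, S, R, sep="\n")
```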
Notes
1. $\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$ = mean of variable $j$, which is the $j$th component of the mean vector $\bar{x}$.
2. The $p \times p$ covariance matrix $S$ is square, symmetric ($S = S^T$), and holds the sample variances
$s_{jj} = s_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ along its main diagonal. Owing to symmetry, the number
of distinct elements in a $p \times p$ covariance matrix equals the number of elements on and below the
main diagonal,
$$1 + 2 + \cdots + p = \tfrac{1}{2}\, p\, (p + 1).$$
The $s_j$ are simply the sample standard deviations.
[For this course we define $S$ as the unbiased estimator of the covariance matrix with divisor
$n - 1$ rather than $n$ (see later).]
3. The diagonal elements of $R$ are $r_{jj} = 1$, and $-1 \le r_{jk} \le 1$ for each $j$, $k$.
Proof
$$r_{jk} = \frac{s_{jk}}{s_j s_k} = \frac{a^T b}{\left( a^T a \right)^{\frac{1}{2}} \left( b^T b \right)^{\frac{1}{2}}}$$
where
$$a = \begin{pmatrix} x_{1j} - \bar{x}_j \\ \vdots \\ x_{nj} - \bar{x}_j \end{pmatrix}, \qquad
b = \begin{pmatrix} x_{1k} - \bar{x}_k \\ \vdots \\ x_{nk} - \bar{x}_k \end{pmatrix}$$
Thus $r_{jk} = u^T v$ where $u$ and $v$ are unit length vectors:
$$u = \frac{a}{\sqrt{a^T a}} = \frac{a}{|a|} \qquad \text{and} \qquad v = \frac{b}{\sqrt{b^T b}} = \frac{b}{|b|}$$
Hence $-1 \le r_{jk} \le 1$, which is also a consequence of the Cauchy-Schwarz inequality
$$\left| a^T b \right| \le \left( a^T a \right)^{\frac{1}{2}} \left( b^T b \right)^{\frac{1}{2}}.$$
1.4 Matrix-vector representations
Given an $(n \times p)$ data matrix $X$, we define the $n \times 1$ column vector of ones
$$\mathbf{1} = (1, 1, \ldots, 1)^T$$
The column totals of $X$ (the sum of its $n$ rows) are then obtained by pre-multiplying $X$ by $\mathbf{1}^T$:
$$\mathbf{1}^T X = \left( \sum_{i=1}^{n} x_{i1}, \; \ldots \;, \sum_{i=1}^{n} x_{ip} \right)
= (n \bar{x}_1, \ldots, n \bar{x}_p) = n \bar{x}^T$$
Taking transposes and dividing by $n$, we obtain
$$\bar{x} = \frac{1}{n} X^T \mathbf{1} \tag{2}$$
The centred data matrix $X_0$ is derived from $X$ by subtracting the variable mean (column mean)
from each element of $X$, i.e. $x^0_{ij} = x_{ij} - \bar{x}_j$. The same is achieved by subtracting the constant vector
$\bar{x}^T$ from each row of $X$:
$$\begin{aligned}
X_0 &= X - \mathbf{1} \bar{x}^T && (3) \\
    &= X - \tfrac{1}{n} \mathbf{1} \mathbf{1}^T X \\
    &= \left( I_n - \tfrac{1}{n} \mathbf{1} \mathbf{1}^T \right) X \\
    &= HX \text{, say} && (4)
\end{aligned}$$
where

Definition 1  $H = \left( I_n - \frac{1}{n} \mathbf{1} \mathbf{1}^T \right)$ is known as the centring matrix.
Exercises:
1. $H$ is a symmetric matrix, i.e. $H^T = H$.
2. $H$ is an idempotent matrix, i.e. $H^T = H$ and $H^2 = H$.
Definition 2  The sample covariance matrix is $\frac{1}{n-1}$ times the centred sum of squares and products
(SSP) matrix:
$$S = \frac{1}{n-1} X_0^T X_0 \tag{5}$$
Exercise:
We also have the alternative expressions
$$S = \frac{1}{n-1} X^T H X \tag{6}$$
$$\phantom{S} = \frac{1}{n-1} \sum_{i=1}^{n} x^0_i \, x^{0T}_i \tag{7}$$
where $x^0_i = x_i - \bar{x}$ denotes the $i$th mean-corrected data point. Therefore $S$ can be expressed as a
sum of "outer products" of the form $ab^T$, each product being a $p \times p$ matrix of rank 1.
Result 1
The sample covariance matrix $S$ is positive semi-definite (p.s.d.).
Proof
For any real $p$-vector $y$ we have
$$(n-1)\, y^T S y = y^T X_0^T X_0\, y = z^T z \quad \text{where } z = X_0 y$$
$$\phantom{(n-1)\, y^T S y} = \sum_{i=1}^{n} z_i^2 \;\ge\; 0$$
Result 2
Let $D$ be a $p \times p$ diagonal matrix containing the sample variances:
$$D = \begin{pmatrix}
s_1^2 & 0 & \cdots & 0 \\
0 & s_2^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & s_p^2
\end{pmatrix} = \mathrm{diag}\left( s_1^2, \ldots, s_p^2 \right) \text{, say}$$
The sample correlation matrix is given (Ex.) by
$$R = D^{-\frac{1}{2}} S D^{-\frac{1}{2}}$$
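A short check of Result 2 on an arbitrary simulated data matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))                       # arbitrary data matrix

S = np.cov(X, rowvar=False)                       # sample covariance matrix
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))   # D^(-1/2)

R = D_inv_sqrt @ S @ D_inv_sqrt                   # Result 2
print(np.allclose(R, np.corrcoef(X, rowvar=False)))   # True
```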
Example
Two measurements $x_1$, $x_2$ made at the same position on each of $n = 3$ cans of food resulted in
the following $3 \times 2$ data matrix ($p = 2$):
$$X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix}$$
Find the sample mean vector $\bar{x}$ and covariance matrix $S$.
Solution
$$X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix}
= \begin{pmatrix} x_1^T \\ x_2^T \\ x_3^T \end{pmatrix}
= [x_1, x_2, x_3]^T$$
$$\bar{x} = \frac{1}{3} \sum_{i=1}^{3} x_i
= \frac{1}{3} \left[ \begin{pmatrix} 4 \\ 1 \end{pmatrix} + \begin{pmatrix} -1 \\ 3 \end{pmatrix} + \begin{pmatrix} 3 \\ 5 \end{pmatrix} \right]
= \begin{pmatrix} 2 \\ 3 \end{pmatrix}$$
$$X_0 = \begin{pmatrix} 2 & -2 \\ -3 & 0 \\ 1 & 2 \end{pmatrix}
\qquad
X_0^T X_0 = \begin{pmatrix} 14 & -2 \\ -2 & 8 \end{pmatrix}$$
$$S = \frac{1}{2} X_0^T X_0 = \begin{pmatrix} 7 & -1 \\ -1 & 4 \end{pmatrix}$$
$S$ may also be built up from the individual data points $x^0_i\, x^{0T}_i$ ($i = 1, 2, 3$):
$$S = \frac{1}{2} \left[ \begin{pmatrix} 2 \\ -2 \end{pmatrix} \begin{pmatrix} 2 & -2 \end{pmatrix}
+ \begin{pmatrix} -3 \\ 0 \end{pmatrix} \begin{pmatrix} -3 & 0 \end{pmatrix}
+ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \end{pmatrix} \right]$$
and
$$R = \begin{pmatrix} 1 & -0.189 \\ -0.189 & 1 \end{pmatrix}
\quad \text{since} \quad r_{12} = \frac{-1}{2\sqrt{7}} \approx -0.189$$
1.5 Measures of multivariate scatter
It is useful to have a single number as a measure of spread in the data. Based on $S$ we define two
scalar quantities.
Definition 3  The total variation is
$$\mathrm{tr}(S) = \mathrm{trace}(S) = \sum_{j=1}^{p} s_{jj}
= \text{sum of diagonal elements} = \text{sum of eigenvalues of } S$$
Definition 4  The generalized variance is
$$|S| = \text{product of eigenvalues of } S \tag{8}$$
In the above example we have
$$\mathrm{tr}(S) = 7 + 4 = 11, \qquad |S| = 7 \times 4 - (-1)(-1) = 27$$
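For illustration, the can-data example and these two measures can be verified in a few lines of NumPy:

```python
import numpy as np

# Can data from the worked example: n = 3 items, p = 2 variables.
X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0]])

xbar = X.mean(axis=0)                      # sample mean vector: [2., 3.]
S = np.cov(X, rowvar=False)                # [[ 7., -1.], [-1.,  4.]]
R = np.corrcoef(X, rowvar=False)           # off-diagonal approx -0.189

total_variation = np.trace(S)              # 11.0
generalized_variance = np.linalg.det(S)    # 27.0 (product of eigenvalues)

print(xbar, S, R, total_variation, generalized_variance, sep="\n")
```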
1.6 Random vectors
We will in this course generally regard the data as an independent random sample from some
continuous population distribution with a probability density function
$$f(x) = f(x_1, \ldots, x_p) \tag{9}$$
Here $x = (x_1, \ldots, x_p)$ is regarded as a (row or column) vector of $p$ random variables. Independence
here refers to the rows of the data matrix. If two of the variables (columns) are, for example,
height and weight of individuals (rows), then knowing one individual's weight says nothing about
any measurement on another individual. However, the height and weight for any individual are
correlated.
For any region $D$ in the $p$-space of the variables,
$$\Pr(x \in D) = \int_D f(x)\, dx$$
1.6.1 Mean vector (population mean)
For any $j$, the population mean of $x_j$ is given by the $p$-fold integral
$$E(x_j) = \mu_j = \int x_j\, f(x)\, dx$$
where the region of integration is $\mathbb{R}^p$.
The expectation of a random vector is defined to be the vector of expectations
$$E(x) = \begin{pmatrix} E x_1 \\ E x_2 \\ \vdots \\ E x_p \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu \tag{10}$$
As expectation is a linear operator, we have
$$E(Ax + b) = A\, E(x) + b = A\mu + b \tag{11}$$
For any random matrix $X$ and conformable matrices $A$, $B$, $C$ of constants we have
$$E(AXB + C) = A\, E(X)\, B + C \tag{12}$$
i.e. constants are in a sense transparent as far as the operator $E(\cdot)$ is concerned (a property
of linear operators).
1.6.2 Covariance matrix (population)
The covariance between two random variables $x_j$, $x_k$ is defined as
$$\sigma_{jk} = \mathrm{Cov}(x_j, x_k)
= E\left[ (x_j - \mu_j)(x_k - \mu_k) \right]
= E[x_j x_k] - \mu_j \mu_k \tag{13}$$
When $j = k$ we obtain the variance of $x_j$:
$$\sigma_{jj} = E\left[ (x_j - \mu_j)^2 \right] = E\left[ x_j^2 \right] - \mu_j^2$$
The covariance matrix is the $p \times p$ matrix
$$\Sigma = (\sigma_{ij}) = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
\sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
\vdots & & & \vdots \\
\sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
\end{pmatrix} \tag{14}$$
The notations $V(x)$, $\mathrm{Cov}(x)$ and the terms variance-covariance matrix or dispersion matrix
are often used for $\Sigma$.
In matrix form,
$$\Sigma = E\left[ (x - \mu)(x - \mu)^T \right] = E\left[ x x^T \right] - \mu \mu^T \tag{15}$$
More generally, we define the covariance between two random vectors $x$ ($p \times 1$) and $y$ ($q \times 1$)
as the $(p \times q)$ matrix
$$\mathrm{Cov}(x, y) = E\left[ (x - \mu_x)(y - \mu_y)^T \right] = E\left[ x y^T \right] - \mu_x \mu_y^T \tag{16}$$
where $\mu_x = E(x)$ and $\mu_y = E(y)$. In particular, note that
i) $\mathrm{Cov}(x, x) = E\left[ (x - \mu_x)(x - \mu_x)^T \right] = V(x)$
ii) $V(x + y) = V(x) + V(y) + \mathrm{Cov}(x, y) + \mathrm{Cov}(y, x)$
iii) $\mathrm{Cov}(x + y, z) = \mathrm{Cov}(x, z) + \mathrm{Cov}(y, z)$
iv) $\mathrm{Cov}(Ax, By) = A\, \mathrm{Cov}(x, y)\, B^T$
These properties can all be deduced from the definition of $\mathrm{Cov}(x, y)$. For example,
$$\begin{aligned}
\mathrm{Cov}(Ax, By) &= E\left[ A x y^T B^T \right] - E(Ax)\, E\left( y^T B^T \right) \\
&= A\, E\left[ x y^T \right] B^T - A\, E(x)\, E\left( y^T \right) B^T \\
&= A \left[ E\left( x y^T \right) - E(x)\, E\left( y^T \right) \right] B^T \\
&= A\, \mathrm{Cov}(x, y)\, B^T
\end{aligned}$$
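The same algebra applies to sample covariance matrices, so property iv) can be illustrated numerically with arbitrary simulated data; the helper sample_cov below is illustrative, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 8, 3, 2
X = rng.normal(size=(n, p))          # n observations on x (p variables)
Y = rng.normal(size=(n, q))          # paired observations on y (q variables)

def sample_cov(U, V):
    """Sample cross-covariance matrix with divisor n - 1."""
    Uc = U - U.mean(axis=0)
    Vc = V - V.mean(axis=0)
    return Uc.T @ Vc / (len(U) - 1)

A = rng.normal(size=(2, p))          # arbitrary constant matrices
B = rng.normal(size=(3, q))

lhs = sample_cov(X @ A.T, Y @ B.T)   # Cov(Ax, By) computed from transformed data
rhs = A @ sample_cov(X, Y) @ B.T     # A Cov(x, y) B^T
print(np.allclose(lhs, rhs))         # True: the sample analogue of iv) holds exactly
```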
Result 3
A very important property of $\Sigma$ is positive semi-definiteness.
Proof
Let $a$ ($p \times 1$) be a constant vector; then
$$E\left( a^T x \right) = a^T E(x) = a^T \mu$$
and
$$V\left( a^T x \right) = E\left[ \left( a^T x - a^T \mu \right)^2 \right]
= a^T E\left[ (x - \mu)(x - \mu)^T \right] a = a^T \Sigma\, a$$
Since a variance is always $\ge 0$, we have $a^T \Sigma\, a \ge 0$ for all $a$.
By definition (see handout), $\Sigma$ is therefore a positive semi-definite (p.s.d.) matrix.
Suppose we have an independent random sample $x_1, x_2, \ldots, x_n$ from a distribution with mean $\mu$
and covariance matrix $\Sigma$. What is the relation between (a) the sample and population means,
(b) the sample and population covariance matrices?
Result 4
The mean (i.e. expectation) and covariance matrix of the sample mean $\bar{x}$ are
$$E(\bar{x}) = \mu \tag{17}$$
$$V(\bar{x}) = \frac{1}{n} \Sigma \tag{18}$$
Proof
$$E(\bar{x}) = \frac{1}{n} E\left( \sum_{i=1}^{n} x_i \right) = \frac{1}{n} \sum_{i=1}^{n} E(x_i) = \mu$$
$$V(\bar{x}) = \mathrm{Cov}\left( \frac{1}{n} \sum_{i=1}^{n} x_i \,,\; \frac{1}{n} \sum_{j=1}^{n} x_j \right)
= \frac{1}{n^2} (n \Sigma)$$
noting that $\mathrm{Cov}(x_i, x_i) = \Sigma$ and $\mathrm{Cov}(x_i, x_j) = 0$ for $i \ne j$. Hence
$$V(\bar{x}) = \frac{1}{n} \Sigma \tag{19}$$
Consequence
$$V(\bar{x}) = E\left( \bar{x} \bar{x}^T \right) - \mu \mu^T
\;\Longrightarrow\;
E\left( \bar{x} \bar{x}^T \right) = \frac{1}{n} \Sigma + \mu \mu^T \tag{20}$$
Result 5
We now examine $S$ and prove that it is an unbiased estimator for $\Sigma$:
$$E(S) = \Sigma \tag{21}$$
Proof
$$S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T
= \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i x_i^T - n \bar{x} \bar{x}^T \right]$$
noting that $\frac{1}{n} \sum_{i=1}^{n} x_i \bar{x}^T = \left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) \bar{x}^T = \bar{x} \bar{x}^T$.
From (15) and (20) we see that
$$E\left( x_i x_i^T \right) = \Sigma + \mu \mu^T \quad (\forall i)$$
$$E\left( \bar{x} \bar{x}^T \right) = \frac{1}{n} \Sigma + \mu \mu^T$$
hence
$$E(S) = \frac{n}{n-1} \left[ \left( \Sigma + \mu \mu^T \right) - \left( \frac{1}{n} \Sigma + \mu \mu^T \right) \right]
= \Sigma \quad \text{as required.}$$
Therefore $S$ is an unbiased estimate of $\Sigma$.
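A small Monte Carlo sketch of Results 4 and 5 for illustration (the normal distribution and the particular $\mu$, $\Sigma$, $n$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 10, 20000

xbars, Ss = [], []
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)   # one sample of size n
    xbars.append(X.mean(axis=0))
    Ss.append(np.cov(X, rowvar=False))

xbars = np.array(xbars)
print(xbars.mean(axis=0))            # approx mu          (Result 4, eq. 17)
print(np.cov(xbars, rowvar=False))   # approx Sigma / n   (Result 4, eq. 18)
print(np.mean(Ss, axis=0))           # approx Sigma       (Result 5, eq. 21)
```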
1.7 Linear transformations
Let $x = (x_1, \ldots, x_p)^T$ be a random $p$-vector. It is often natural and useful to consider linear
combinations of the components of $x$, such as for example $y_1 = x_1 + x_2$ or $y_2 = x_1 + 2x_3 - x_4$. In
general we consider a transformation from the $p$-component vector $x$ to a $q$-component vector $y$
($q < p$) given by
$$y = Ax + b \tag{22}$$
where $A$ ($q \times p$) and $b$ ($q \times 1$) are constant matrices.
Suppose that $E(x) = \mu$ and $V(x) = \Sigma$.
The corresponding expressions for $y$ are
$$E(y) = A\mu + b \tag{23}$$
$$V(y) = A \Sigma A^T \tag{24}$$
These follow from the linearity of the expectation operator:
$$E(y) = E(Ax + b) = A\, E(x) + E(b) = A\mu + b = \mu_y \text{, say}$$
and
$$\begin{aligned}
V(y) &= E\left( y y^T \right) - \mu_y \mu_y^T \\
&= E\left[ (Ax + b)(Ax + b)^T \right] - (A\mu + b)(A\mu + b)^T \\
&= A\, E\left( x x^T \right) A^T + A\, E(x)\, b^T + b\, E\left( x^T \right) A^T + b b^T
   - A \mu \mu^T A^T - A \mu b^T - b \mu^T A^T - b b^T \\
&= A \left[ E\left( x x^T \right) - \mu \mu^T \right] A^T \\
&= A \Sigma A^T \quad \text{as required.}
\end{aligned}$$
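The sample versions of (23) and (24) hold exactly for any data matrix, which gives an easy check with arbitrary simulated data and constants $A$, $b$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 10, 4, 2
X = rng.normal(size=(n, p))            # n observations on x
A = rng.normal(size=(q, p))            # constant matrix A (q x p)
b = rng.normal(size=q)                 # constant vector b (q x 1)

Y = X @ A.T + b                        # each row: y_i = A x_i + b

# Sample analogues of equations (23) and (24):
print(np.allclose(Y.mean(axis=0), A @ X.mean(axis=0) + b))                      # ybar = A xbar + b
print(np.allclose(np.cov(Y, rowvar=False), A @ np.cov(X, rowvar=False) @ A.T))  # S_y = A S_x A^T
```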
1.8 The Mahalanobis transformation
Given a $p$-variate random variable $x$ with $E(x) = \mu$ and $V(x) = \Sigma$, the Mahalanobis transformation
results in standardized variables with mean 0 and variance 1. Suppose $\Sigma$ is positive
definite, i.e. there is no exact linear dependence in $x$. Then the inverse $\Sigma^{-1}$ exists and it has a
"square root" $\Sigma^{-\frac{1}{2}}$ given by
$$\Sigma^{-\frac{1}{2}} = V \Lambda^{-\frac{1}{2}} V^T \tag{25}$$
where $\Sigma = V \Lambda V^T$ is the eigenanalysis of $\Sigma$. Here $V$ is an orthogonal matrix
($V^T V = V V^T = I_p$) whose columns are the eigenvectors of $\Sigma$, and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$
holds the corresponding eigenvalues.

Definition 5  The Mahalanobis transformation takes the form
$$z = \Sigma^{-\frac{1}{2}} (x - \mu) \tag{26}$$
Using results (23) and (24) we can show that
$$E(z) = 0, \qquad V(z) = I_p$$
Proof
$$E(z) = E\left[ \Sigma^{-\frac{1}{2}} (x - \mu) \right] = \Sigma^{-\frac{1}{2}} \left[ E(x) - \mu \right] = 0 \tag{27}$$
$$V(z) = \Sigma^{-\frac{1}{2}}\, \Sigma\, \Sigma^{-\frac{1}{2}} = I_p \tag{28}$$
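A short sketch of (25) and (28) for a hypothetical positive definite $\Sigma$, computing $\Sigma^{-\frac{1}{2}}$ from its eigenanalysis:

```python
import numpy as np

# A hypothetical positive definite covariance matrix Sigma.
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

lam, V = np.linalg.eigh(Sigma)                      # eigenanalysis: Sigma = V diag(lam) V^T
Sigma_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T     # equation (25)

# Check V(z) = Sigma^(-1/2) Sigma Sigma^(-1/2) = I_p, as in (28).
print(np.allclose(Sigma_inv_sqrt @ Sigma @ Sigma_inv_sqrt, np.eye(3)))   # True
```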
1.8.1 Sample Mahalanobis transformation
Given a data matrix $X^T = (x_1, \ldots, x_n)$, the sample Mahalanobis transformation
$z_i = S^{-\frac{1}{2}} (x_i - \bar{x})$ for $i = 1, \ldots, n$, where $S = S_x$ is the sample covariance matrix
$\frac{1}{n-1} X^T H X$, creates a transformed data matrix $Z^T = (z_1, \ldots, z_n)$. The data matrices are related by
$$Z = H X S^{-\frac{1}{2}} \tag{29}$$
where $H$ is the centring matrix.
We may easily show (Ex.) that $Z$ is centred (premultiply by $H$) and that $S_z = I_p$.
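A numerical sketch of (29) on arbitrary simulated data, confirming that $Z$ is centred and $S_z = I_p$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
X = rng.normal(size=(n, p))

H = np.eye(n) - np.ones((n, n)) / n            # centring matrix
S = X.T @ H @ X / (n - 1)                      # sample covariance matrix

lam, V = np.linalg.eigh(S)
S_inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T    # S^(-1/2) via its eigenanalysis

Z = H @ X @ S_inv_sqrt                         # equation (29)

print(np.allclose(Z.mean(axis=0), 0))                    # Z is centred
print(np.allclose(np.cov(Z, rowvar=False), np.eye(p)))   # S_z = I_p
```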
1.8.2 Sample scaling transformation
A transformation of the data that scales each variable to have mean 0 and variance 1, but preserves
the correlation structure, is given by $y_i = D^{-1} (x_i - \bar{x})$ for $i = 1, \ldots, n$, where $D = \mathrm{diag}(s_1, \ldots, s_p)$.
Now
$$Y = H X D^{-1} \tag{30}$$
Ex. Show that $S_y = R_x$.
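Similarly, a quick check that (30) yields $S_y = R_x$, using arbitrary simulated data on deliberately different scales:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 10, 3
X = rng.normal(size=(n, p)) * np.array([1.0, 5.0, 0.2])   # variables on different scales

H = np.eye(n) - np.ones((n, n)) / n
S = X.T @ H @ X / (n - 1)
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))   # D^(-1), where D = diag(s_1, ..., s_p)

Y = H @ X @ D_inv                            # equation (30)

print(np.allclose(np.cov(Y, rowvar=False), np.corrcoef(X, rowvar=False)))   # S_y = R_x
```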
1.8.3 A useful matrix identity
Let $u$, $v$ be $n$-vectors and form the $n \times n$ outer product matrix $A = u v^T$. Then
$$\left| I + u v^T \right| = 1 + v^T u \tag{31}$$
Proof
First observe that $A$ and $I + A$ share a common set of eigenvectors, since
$Aw = \lambda w \Longrightarrow (I + A) w = (1 + \lambda) w$. Moreover, the eigenvalues of $I + A$ are $1 + \lambda_i$,
where the $\lambda_i$ are the eigenvalues of $A$.
Now $u v^T$ is a rank-one matrix and therefore has a single nonzero eigenvalue (see handout). Since
$\left( u v^T \right) u = u \left( v^T u \right) = \lambda u$ where $\lambda = v^T u$, this $\lambda$ is the nonzero eigenvalue of $u v^T$,
with eigenvector $u$.
Therefore the eigenvalues of $I + u v^T$ are $1 + \lambda, 1, \ldots, 1$. The determinant of $I + u v^T$ is the
product of its eigenvalues, hence the result.
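A one-line numerical check of identity (31) for arbitrary $u$, $v$:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
u = rng.normal(size=n)
v = rng.normal(size=n)

lhs = np.linalg.det(np.eye(n) + np.outer(u, v))   # |I + u v^T|
rhs = 1 + v @ u                                   # 1 + v^T u

print(np.isclose(lhs, rhs))   # True, illustrating identity (31)
```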