
Section 2: Descriptive Multivariate Statistics

By
Dr. Richard Tuyiragize
School of Statistics and Planning
Makerere University

February 22, 2022


1 Introduction
Before embarking on any statistical analysis of an n × p multivariate data set, a preliminary
data analysis should be done. This includes computing multivariate descriptive statistics
such as measures of location, measures of spread, sample covariances, and sample correlation
coefficients. Consider the following data set:

          Variable 1   Variable 2   ···   Variable j   ···   Variable p
Item 1    x_11         x_12         ···   x_1j         ···   x_1p
Item 2    x_21         x_22         ···   x_2j         ···   x_2p
  ⋮         ⋮            ⋮                  ⋮                  ⋮
Item n    x_n1         x_n2         ···   x_nj         ···   x_np

2 Measures of central tendency


Measures of central tendency help you find the middle, or the average, of a data set.

For any variable x_j, we compute the sample mean as

\[ \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}. \]
The measure of central tendency for the multivariate data set is defined as the vector of
sample means of all p variables,

\[ \bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}, \]

called the sample mean vector.
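As a quick numerical illustration (a minimal sketch in NumPy; the data values here are hypothetical, not from the course), the sample mean vector is simply the column-wise average of the data matrix:

```python
import numpy as np

# Hypothetical n x p data matrix: n = 4 items, p = 2 variables
X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0],
              [ 2.0, 3.0]])

# Sample mean vector: average each variable (column) over the n items (rows)
x_bar = X.mean(axis=0)
print(x_bar)  # -> [2. 3.]
```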

3 Measures of dispersion (variation)


Taking any one of the variables x_j, a usual measure of dispersion on this variable is the
sample variance, denoted S_jj, which is the average squared deviation of the n observations
from their sample mean.

STA3120 1 Email:[email protected]
\[ S_{jj} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad \text{where } \bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}. \]

Equivalently,

\[ S_{jj} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ij} - \bar{x}_j) = \frac{SS(x_j)}{n-1}, \]

where SS = sum of squares of deviations

As an extension, take any two variables in the multivariate data set, say x_j and x_k. A
measure of their joint dispersion is the sample covariance, denoted S_jk, such that

\[ S_{jk} = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) = \frac{SCP(x_j, x_k)}{n-1}, \]

where SCP = sum of cross products of deviations
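The covariance formula above can be sketched directly in NumPy (illustrative data; `np.cov` uses the same n − 1 divisor and serves as a cross-check):

```python
import numpy as np

# Two hypothetical variables measured on the same n = 5 items
xj = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
xk = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
n = len(xj)

# Sum of cross products of deviations (SCP), then divide by n - 1
scp = np.sum((xj - xj.mean()) * (xk - xk.mean()))
S_jk = scp / (n - 1)
print(S_jk)  # -> 4.0

# np.cov returns the full 2 x 2 matrix; the off-diagonal entry is S_jk
assert np.isclose(S_jk, np.cov(xj, xk)[0, 1])
```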

The measure of dispersion for a multivariate data set is a square matrix of order p × p,

\[ S = \begin{pmatrix} Var(x_1) & Cov(x_1, x_2) & \cdots & Cov(x_1, x_p) \\ Cov(x_2, x_1) & Var(x_2) & \cdots & Cov(x_2, x_p) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(x_p, x_1) & Cov(x_p, x_2) & \cdots & Var(x_p) \end{pmatrix}. \]

Generally,

\[ S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1p} \\ S_{21} & S_{22} & \cdots & S_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ S_{p1} & S_{p2} & \cdots & S_{pp} \end{pmatrix}. \]
The sample variance-covariance matrix can be expressed in vector terms:

\[ S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' = \frac{1}{n-1} A, \]

where:
x_i is the i-th observation (the i-th row of the data set, written as a column vector),
\bar{x} is the sample mean vector, and
A is the sample sum of squares and cross products (SSCP) matrix.

The determinant of the sample variance-covariance matrix summarizes the dispersion and is
called the generalized sample variance of the multivariate data.
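Both the SSCP route to S and the generalized sample variance can be sketched as follows (hypothetical data; `np.cov(..., rowvar=False)` reproduces S as a check):

```python
import numpy as np

# Hypothetical 4 x 2 data matrix (n = 4 items, p = 2 variables)
X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0],
              [ 2.0, 3.0]])
n = X.shape[0]

dev = X - X.mean(axis=0)   # deviations from the sample mean vector
A = dev.T @ dev            # SSCP matrix: sum of squares and cross products
S = A / (n - 1)            # sample variance-covariance matrix

# Cross-check against NumPy's built-in (rowvar=False: variables in columns)
assert np.allclose(S, np.cov(X, rowvar=False))

# Generalized sample variance: determinant of S
gsv = np.linalg.det(S)
print(round(gsv, 6))  # -> 12.0
```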

Properties of matrix S

1. Its diagonal entries are variances and the off diagonal entries are covariances

2. If the p variables are all pairwise jointly uncorrelated, the off diagonal entries will be
zero

i.e. S = diag(S_11, S_22, ..., S_pp).

3. Since Cov(x_j, x_k) = Cov(x_k, x_j) for any two variables x_j and x_k, the matrix S is
symmetric.

4. Given a sample of size N with N > p, the matrix S is positive definite (provided no variable is an exact linear combination of the others).

5. x̄ and S are independently distributed

6. x̄ and S are jointly sufficient statistics for µ and Σ respectively.


7. x̄ is an unbiased estimator of µ, and S = A/(N − 1) is an unbiased estimator of Σ.

8. x̄ and A/N are the maximum likelihood estimates (MLEs) of µ and Σ respectively.
9. x̄ is distributed as N_p(µ, Σ/N).

10. The distribution of A = (N − 1)S is the Wishart distribution, denoted W_p(n, Σ), where
n = N − 1; hence S is called a Wishart matrix. The Wishart distribution is a generalization
of the χ² distribution: for p = 1, (N − 1)S_jj ∼ σ² χ²_{N−1}.
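Properties 3 and 4 are easy to check numerically (a sketch with simulated data; with N > p and continuous data, S comes out positive definite):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 10, 3                      # sample size N exceeds dimension p
X = rng.normal(size=(N, p))       # simulated N x p data matrix

S = np.cov(X, rowvar=False)

# Property 3: S is symmetric
assert np.allclose(S, S.T)

# Property 4: for N > p, S is positive definite (all eigenvalues positive)
eigenvalues = np.linalg.eigvalsh(S)
assert np.all(eigenvalues > 0)
```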

4 Measures of correlation
The correlation coefficient, denoted by r, is a measure of the strength of the straight-line or
linear relationship between two continuous variables. For any two variables, x_j and x_k, in
the multivariate data set, the sample Pearson correlation coefficient (PCC) is given by

\[ r_{jk} = \frac{\text{sample } Cov(x_j, x_k)}{\text{sample } SD(x_j)\,\text{sample } SD(x_k)} = \frac{S_{jk}}{\sqrt{S_{jj}}\,\sqrt{S_{kk}}}. \]

Note that for j = k, r = 1.
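The PCC formula can be sketched as follows (illustrative data; `np.corrcoef` applies the same standardization and serves as a check):

```python
import numpy as np

xj = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
xk = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

S = np.cov(xj, xk)                # 2 x 2 sample covariance matrix
r = S[0, 1] / (np.sqrt(S[0, 0]) * np.sqrt(S[1, 1]))
print(round(r, 6))  # -> 0.8

# np.corrcoef applies the same formula
assert np.isclose(r, np.corrcoef(xj, xk)[0, 1])
```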

As a measure of correlation for the n × p multivariate data set, we summarize the sample
PCCs in a square matrix of order p × p, called the sample correlation matrix and denoted R:

\[ R = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix} \]

Properties of R

1. If the p variables in the data are pairwise uncorrelated, the off-diagonal entries will be
zero; hence R = I_p, the identity matrix.

2. The matrix R is symmetric and always positive semi-definite, i.e. x'Rx ≥ 0 ∀ x ≠ 0.

3. Given a sample covariance matrix S, the corresponding sample correlation matrix R is
computed as R = D^{-1} S D^{-1}, where D = diag(√S_11, √S_22, ..., √S_pp).

Example: For some bivariate data set, the sample covariance matrix has been found to be

\[ S = \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix}. \]

Compute the sample correlation matrix R.

Solution: D = diag(√16, √25) = diag(4, 5), so

\[ R = D^{-1} S D^{-1} = \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix} \begin{pmatrix} 16 & 3 \\ 3 & 25 \end{pmatrix} \begin{pmatrix} \frac{1}{4} & 0 \\ 0 & \frac{1}{5} \end{pmatrix} = \begin{pmatrix} 1 & \frac{3}{20} \\ \frac{3}{20} & 1 \end{pmatrix}. \]
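The same computation runs in a few lines of NumPy (a sketch; `np.diag` builds D⁻¹ from the reciprocal standard deviations):

```python
import numpy as np

S = np.array([[16.0,  3.0],
              [ 3.0, 25.0]])

# D^{-1} = diag(1/sqrt(S11), 1/sqrt(S22)) = diag(1/4, 1/5)
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv

# Matches the hand computation: off-diagonal r12 = 3/20 = 0.15
assert np.allclose(R, [[1.0, 0.15], [0.15, 1.0]])
```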
Question: Find the sample mean vector, covariance matrix, and correlation matrix for the
following data matrix:

\[ X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix} \]
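After working the question by hand, one way to verify your answer is a short NumPy sketch along these lines:

```python
import numpy as np

X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0]])

x_bar = X.mean(axis=0)            # sample mean vector
S = np.cov(X, rowvar=False)       # sample covariance matrix (n - 1 divisor)

# Correlation matrix via R = D^{-1} S D^{-1}
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
R = D_inv @ S @ D_inv
print(x_bar, S, R, sep="\n")
```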

5 Random Vectors and Matrices
A random vector is a vector whose components are random variables, and a random matrix
is a matrix whose entries are random variables.

A linear array of p ≥ 2 random variables x_1, x_2, x_3, ..., x_p in the form of a column or
row, i.e.

\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} \quad \text{or} \quad x' = (x_1, x_2, \cdots, x_p), \]

is called a p-dimensional random vector of p components.

More generally, a p × q rectangular array U = (u_ij), for i = 1, 2, ..., p and j = 1, 2, ..., q,
whose pq elements u_ij are random variables, is a random matrix.

For this course, we shall deal with a column or row vector X with p variables.

We shall take the p variables in the multivariate data set as realizations of some p random
variables x_1, x_2, x_3, ..., x_p, whose simultaneous probabilistic or stochastic behaviour we
need to investigate; and we take the p random variables as forming the row vector

\[ x' = (x_1, x_2, \cdots, x_p). \]

5.1 Expectation of a random vector or matrix


Let x = (x_1, x_2, ..., x_p)' be a p-dimensional random vector. Then the mean vector is

     
\[ E(x) = E \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} E(x_1) \\ E(x_2) \\ \vdots \\ E(x_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu \]

E(x) is a vector of expectations of the random variables.

5.2 Variance Covariance matrix of a random vector


For a univariate random variable x, a common measure of dispersion is the population
variance:

\[ \sigma_x^2 = E(x - \mu)^2, \quad \text{where } \mu = E(x). \]

For two random variables x and y, a common measure of joint dispersion is the population
covariance,

\[ \sigma_{xy} = E[(x - \mu_x)(y - \mu_y)]. \]

The measure of dispersion for a p-variate random vector x is the population p-variate
variance-covariance matrix, denoted as Σ.


\[ \Sigma = E[(x - \mu)(x - \mu)'], \quad \text{where } \mu = E(x). \]

\[ \Sigma = E\left[ \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \\ \vdots \\ x_p - \mu_p \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 & x_2 - \mu_2 & \cdots & x_p - \mu_p \end{pmatrix} \right] \]

\[ = E \begin{pmatrix} (x_1 - \mu_1)^2 & (x_1 - \mu_1)(x_2 - \mu_2) & \cdots & (x_1 - \mu_1)(x_p - \mu_p) \\ (x_2 - \mu_2)(x_1 - \mu_1) & (x_2 - \mu_2)^2 & \cdots & (x_2 - \mu_2)(x_p - \mu_p) \\ \vdots & \vdots & \ddots & \vdots \\ (x_p - \mu_p)(x_1 - \mu_1) & (x_p - \mu_p)(x_2 - \mu_2) & \cdots & (x_p - \mu_p)^2 \end{pmatrix} \]

\[ = \begin{pmatrix} Var(x_1) & Cov(x_1, x_2) & \cdots & Cov(x_1, x_p) \\ Cov(x_2, x_1) & Var(x_2) & \cdots & Cov(x_2, x_p) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(x_p, x_1) & Cov(x_p, x_2) & \cdots & Var(x_p) \end{pmatrix} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix} \]

6 Linear Combinations of Random Vectors
1. Univariate case:

E(a1 x1 ) = a1 E(x1 ) = a1 µ1
V ar(a1 x1 ) = a21 V ar(x1 ) = a21 σ11

2. Bivariate case:

Cov(a1 x1 , a2 x2 ) = a1 a2 Cov(x1 x2 ) = a1 a2 σ12


   
Given X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, the linear combination a_1 X_1 + a_2 X_2 = (a_1, a_2) \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = a'X.

\[ E(a'X) = E(a_1 X_1 + a_2 X_2) = a_1 E(X_1) + a_2 E(X_2) = a_1 \mu_1 + a_2 \mu_2 = (a_1, a_2) \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \implies E(a'X) = a'\mu \]

\[ Var(a'X) = Var(a_1 X_1) + Var(a_2 X_2) + 2\,Cov(a_1 X_1, a_2 X_2) = a_1^2 \sigma_{11} + a_2^2 \sigma_{22} + 2 a_1 a_2 \sigma_{12} \]

\[ = (a_1, a_2) \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \implies Var(a'X) = a'\Sigma a \]

3. Multivariate case:

If X is a p-dimensional random vector and a ∈ R^p, then the linear combination a'X is a
one-dimensional random variable. That is, for

\[ X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}, \]

the linear combination is

\[ a_1 X_1 + a_2 X_2 + \cdots + a_p X_p = (a_1, a_2, \ldots, a_p) \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = a'X. \]

\[ E(a'X) = a'E(X) = a'\mu, \qquad Var(a'X) = a'\Sigma a, \]

where

\[ \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}. \]

4. Consider q linear combinations of the p random variables:

\[ Z_1 = a_{11} X_1 + a_{12} X_2 + \cdots + a_{1p} X_p = \sum_{j=1}^{p} a_{1j} X_j = a_1' X \]
\[ Z_2 = a_{21} X_1 + a_{22} X_2 + \cdots + a_{2p} X_p = \sum_{j=1}^{p} a_{2j} X_j = a_2' X \]
\[ \vdots \]
\[ Z_q = a_{q1} X_1 + a_{q2} X_2 + \cdots + a_{qp} X_p = \sum_{j=1}^{p} a_{qj} X_j = a_q' X \]

In matrix form:

\[ Z = \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_q \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{q1} & a_{q2} & \cdots & a_{qp} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \iff Z = AX \]

\[ E(Z) = E(AX) = A E(X) = A\mu \]

\[ Cov(Z) = Cov(AX) = A \Sigma A' \]
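A sketch of E(Z) = Aµ and Cov(Z) = AΣA′ with assumed (made-up) parameter values:

```python
import numpy as np

# Assumed population mean vector and covariance matrix for p = 3 variables
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.0],
                  [0.5, 0.0, 2.0]])

# q = 2 linear combinations: Z1 = X1 - X2, Z2 = 0.5 X1 + 0.5 X2 + X3
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])

mean_Z = A @ mu          # E(Z) = A mu
cov_Z = A @ Sigma @ A.T  # Cov(Z) = A Sigma A'
```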

Example

Find the mean vector and covariance matrix for the linear combinations Z_1 = X_1 − X_2 and
Z_2 = X_1 + X_2.

\[ Z = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = AX \]

\[ E(Z) = A E(X) = A\mu = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 + \mu_2 \end{pmatrix} \]

\[ Cov(Z) = A\,Cov(X)\,A' = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22} \\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22} \end{pmatrix} \]
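The closed-form result above can be verified numerically for any assumed Σ (the values below are made up for illustration):

```python
import numpy as np

# Assumed values: sigma11 = 4, sigma12 = sigma21 = 1, sigma22 = 2
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
A = np.array([[1.0, -1.0],
              [1.0,  1.0]])

cov_Z = A @ Sigma @ A.T

# Closed form: [[s11 - 2 s12 + s22, s11 - s22], [s11 - s22, s11 + 2 s12 + s22]]
expected = np.array([[4.0 - 2.0 + 2.0, 4.0 - 2.0],
                     [4.0 - 2.0,       4.0 + 2.0 + 2.0]])
assert np.allclose(cov_Z, expected)
```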
