
Statistical Science, Autonomous University of Barcelona

Paper nº 102386.1 Descriptive Statistics

1.1.1 Qualitative Variables


Variables that describe non-numerical characteristics or categories;
they represent qualities and attributes that cannot be measured numerically.
1.1.1.1 Nominal, Categorical Variables
There is no natural order. These are variables whose values are
categories, such as the profession, the gender, or the studies of a
person. For convenience, each observation is usually associated
with a number or letter according to some established criterion.
1.1.1.2 Ordinal, Scale Variables
There is an order even if it is not measured numerically. These are
variables whose values correspond to the elements of a scale or
ranking, such as the degree of satisfaction with a given product,
or the position in a given classification.
1.1.2 Quantitative Variables
Represent measurable quantities and are expressed by numerical
values. These variables support arithmetic operations such as
addition, subtraction, and averaging.
1.1.2.1 Discrete Variables
Represent countable numerical values.
1.1.2.2 Continuous Variables
Measurable quantities that can take any value within a range.

1.2 Univariate Variables


The foundation of descriptive statistics: a single variable is
analyzed to describe its distribution, central tendency, and dispersion.
1.2.1 Frequency Distribution of Qualitative Variables
For qualitative or discrete quantitative variables, one first
identifies how many different values the variable has taken and
then counts how many times each of these values appears in the
sample. Data sets of qualitative or discrete variables therefore
provide straightforward information about a concrete sample.
1.2.1.1 Absolute Frequency
The number of times a value is found in the given data or sample.
1.2.1.2 Absolute Cumulative Frequency
Represents the total number of observations that are less than or
equal to a particular value in the dataset, obtained by adding up
the absolute frequencies of all values less than or equal to it.
Writing ni for the absolute frequency of the i-th value,
ab.f = ni;  ab.c.f = Ni = n1 + n2 + … + ni = ∑ nj (j = 1, …, i).
1.2.1.3 Relative Frequency
Represents the proportionate percentage of times a value is found.
1.2.1.4 Relative Cumulative Frequency
Proportion of observations that are less than or equal to a
particular value.
rl.f = fi = ni / n;   rl.c.f = Fi = Ni / n.

e.g., Absolute Frequency
x = {2, 3, 4, 5, 8, 2, 4, 5, 7, 2}, n = 10;
value 2 appears 3 times (x1, x6, x10): n1 = 3;
value 3 appears once (x2): n2 = 1;
value 4 appears twice (x3, x7): n3 = 2;
value 5 appears twice (x4, x8): n4 = 2;
value 7 appears once (x9): n5 = 1;
value 8 appears once (x5): n6 = 1;
for the sorted distinct values {2, 3, 4, 5, 7, 8},
ni = {3, 1, 2, 2, 1, 1}.
e.g., Absolute Cumulative Frequency
From ni = {3, 1, 2, 2, 1, 1}, n = 10:
N1 = n1 = 3;
N2 = N1 + n2 = 3 + 1 = 4;
N3 = N2 + n3 = 4 + 2 = 6;
N4 = N3 + n4 = 6 + 2 = 8;
N5 = N4 + n5 = 8 + 1 = 9;
N6 = N5 + n6 = 9 + 1 = 10;
Ni = {3, 4, 6, 8, 9, 10}.
e.g., Relative Frequency
fi = ni / n, with n = 10:
f1 = 3/10 = 0,3;  f2 = 1/10 = 0,1;  f3 = 2/10 = 0,2;
f4 = 2/10 = 0,2;  f5 = 1/10 = 0,1;  f6 = 1/10 = 0,1;
fi = {0,3; 0,1; 0,2; 0,2; 0,1; 0,1}.

e.g., Relative Cumulative Frequency
rl.c.f = Fi = Ni / n; equivalently, Fi is the running sum of the relative frequencies:
F1 = f1 = 0,3;
F2 = F1 + f2 = 0,3 + 0,1 = 0,4;
F3 = F2 + f3 = 0,4 + 0,2 = 0,6;
F4 = F3 + f4 = 0,6 + 0,2 = 0,8;
F5 = F4 + f5 = 0,8 + 0,1 = 0,9;
F6 = F5 + f6 = 0,9 + 0,1 = 1,0.
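A minimal Python sketch, assuming the sample above, that reproduces the four frequency columns (the variable names simply mirror the notation ni, Ni, fi, Fi):

from collections import Counter

data = [2, 3, 4, 5, 8, 2, 4, 5, 7, 2]
n = len(data)

counts = Counter(data)                      # absolute frequencies n_i
values = sorted(counts)                     # distinct values, ascending

N = 0                                       # running cumulative count
for v in values:
    ni = counts[v]                          # absolute frequency
    N += ni                                 # absolute cumulative frequency N_i
    fi = ni / n                             # relative frequency f_i
    Fi = N / n                              # relative cumulative frequency F_i
    print(v, ni, N, round(fi, 2), round(Fi, 2))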

1.2.2 Distribution of Quantitative Variables


1.2.2.1 Range of a Continuous Variable
Difference between the highest and lowest value in the data set.
𝑅 = 𝑣𝑘 − 𝑣1;
e.g., for a quantitative variable such as income,
R = vk − v1;  R = 3.069,20 − 527,18 = 2.542,02.
1.2.2.2 Length of an Interval
Once the range is known, it is divided into a number of intervals;
the number of intervals I depends on the dataset and the analysis.
lc = (vk − v1) / I = R / I;
with R = 3.069,20 − 527,18 = 2.542,02 and I = 8 intervals,
lc = 2.542,02 / 8 ≈ 317,75.
1.2.2.3 Intervals of Continuous Variables
Constructed from the lowest value, adding the length iteratively.
i1 = [v1, v1 + lc);
i2 = [v1 + lc, v1 + 2·lc);  …  the last interval ends at vk;
i1 = [527,18, 527,18 + 317,75);
i1 = [527,18, 844,93).

1.2.2.4 Cm Mid, Class Mark Interval


Represents the midpoint of the interval, the average of its two endpoints:
ci = (lower bound + upper bound) / 2;
c1 = (527,18 + 844,93) / 2 ≈ 686,06.
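A short Python sketch of the same construction, assuming the income extremes v1 = 527,18 and vk = 3.069,20 of the example above and 8 intervals:

v1, vk = 527.18, 3069.20
num_intervals = 8

R = vk - v1                      # range
lc = R / num_intervals           # length of each interval

intervals = []
lower = v1
for _ in range(num_intervals):
    upper = lower + lc
    mark = (lower + upper) / 2   # class mark (interval midpoint)
    intervals.append((round(lower, 2), round(upper, 2), round(mark, 2)))
    lower = upper

print(round(R, 2), round(lc, 2))   # 2542.02 317.75
print(intervals[0])                # (527.18, 844.93, 686.06)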

Properties of the Frequencies,


0 ≤ ni ≤ n;  0 ≤ Ni ≤ n;
0 ≤ fi ≤ 1;  0 ≤ Fi ≤ 1;
∑ ni = n and ∑ fi = 1, summing over i = 1, …, k;
n1 = N1 ≤ N2 ≤ … ≤ Nk = n;
f1 = F1 ≤ F2 ≤ … ≤ Fk = 1.

1.3 Multivariate Variables


A set of multiple related variables analyzed simultaneously in
order to understand their collective behavior, relationships, and
patterns; analyzing several factors together enables deeper insights.
1.3.1 Correlation Table
Used when both variables are quantitative; it tabulates their joint
values to study the linear relationship between numerical variables.
1.3.2 Contingency Table
Used when both variables are qualitative; it analyzes the
relationship between categorical variables.
The joint frequencies represent the distribution of the bidimensional
variable: each pair (vi, kj) is observed nij times,
e.g. (v1, k1) → n11, (v1, k2) → n12.
Properties of Joint Frequencies,
∑i ∑j nij = n;   fij = nij / n;   ∑i ∑j fij = 1.

1.3.3 Marginal Distribution


The distribution of each variable considered separately, obtained
from the bidimensional table,
x \ y:   k1    k2
v1       n11   n12
v2       n21   n22
For each value of x and of y there is a marginal frequency, obtained
by summing over the other variable:
v1: n1x = n11 + n12;
v2: n2x = n21 + n22;
k1: nx1 = n11 + n21;
k2: nx2 = n12 + n22.
Properties of Marginal Distribution of Frequencies,
nix = ∑j nij (j = 1, …, l);   nxj = ∑i nij (i = 1, …, k).

1.3.3.1 Marginal Relative Frequencies


fix = nix / n;   fxj = nxj / n.
Hence, the marginal frequencies verify,
∑i nix = n;   ∑j nxj = n;
∑i fix = 1;   ∑j fxj = 1.
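A minimal Python sketch of the joint and marginal frequencies for a 2×2 contingency table; the counts nij used here are hypothetical:

n_table = [[10, 20],    # row v1: (v1,k1)=10, (v1,k2)=20
           [30, 40]]    # row v2: (v2,k1)=30, (v2,k2)=40

n = sum(sum(row) for row in n_table)                 # grand total

row_margins = [sum(row) for row in n_table]          # n_ix  (marginal of x)
col_margins = [sum(col) for col in zip(*n_table)]    # n_xj  (marginal of y)

f_table = [[nij / n for nij in row] for row in n_table]   # joint relative f_ij

print(n, row_margins, col_margins)    # 100 [30, 70] [40, 60]
print(f_table)                        # [[0.1, 0.2], [0.3, 0.4]]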

1.3.4 Conditional Distribution


The distribution of one variable restricted to the observations for
which the other variable takes a fixed value.
x \ y:   k1    k2
v1       n11   n12
v2       n21   n22
e.g., given variables (x, y) with ranges
{v1, v2, …, vk}; {k1, k2, …, kl};
respectively, the relative frequency of x = vi conditional on y taking
the value kj is the proportion of the observations with y = kj that
also have x = vi,
f(vi | y = kj) = nij / nxj;
analogously, the relative frequency of y = kj conditional on x = vi is
f(kj | x = vi) = nij / nix.
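A small Python sketch of the conditional relative frequencies on the same hypothetical 2×2 table:

n_table = [[10, 20],
           [30, 40]]

col_margins = [sum(col) for col in zip(*n_table)]     # n_xj
row_margins = [sum(row) for row in n_table]           # n_ix

# f(v_i | y = k_j) = n_ij / n_xj  and  f(k_j | x = v_i) = n_ij / n_ix
x_given_y = [[n_table[i][j] / col_margins[j] for j in range(2)] for i in range(2)]
y_given_x = [[n_table[i][j] / row_margins[i] for j in range(2)] for i in range(2)]

print(x_given_y)   # each column sums to 1
print(y_given_x)   # each row sums to 1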

2.1 Measures of Position


Statistical values that divide the data into equal parts or locate
where data points lie within a dataset, providing insight into the
central tendency and the relative standing of the observations.
2.1.1 The Mean
The center of gravity of a distribution, balancing all observations
equally. Mathematically, it minimizes the sum of squared deviations
from all data points of the set, making it the optimal measure for
variance minimization.
Properties of the Mean,
Scaling every data point by a constant a scales the mean by a:
the mean of aX = {a·x1, a·x2, …, a·xn} is a·x̄.
e.g., [2, 4, 6], x̄ = 4; with scalar a = 3,
[6, 12, 18] has mean 3·x̄ = 12.
Translation: adding a constant b to every data point shifts the mean
by b; the mean of aX + b is a·x̄ + b.
e.g., [2, 4, 6], x̄ = 4; adding b = 5,
[(2 + 5), (4 + 5), (6 + 5)] = [7, 9, 11],
with mean x̄ + 5 = 4 + 5 = 9.

The mean of the sum of two variables is the sum of their means:
the mean of x + y equals x̄ + ȳ.
e.g., x = [1, 2, 3], x̄ = 2; y = [4, 5, 6], ȳ = 5;
so the mean of x + y is 2 + 5 = 7.

The sum of deviations from the mean equals zero, reflecting its
balance-point property; moreover, the mean minimizes the sum of
squared deviations (least-squares property).
∑ (xi − x̄) = 0, summing over i = 1, …, n.

A single extreme value can drag the mean toward it,


“e.g., for data without outlier [2, 4, 6]; 𝑥̅ = 4
“e.g., for outlier data [10, 12, 100]; 𝑥̅ ≈ 40,7

Calculation Methods,
2.1.1.1 Raw Data, Exact Calculation of the Mean
x̄ = (1/n) ∑ xi (i = 1, …, n); equivalently, grouping equal values, x̄ = (∑ ni·xi) / n.
e.g., for the data set x,
x = {2, 3, 4, 5, 8, 2, 4, 5, 7, 2}, n = 10;
x̄ = (2 + 3 + 4 + 5 + 8 + 2 + 4 + 5 + 7 + 2) / 10 = 42/10 = 4,2.

2.1.1.2 Distribution of Discrete Variables


x̄ = (1/n) ∑ ni·xi;   x̄ = ∑ fi·xi  (summing over the k distinct values).

e.g., for the data set x with its absolute frequencies,
x = {2, 3, 4, 5, 8, 2, 4, 5, 7, 2}, n = 10;
distinct values {2, 3, 4, 5, 7, 8} with ni = {3, 1, 2, 2, 1, 1};
x̄ = (1/10)·(3·2 + 1·3 + 2·4 + 2·5 + 1·7 + 1·8) = 42/10 = 4,2.

e.g., for the same data set with its relative frequencies,
fi = {0,3; 0,1; 0,2; 0,2; 0,1; 0,1};
x̄ = ∑ fi·xi = 0,3·2 + 0,1·3 + 0,2·4 + 0,2·5 + 0,1·7 + 0,1·8 = 4,2.
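A minimal Python sketch, assuming the data set above, that computes the mean from the raw data, from the absolute frequencies, and from the relative frequencies:

data = [2, 3, 4, 5, 8, 2, 4, 5, 7, 2]
n = len(data)

# Raw-data mean
mean_raw = sum(data) / n

# Mean from absolute frequencies n_i of the distinct values x_i
values = sorted(set(data))
n_i = [data.count(v) for v in values]
mean_abs = sum(ni * v for ni, v in zip(n_i, values)) / n

# Mean from relative frequencies f_i
f_i = [ni / n for ni in n_i]
mean_rel = sum(fi * v for fi, v in zip(f_i, values))

print(mean_raw, mean_abs, mean_rel)   # 4.2 4.2 4.2 (up to float rounding)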

2.1.1.3 Distribution of Continuous Variables


For continuous variables, with the data grouped into class intervals
and represented by their class marks ci, the mean is an approximation:
x̄ ≈ (1/n) ∑ ni·ci;   x̄ ≈ ∑ fi·ci.

e.g., for a distribution on intervals x, with the class marks,
[1, 2): c1 = (1 + 2)/2 = 1,5;  n1 = 4;
[2, 3): c2 = (2 + 3)/2 = 2,5;  n2 = 3;
[3, 4): c3 = (3 + 4)/2 = 3,5;  n3 = 2;
[4, 5): c4 = (4 + 5)/2 = 4,5;  n4 = 1;  n = 10;
x̄ ≈ (1/10)·(1,5·4 + 2,5·3 + 3,5·2 + 4,5·1)
  = (6 + 7,5 + 7 + 4,5) / 10 = 25/10 = 2,5.

Implications of the Mean,


The sample mean x̄ estimates the population mean μ;
e.g., x̄ = 4,2 does not imply that the population mean is exactly μ = 4,2.
The mean is meaningless for nominal categories. It represents the
central tendency of the data, but it is sensitive to outliers.

2.1.2 The Median


Represents the value that is central with respect to all the
observations: at least half of the observations are less than or
equal to M and at least half are greater than or equal to M.
2 2
Properties of the Median,
The median is robust to outliers: unlike the mean, it is unaffected by
extreme values, and it is defined for ordinal as well as interval data.

Calculation Method,
2.1.2.1 Odd Sample Size
The data must first be sorted in increasing order; the median is the middle value.
e.g., {3,1,4,2,5,1,3}; sorted {1,1,2,3,3,4,5}; M = 3, the 4th value.

2.1.2.2 Even Sample Size


Analogously, with an even sample size the median is the average of
the two middle ordered values,
M = (x(n/2) + x(n/2 + 1)) / 2;
e.g., {2,3,4,5,8,2,4,5,7,2}; sorted {2,2,2,3,4,4,5,5,7,8};
M = (4 + 4)/2 = 4, the 5th and 6th values.
If the two middle values differ, the median is still their average,
M = (4 + 5)/2 = 4,5.

2.1.2.3 Median for Grouped Data


When the raw data are not shown, the cumulative frequency is used:
the median corresponds to the first value whose cumulative frequency
exceeds n/2, Ni > n/2.
e.g., x = {2, 3, 4, 5} with data {2,2,2,3,4,4,5,5}, n = 8;
N1 = 3; N2 = 3 + 1 = 4;
N3 = 4 + 2 = 6; N4 = 6 + 2 = 8;
n/2 = 4; the first cumulative frequency strictly greater than 4 is N3 = 6.
The edge case occurs when a cumulative frequency equals n/2 exactly,
Ni = n/2, as happens here with N2 = 4.
In that case the median is not simply the value in that row; it is the
average of that value and the next greater one, M = (3 + 4)/2 = 3,5.
2.1.2.4 Median for Continuous Data
Applied to continuous variables: L is the lower bound of the median
class (the interval that contains the median), w its width, F the
cumulative frequency of the classes below it, and f the frequency of
the median class,
M = L + ((n/2 − F) / f) · w.

e.g., for a distribution on intervals x, with the class marks,
[1, 2): c1 = (1 + 2)/2 = 1,5;  n1 = 4;

[2, 3): c2 = (2 + 3)/2 = 2,5;  n2 = 3;
[3, 4): c3 = (3 + 4)/2 = 3,5;  n3 = 2;  n = 9;
n/2 = 4,5; the cumulative frequencies are {4, 7, 9}, so the median
class is [2, 3), with L = 2, F = 4, f = 3, w = 1:
M = 2 + ((4,5 − 4) / 3) · 1 ≈ 2,17.
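A minimal Python sketch of both calculations, assuming the samples above (the grouped_median helper is illustrative):

# Median of raw data (even n: average of the two middle values)
def median(data):
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([2, 3, 4, 5, 8, 2, 4, 5, 7, 2]))    # 4.0

# Median for continuous grouped data: M = L + ((n/2 - F) / f) * w
def grouped_median(bounds, freqs):
    n = sum(freqs)
    cum = 0
    for (L, U), f in zip(bounds, freqs):
        if cum + f >= n / 2:                     # first class reaching n/2
            return L + ((n / 2 - cum) / f) * (U - L)
        cum += f

print(grouped_median([(1, 2), (2, 3), (3, 4)], [4, 3, 2]))   # ~2.17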

2.1.3 Quartiles
The quartiles divide a data set into four equal parts:
Q1 leaves 25% of the observations at or below it, Q2 50%, and Q3 75%.
The middle quartile Q2 is therefore equal to the median,
Q2 = M.
Calculation Methods,
2.1.3.1 Linear Interpolation
The positions of the quartiles in the sorted data are
position of Q1 = 0,25·(n + 1);
position of Q3 = 0,75·(n + 1);
e.g., {1,2,3,4,5,6,7,8,9}, n = 9:
position of Q1 = 0,25·(9 + 1) = 2,5, so Q1 = 2,5;
position of Q3 = 0,75·(9 + 1) = 7,5, so Q3 = 7,5.
2.1.3.2 Quantiles
Quantiles generalize the quartiles: they divide the data into any
number of equal parts. Percentiles divide the data into one hundred
parts; for grouped data the k-th percentile is
Pk = L + ((k·n/100 − F) / f) · w.

e.g., {12,7,3,8,14,6,9,10}; sorted {3,6,7,8,9,10,12,14}, n = 8;
M = (8 + 9)/2 = 8,5;
position of Q1 = 0,25·(8 + 1) = 2,25;
Q1 = 6 + 0,25·(7 − 6) = 6,25;
position of Q3 = 0,75·(8 + 1) = 6,75;
Q3 = 10 + 0,75·(12 − 10) = 11,5.
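A small Python sketch of the linear-interpolation rule at position p·(n + 1), using the sample above (the quantile helper is illustrative):

def quantile(data, p):
    s = sorted(data)
    pos = p * (len(s) + 1)                # 1-based position
    k = int(pos)                          # lower neighbouring rank
    frac = pos - k
    if k <= 0:
        return s[0]
    if k >= len(s):
        return s[-1]
    return s[k - 1] + frac * (s[k] - s[k - 1])

data = [12, 7, 3, 8, 14, 6, 9, 10]
print(quantile(data, 0.25))   # 6.25
print(quantile(data, 0.50))   # 8.5
print(quantile(data, 0.75))   # 11.5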

2.2 Measures of Dispersion


Measures that describe how the observations in the data set differ
from one another. These differences are measured with respect to the
mean, as the measure of central tendency, and they approximate the
degree of variability of the values that the variable takes.

2.2.1 Mean Quadratic Error


Let a value v be assumed to represent the whole sample as its
central value; each observation then carries an error,
ei = xi − v;
and the sum of all the errors is the total error,
t.e(v) = ∑ ei = ∑ (xi − v), summing over i = 1, …, n.

Implications of the Mean Quadratic Error,
The total error is positive when the positive deviations dominate,
t.e(v) = ∑ (xi − v) > 0,
and negative when the negative deviations dominate,
t.e(v) = ∑ (xi − v) < 0;
positive and negative deviations can therefore cancel each other out.
In order that they do not cancel, the errors are squared, and their
average defines the mean quadratic error,
MQE(v) = (1/n) ∑ (xi − v)².

e.g., {2, 4, 6}; v = x̄ = 4;
x̄ = (2 + 4 + 6)/3 = 12/3 = 4,
(2 − 4)² = (−2)² = 4;
(4 − 4)² = 0² = 0;
(6 − 4)² = (+2)² = 4;
the sum of squared errors is 4 + 0 + 4 = 8, so on average each point is
MQE(v) = 8/3 ≈ 2,67
squared units away from the mean.
e.g., for a reference value different from the mean, {1,3,5,7} with v = 3:
(1 − 3)² = (−2)² = 4; (3 − 3)² = 0² = 0;
(5 − 3)² = (+2)² = 4; (7 − 3)² = (+4)² = 16;
the sum of squared errors is 24, and MQE(v) = 24/4 = 6.

e.g., {1,3,5,7}; v = x̄ = 4;
x̄ = (1 + 3 + 5 + 7)/4 = 16/4 = 4,
(1 − 4)² = (−3)² = 9; (3 − 4)² = (−1)² = 1;
(5 − 4)² = (+1)² = 1; (7 − 4)² = (+3)² = 9;
the sum of squared errors is 9 + 1 + 1 + 9 = 20, and
MQE(x̄) = 20/4 = 5.
The mean quadratic error is greater when the reference value is not
the mean, which shows that the mean minimizes the MQE; this is the
reason the mean is used in the variance.
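A minimal Python sketch, using the data {1,3,5,7} above, showing that the MQE is smallest when the reference value is the mean:

# Mean quadratic error for a reference value v: MQE(v) = (1/n) * sum((x - v)^2)
def mqe(data, v):
    return sum((x - v) ** 2 for x in data) / len(data)

data = [1, 3, 5, 7]
mean = sum(data) / len(data)              # 4

print(mqe(data, 3))                       # 6.0  (reference value not the mean)
print(mqe(data, mean))                    # 5.0  (the mean minimizes the MQE)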
2.2.2 The Variance
Variance measures how far data points spread out from their mean
value. It's the average of squared differences from the mean.
S² = MQE(x̄) = (1/n) ∑ (xi − x̄)², summing over i = 1, …, n.
Properties of the Variance,
The variance is the minimum value of the mean quadratic error,
MQE(v) ≥ MQE(x̄) = S², for any reference value v.
The variance is expressed in squared units, so the standard deviation
is often used to interpret dispersion on the original scale,
S = √S² = √( (1/n) ∑ (xi − x̄)² ).

For independent variables there is an additivity property,
S²(x + y) = S²(x) + S²(y).

2.2.2.1 Population Variance


Uses the entire population data with “n” observations.
σ² = (1/n) ∑ (xi − μ)², summing over i = 1, …, n.

2.2.2.2 Sample Variance


Uses sample data with denominator n − 1 (Bessel's correction), which
corrects the bias when estimating the population variance from a sample,
S² = (1/(n − 1)) ∑ (xi − x̄)²;
the uncorrected variance can also be computed as (1/n) ∑ xi² − x̄².

Bessel's correction is applied so that the sample variance is an
unbiased estimator of the population variance.
e.g., {3, 5, 7}; x̄ = 5; (3 − 5)² = 4; (5 − 5)² = 0; (7 − 5)² = 4;
σ² = (4 + 0 + 4)/3 ≈ 2,67;   S² = (4 + 0 + 4)/(3 − 1) = 8/2 = 4.
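A short Python sketch contrasting the population and sample variances on the data {3,5,7}:

# Population variance (divide by n) versus sample variance (divide by n - 1)
data = [3, 5, 7]
n = len(data)
mean = sum(data) / n                                   # 5

ss = sum((x - mean) ** 2 for x in data)                # 8
var_population = ss / n                                # ~2.67
var_sample = ss / (n - 1)                              # 4.0  (Bessel's correction)

print(round(var_population, 2), var_sample)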
2.2.2.3 Variance of Absolute Frequencies
S² = (1/n) ∑ ni·(xi − x̄)².

e.g., {x1 = 2, x2 = 4, x3 = 6}; {n1 = 3, n2 = 2, n3 = 1}; n = 6;

x̄ = (2·3 + 4·2 + 6·1) / 6 = 20/6 ≈ 3,33;
3·(2 − 3,33)² = 3·1,77 ≈ 5,31;
2·(4 − 3,33)² = 2·0,45 ≈ 0,90;
1·(6 − 3,33)² = 1·7,13 ≈ 7,13;
S² = (5,31 + 0,90 + 7,13) / 6 ≈ 2,22.

2.2.2.4 Variance of Relative Frequencies


S² = ∑ fi·(xi − x̄)².

e.g., {x1 = 70, x2 = 80, x3 = 90}; {f1 = 0,2, f2 = 0,5, f3 = 0,3};
x̄ = ∑ fi·xi = 70·0,2 + 80·0,5 + 90·0,3 = 81;
S² = ∑ fi·(xi − x̄)² = 0,2·(70 − 81)² + 0,5·(80 − 81)² + 0,3·(90 − 81)²
   = 24,2 + 0,5 + 24,3 = 49.

2.2.2.5 Variance for Continuous Variables


S² = (1/n) ∑ ni·(ci − x̄)², with ci the class marks.
e.g., for data grouped into intervals, with the corresponding absolute
frequencies of the income classes,
i1 = [$100, $200): c1 = (100 + 200)/2 = 150;  n1 = 10;
i2 = [$200, $300): c2 = (200 + 300)/2 = 250;  n2 = 20;  n = 30;
x̄ = (150·10 + 250·20) / 30 = 6.500/30 ≈ 216,67;
10·(150 − x̄)² ≈ 10·4.444,4 ≈ 44.444;
20·(250 − x̄)² ≈ 20·1.111,1 ≈ 22.222;
S² = (44.444 + 22.222) / 30 ≈ 2.222,2.
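A minimal Python sketch, assuming the grouped income data above, that chains the grouped variance, the standard deviation, and the coefficient of variation:

marks = [150, 250]          # class marks c_i of [$100,$200) and [$200,$300)
freqs = [10, 20]            # absolute frequencies n_i

n = sum(freqs)
mean = sum(f * c for f, c in zip(freqs, marks)) / n              # ~216.67
var = sum(f * (c - mean) ** 2 for f, c in zip(freqs, marks)) / n
std = var ** 0.5
cv = std / mean

print(round(mean, 2), round(var, 1), round(std, 2), round(cv, 3))
# 216.67 2222.2 47.14 0.218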
2.2.3.1 Standard Deviation
It is the square root of the variance, bringing the units back to the
original scale so that the variable can be interpreted directly,
S = √S² = √( (1/(n − 1)) ∑ (xi − x̄)² );   σ = √( (1/n) ∑ (xi − μ)² ).

σ² = (44.444 + 22.222) / 30 ≈ 2.222,2;
σ = √2.222,2 ≈ 47,14.

e.g., treating the same data as a sample (denominator n − 1),
S² = (44.444 + 22.222) / (30 − 1) ≈ 2.298,8;
S = √2.298,8 ≈ 47,95.

2.2.4.1 Coefficient of Variation


It normalizes dispersion for fair comparisons,
c.v. = S / x̄ = √S² / x̄;
𝑐. 𝑣 < 1; There exists a low relative variability.
𝑐. 𝑣 > 1; There exists a high relative variability.

e.g., from the continuous variable above,
c.v. = 47,14 / 216,67 ≈ 0,22;
𝑐. 𝑣 < 1; There exists a low relative variability.
When the relative variability is low, the standard deviation is small
compared with the mean of the sample.
e.g., analogously, if the standard deviation were S = 300,
c.v. = 300 / 216,67 ≈ 1,38;
c.v > 1; there is a high relative variability.

2.2.5.1 Interquartile Range


It represents the distance between the first and third quartile,
I.R = Q3 − Q1,
with Q1 at position 0,25·(n + 1) and Q3 at position 0,75·(n + 1);
“e.g., {1,2,3,4,5,6,7,8,9}𝑛9
𝑄1 = 0,25(𝑛9 + 1) = 2,5;
𝑄3 = 0,75(𝑛9 + 1) = 7,5;
𝐼. 𝑅 = 𝑄3 − 𝑄1 ; 𝐼. 𝑅 = 7,5 − 2,5 = 5.

2.2.6.1 Covariance
Covariance quantifies how two variables vary together,
S(x, y) = (1/n) ∑ (xi − x̄)(yi − ȳ), summing over i = 1, …, n.

The covariance is sensitive to the scale of the variables and it is
unbounded, which makes its magnitude difficult to interpret,
−∞ < S(x, y) < +∞.

e.g., for a sample,
x = {1, 2, 3}, x̄ = 2;  y = {2, 4, 6}, ȳ = 4;
S(x, y) = ((1 − 2)(2 − 4) + (2 − 2)(4 − 4) + (3 − 2)(6 − 4)) / 3 = 4/3 ≈ 1,33.
Frequency Distribution of the Covariance,
S(x, y) = ∑i ∑j fij·(vi − x̄)(wj − ȳ), for i = 1, …, k and j = 1, …, l.

2.2.7.1 Pearson Correlation Coefficient


Standardizes the covariance to a unitless measure,
r = S(x, y) / (S(x)·S(y)),  with r ∈ [−1, 1];
if r = 0, there is no linear relation;
if r = +1, there is a perfect positive linear relation;
if r = −1, there is a perfect negative linear relation.

e.g., for the sample above,
x = {1, 2, 3}; y = {2, 4, 6}; S(x, y) ≈ 1,33;
S(x) = √(2/3) ≈ 0,82; S(y) = √(8/3) ≈ 1,63;
r = 1,33 / (0,82·1,63) ≈ 1.
2.2.7.2 Properties of the Correlation Coefficient
The coefficient is symmetric,
r(x, y) = S(x, y) / (S(x)·S(y)) = r(y, x) = S(y, x) / (S(y)·S(x));
the correlation coefficient remains unchanged if one of the variables
is linearly transformed with a positive scale factor; and correlation,
even when present, does not by itself imply causation.
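A minimal Python sketch of the covariance (1/n convention) and the Pearson coefficient on the sample above:

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

def pearson_r(x, y):
    sx = covariance(x, x) ** 0.5          # standard deviation of x
    sy = covariance(y, y) ** 0.5          # standard deviation of y
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3]
y = [2, 4, 6]
print(round(covariance(x, y), 2))         # 1.33
print(round(pearson_r(x, y), 2))          # 1.0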

2.3 Measures of Shape


Measures of shape describe the distribution pattern of data points
relative to the mean, revealing asymmetry and tail behavior.
2.3.1 The Mode
The mode is the value that appears most frequently in a dataset. It
identifies the most common values in categorical data and the peaks of a distribution.
2.3.1.1 Mode of Discrete Variables
The absolute mode is the single most frequent value,
e.g., {2,2,2,3,4}, n = 5: m = 2.
A distribution is multimodal when several values tie for the highest frequency,
e.g., {2,2,2,3,3,3,4}, n = 7: m1 = 2 and m2 = 3.
If all values are equally frequent, there is no mode,
e.g., {2,3,4}, n = 3: no mode.
2.3.1.2 Modal Class for Continuous Variables
It is represented by the class interval with the highest frequency.

2.3.2 Skewness
Quantifies how lopsided the data are relative to the mean, measuring
the asymmetry of a concrete data set,
a.c = (1/n) ∑ ((xi − x̄) / S)³.
2.3.2.1 Symmetric Skewness
a.c = (1/n) ∑ ((xi − x̄) / S)³ = 0;   x̄ = M = m;
e.g., {2,3,4,5,6}, n = 5; x̄ = (2 + 3 + 4 + 5 + 6)/5 = 4;
S = √((4 + 1 + 0 + 1 + 4)/5) = √2 ≈ 1,41;
a.c = (1/5)·( ((2 − 4)/1,41)³ + ((3 − 4)/1,41)³ + ((4 − 4)/1,41)³
      + ((5 − 4)/1,41)³ + ((6 − 4)/1,41)³ ) = 0/5 = 0,
since the positive and negative cubed deviations cancel.
2.3.2.2 Positive, Right-Skewed
a.c = (1/n) ∑ ((xi − x̄) / S)³ > 0;   x̄ > M;
e.g., {35,40,45,50,55,60,65,70,150,300}, n = 10;
x̄ = 870/10 = 87;
M = (55 + 60)/2 = 57,5.
2.3.2.3 Negative, Left-Skewed
a.c = (1/n) ∑ ((xi − x̄) / S)³ < 0;   x̄ < M;
e.g., {1,6,7,8,9}, n = 5; x̄ = (1 + 6 + 7 + 8 + 9)/5 = 6,2; M = 7.
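A short Python sketch of the skewness coefficient, applied to the three samples above:

# Skewness coefficient: a.c = (1/n) * sum(((x - mean)/S)^3), using S with 1/n
def skewness(data):
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum(((x - mean) / s) ** 3 for x in data) / n

print(round(skewness([2, 3, 4, 5, 6]), 2))                             # 0.0, symmetric
print(round(skewness([35, 40, 45, 50, 55, 60, 65, 70, 150, 300]), 2))  # > 0, right-skewed
print(round(skewness([1, 6, 7, 8, 9]), 2))                             # < 0, left-skewed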

2.3.3 Kurtosis Coefficient


Kurtosis quantifies the tailedness and peakedness of a probability
distribution relative to a normal distribution. It measures how
much data clusters in the tails versus the center.
k.c = (1/n) ∑ (xi − x̄)⁴ / S⁴ = (1/n) ∑ ((xi − x̄) / S)⁴;
the excess coefficient adjusts for the baseline kurtosis of a normal distribution,
e.c = k.c − 3.
2.3.3.1 Mesokurtic
Mesokurtic is the shape of a normal distribution,
e.c = k.c − 3 = 0.
2.3.3.2 Leptokurtic
There is a sharp peak and heavy tails in the distribution. High
kurtosis means more data in the tails and a higher risk of outliers,
e.c = k.c − 3 > 0.
2.3.3.3 Platykurtic
There is a flat peak, light tails and wider shoulders, as in a uniform
distribution: data are more evenly distributed, with fewer extremes,
e.c = k.c − 3 < 0.
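A short Python sketch of the excess kurtosis; the two samples used here are illustrative:

# Kurtosis coefficient and excess kurtosis: e.c = k.c - 3
def excess_kurtosis(data):
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    kc = sum(((x - mean) / s) ** 4 for x in data) / n
    return kc - 3

print(round(excess_kurtosis([1, 2, 3, 4, 5, 6, 7, 8, 9]), 2))     # negative: platykurtic (flat, uniform-like)
print(round(excess_kurtosis([5, 5, 5, 5, 1, 9, 5, 5, 5, 5]), 2))  # positive: leptokurtic (sharp peak, heavy tails)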

3.1 Mean of Linear Combination


For a variable “x” defined as,
𝑥 = 𝑎1𝑥1 + 𝑎2𝑥2;
The mean of the variable “x”,
𝑥̅ = 𝑎1𝑥̅ 1 + 𝑎2𝑥̅ 2;
The mean of a linear combination is the same linear combination of
the means; this holds regardless of any dependence between the variables,
e.g., x̄ = 100·20 + 50·50 = 4.500 with a1 = 100, x̄1 = 20, a2 = 50, x̄2 = 50.
3.1.1 Mean Vector
Consolidates the means of all the variables into a single vector,
𝑥̅ = (𝑥̅ 1, 𝑥̅ 2, … , 𝑥̅ 𝑛);
3.1.1.1 Data Matrix of a Single Variable
Stacking the n observations of a variable in a column vector
x = (x1, x2, …, xn)ᵀ, the mean is obtained as
x̄ = (1/n)·1ᵀx,
where 1 is the n-vector of ones; with several variables as columns of a
data matrix X, the same product (1/n)·1ᵀX gives the vector of means.
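A minimal sketch of the mean vector as (1/n)·1ᵀX, assuming NumPy and a small hypothetical data matrix:

import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])            # 3 observations, 2 variables

n = X.shape[0]
ones = np.ones(n)
mean_vector = ones @ X / n             # (1/n) * 1^T X

print(mean_vector)                     # [ 2. 20.]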

3.1.1.2 Bivariate Data Matrices
For a bivariate variable (x, y) the data matrix stores the pairs
(xi, yi) as rows, and the mean vector collects both means,
(x̄, ȳ) = ( (1/n) ∑ xi , (1/n) ∑ yi ), summing over i = 1, …, n.

3.2.1 Variance of Linear Combinations


S²(x) = a1²·S²(x1) + a2²·S²(x2) + 2·a1·a2·S(x1, x2);
e.g., S²(x1) = 4; S²(x2) = 9; x = 3·x1 + 2·x2;
assuming independence between the variables, S(x1, x2) = 0, so
S²(x) = 3²·4 + 2²·9 = 36 + 36 = 72.
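A small simulation sketch, assuming NumPy and two independent variables with variances close to 4 and 9, that checks the result numerically:

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0, 2, 100_000)         # variance approximately 4
x2 = rng.normal(0, 3, 100_000)         # variance approximately 9

x = 3 * x1 + 2 * x2                    # linear combination with a1 = 3, a2 = 2

print(round(x.var(), 1))               # close to 3^2*4 + 2^2*9 = 72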
3.2.2 Covariance of Linear Combinations
For the linear combinations u = a1·x1 + a2·x2 and v = b1·y1 + b2·y2,
S(u, v) = a1·b1·S(x1, y1) + a1·b2·S(x1, y2) + a2·b1·S(x2, y1) + a2·b2·S(x2, y2).
3.2.2.1 Covariance Matrix
The covariance matrix collects the variances on its diagonal and the
covariances off the diagonal,
S = ( S²x1     Sx1x2   …   Sx1xn
      Sx2x1    S²x2    …   Sx2xn
      …
      Sxnx1    Sxnx2   …   S²xn ).

Properties of the Covariance Matrix,
The matrix is symmetric, S(x1, x2) = S(x2, x1); for two variables,
S = (1/n)·( ∑(x1i − x̄1)²              ∑(x1i − x̄1)(x2i − x̄2)
            ∑(x2i − x̄2)(x1i − x̄1)     ∑(x2i − x̄2)² )
  = ( S²x1    Sx1x2
      Sx2x1   S²x2 );
in matrix form, S = (1/n)·(X − 1x̄ᵀ)ᵀ(X − 1x̄ᵀ).
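A minimal sketch of the covariance matrix in matrix form, assuming NumPy and a small hypothetical data matrix:

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0],
              [4.0, 9.0]])                       # 4 observations, 2 variables

n = X.shape[0]
xbar = X.mean(axis=0)                            # mean vector
centered = X - xbar                              # X - 1 * xbar^T
S = centered.T @ centered / n                    # (1/n) (X - 1 xbar^T)^T (X - 1 xbar^T)

print(S)                                         # symmetric; diagonal = variances
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))   # True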
