IX. Introduction to Statistical Concepts: Frequency Distribution, Measures of Central Tendency, Measures of Variability
INTRODUCTION TO
STATISTICAL CONCEPTS
A. INTRODUCTION
B. FREQUENCY DISTRIBUTION
C. MEASURES OF CENTRAL TENDENCY
D. MEASURES OF VARIABILITY
Statistical Concepts
Meaning of Statistics
Basic Terms in Statistics
Branches of Statistics
Why do we study Statistics?
Population
- the totality of all observations or entities under consideration (e.g., the set of all votes)
Sample
- a representative portion of a population (e.g., a set of some votes)
Basic Terms in Statistics
Parameter
- a number that describes a characteristic of a population (e.g., of the set of all votes)
Statistic
- a number that describes a characteristic of a sample (e.g., of a set of some votes)
Branches of Statistics
Descriptive Statistics
-methods concerned with
collecting and describing a set of
data
Branches of Statistics
Inferential Statistics
-methods concerned with the analysis of a sample leading to a conclusion, generalization, or inference about the entire population
After getting data from the sample, how do
we organize and present them?
Using Frequency
Distribution
B. FREQUENCY
DISTRIBUTION
What is a Frequency Distribution (FD)?
Step 4. Find the lower limit (LL) and upper limit (UL) of each class interval.
Thus, UL = LL + (c − 1)
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data
Summary of Steps
1. Find the range: R = HS − LS (highest score minus lowest score).
2. Decide on the number of class intervals: $k = \sqrt{n}$.
3. Identify the class size: $c = \frac{R}{k}$.
4. Find LL and UL: LL = LS for the first class, and UL = LL + (c − 1).
5. Find the frequency (f) of each interval.
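The five steps above can be sketched in Python as follows. This is a minimal illustration, not the module's prescribed procedure: it assumes whole-number scores and rounds k and c up to the next whole number (conventions differ; some texts round to the nearest whole number instead). The function name `grouped_frequency_distribution` and the ten sample scores are hypothetical.

```python
import math


def grouped_frequency_distribution(scores):
    """Return (LL, UL, f) rows for whole-number scores, following Steps 1-5.

    Assumption: k and c are rounded up so the classes always reach HS.
    """
    hs, ls = max(scores), min(scores)
    r = hs - ls                                      # Step 1: R = HS - LS
    k = math.ceil(math.sqrt(len(scores)))            # Step 2: k = sqrt(n), rounded up
    c = max(1, math.ceil(r / k))                     # Step 3: c = R / k, rounded up (at least 1)
    rows = []
    ll = ls                                          # Step 4: the first LL is the lowest score
    while ll <= hs:
        ul = ll + (c - 1)                            # UL = LL + (c - 1)
        f = sum(1 for x in scores if ll <= x <= ul)  # Step 5: count scores from LL to UL
        rows.append((ll, ul, f))
        ll = ul + 1                                  # next class starts right after this UL
    return rows


scores = [71, 88, 100, 94, 87, 65, 93, 72, 83, 91]  # hypothetical sample scores
for ll, ul, f in grouped_frequency_distribution(scores):
    print(f"{ll}-{ul}: {f}")
```

With these scores, R = 35, k = 4 and c = 9, which gives the classes 65-73, 74-82, 83-91 and 92-100 with frequencies 3, 0, 4 and 3.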
C. MEASURES OF
CENTRAL TENDENCY
Measures of Central Tendency
Mean
- the balancer of the distribution: if the entire distribution is likened to a “seesaw”, the mean serves as the “fulcrum”
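A quick numerical check of the “fulcrum” idea, using the Rizal matchstick counts that appear later in this section: the deviations from the mean cancel out exactly. `fractions.Fraction` is a choice of this sketch (not of the source) so that the sum is exactly zero rather than a floating-point near-zero; the helper name `mean` is hypothetical.

```python
from fractions import Fraction


def mean(scores):
    """Arithmetic mean as an exact fraction: sum of scores divided by n."""
    return Fraction(sum(scores), len(scores))


rizal = [45, 46, 47, 49, 47, 48, 47]   # matchstick counts from the Rizal sample
m = mean(rizal)
deviations = [x - m for x in rizal]    # distance of each score from the mean
print(m, sum(deviations))              # the mean balances the deviations
```

This prints `47 0`: the negative deviations (−2 and −1) exactly offset the positive ones (+2 and +1).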
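If n = 157 (odd), the median is the $\left(\frac{157 + 1}{2}\right)$th = 79th score.
If n is even, the median lies between the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th scores.
Thus, if n = 346 (even), the median lies between the $\left(\frac{346}{2}\right)$th = 173rd and the $\left(\frac{346}{2} + 1\right)$th = 174th scores.
Hence, the median is the average of the 173rd and 174th scores.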
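The odd and even position rules above can be sketched as a small function. The name `median_raw` is hypothetical, and the inputs are the consecutive whole numbers 1 to n, chosen only so that each score equals its position.

```python
def median_raw(scores):
    """Median of ungrouped scores using the position rules for odd and even n."""
    xs = sorted(scores)
    n = len(xs)
    if n % 2 == 1:
        pos = (n + 1) // 2                # the ((n + 1)/2)th score, counting from 1
        return xs[pos - 1]
    pos = n // 2                          # the (n/2)th score ...
    return (xs[pos - 1] + xs[pos]) / 2    # ... averaged with the (n/2 + 1)th score


print(median_raw(range(1, 158)))   # n = 157: the 79th score
print(median_raw(range(1, 347)))   # n = 346: average of the 173rd and 174th scores
```

The two calls print 79 and 173.5.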
Median (Grouped Data)
where,
• LL = true lower limit or lower class boundary of the
median class;
• Fb = the sum of all frequencies below the median class
(or the <cf directly below the median class)
• f = frequency corresponding to the median class; and
• c= class size.
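The formula these symbols belong to is not reproduced in this copy. The sketch below assumes the standard interpolation form Median = LL + ((n/2 − Fb) / f) × c, a hypothetical five-class table, and whole-number class limits, so that the true lower limit is the stated lower limit minus 0.5. The function name `grouped_median` is hypothetical.

```python
def grouped_median(classes):
    """classes: (lower limit, upper limit, f) tuples in ascending order, whole-number limits."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no scores")
    half = n / 2
    fb = 0                                   # Fb: total frequency below the current class
    for lower, upper, f in classes:
        if fb + f >= half:                   # first class whose <cf reaches n/2: the median class
            ll = lower - 0.5                 # true lower limit (lower class boundary)
            c = upper - lower + 1            # class size
            return ll + (half - fb) / f * c
        fb += f
    raise ValueError("median class not found")


table = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 4), (30, 34, 1)]  # hypothetical
print(grouped_median(table))
```

Here n = 20 and the median class is 20-24 (its <cf of 15 is the first to reach n/2 = 10), so LL = 19.5, Fb = 7, f = 8 and c = 5, and the median is 19.5 + (3/8)(5) = 21.375.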
1, 2, 3, 4, 5: No mode
1, 2, 3, 3, 4, 4: Modes = 3 and 4
1, 2, 3, 4, 4: Mode = 4
2, 2, 3, 3, 4, 4: No mode
2, 2, 2, 2, 2: Mode = 2
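The five examples above can be reproduced with a counting sketch. The rule it applies, “no mode when every distinct value ties for the highest count, unless there is only one distinct value”, is inferred from these examples and is an assumption; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(scores):
    """Return the mode(s) of raw scores; an empty list means "no mode"."""
    if not scores:
        return []
    counts = Counter(scores)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # If every distinct value ties for the highest count, no value stands out.
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return winners


for data in ([1, 2, 3, 4, 5], [1, 2, 3, 3, 4, 4], [1, 2, 3, 4, 4],
             [2, 2, 3, 3, 4, 4], [2, 2, 2, 2, 2]):
    print(data, modes(data))
```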
Mode (Grouped Data)
Thus, with reference to the modal class, the frequency of the lower class interval is 7, while the frequency of the higher class interval is 8. The values needed to compute the exact mode are:
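The exact-mode formula is not reproduced in this copy. The sketch below assumes the common form Mode = LL + (d1 / (d1 + d2)) × c, where d1 = (modal frequency − frequency of the lower class interval) and d2 = (modal frequency − frequency of the higher class interval); in the slide's example these would be d1 = f − 7 and d2 = f − 8. The table and the function name `grouped_mode` are hypothetical, and ties for the highest frequency are resolved by taking the first such class.

```python
def grouped_mode(classes):
    """classes: (lower limit, upper limit, f) tuples in ascending order, whole-number limits."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                          # modal class: highest frequency (first if tied)
    lower, upper, f_mo = classes[i]
    f_below = freqs[i - 1] if i > 0 else 0               # frequency of the lower class interval
    f_above = freqs[i + 1] if i + 1 < len(freqs) else 0  # frequency of the higher class interval
    d1 = f_mo - f_below
    d2 = f_mo - f_above
    ll = lower - 0.5                                     # true lower limit of the modal class
    c = upper - lower + 1                                # class size
    return ll + d1 / (d1 + d2) * c


table = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 4), (30, 34, 1)]  # hypothetical
print(round(grouped_mode(table), 2))
```

For this table the modal class is 20-24 with d1 = 8 − 5 = 3 and d2 = 8 − 4 = 4, so the mode is 19.5 + (3/7)(5), about 21.64.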
For instance…
If the mean is 85.38 and the median is 86.17, the value of the mode using the empirical rule (Mode = 3 Median − 2 Mean) would be
$Mode = 3(86.17) - 2(85.38) = 258.51 - 170.76 = 87.75$
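A one-line sketch of the calculation, assuming the “empirical rule” here refers to the relation Mode ≈ 3 Median − 2 Mean (a rule of thumb that works best for moderately skewed distributions). The function name `empirical_mode` is hypothetical.

```python
def empirical_mode(mean, median):
    """Approximate mode from the empirical relation: Mode ~ 3(Median) - 2(Mean)."""
    return 3 * median - 2 * mean


print(round(empirical_mode(85.38, 86.17), 2))
```

With the mean of 85.38 and median of 86.17, this prints 87.75.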
Summary of Measures of Central Tendency
D. MEASURES OF
RELATIVE POSITION
AND VARIABILITY
What have we learned?
Measures of a Distribution
Measures of Central Tendency
Mean
Arithmetic Mean (Raw Scores and Frequency
Distribution)
Weighted Mean
Median
Mode
Raw Scores and Frequency Distribution
Relationship of the Three Measures of Central Tendency
Box Plot
Measures of Variability
Range
Mean Absolute Deviation
Variance and Standard Deviation
Measures of Relative Position
Suppose your score in the LET is at the 75th percentile. What does this mean? (actual LET 2009 question)
[Figure: a scale from the lowest score to the highest score, split by the median into two equal parts, labeled 1 and 2]
The median is a quantile that divides a distribution into two equal parts, with 50% of all scores falling below it.
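Common Types of Quantiles
Number of equal parts | Quantile | Symbol | Number of quantile points
2 | Median | M | 1
3 | Tertiles | T | 2
4 | Quartiles | Q | 3
10 | Deciles | D | 9
100 | Percentiles | P | 99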
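Deciles
Formula: $D_k = \left[\frac{k}{10}(n + 1)\right]$th score
*Linear interpolation: $D_k = LV + b(HV - LV)$, where b is the decimal part of the computed position, and LV and HV are the scores at the positions just below and just above it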
Quartiles
Formula: $Q_k = \left[\frac{k}{4}(n + 1)\right]$th score
*Linear interpolation: $Q_k = LV + b(HV - LV)$
Percentiles
Formula: $P_k = \left[\frac{k}{100}(n + 1)\right]$th score
*Linear interpolation: $P_k = LV + b(HV - LV)$
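The decile, quartile and percentile formulas share one pattern: find the position (k/m)(n + 1) for m = 10, 4 or 100, then interpolate. A minimal sketch, assuming LV is the score at the whole-number part of the position, HV is the next score, and b is the decimal part; positions outside 1 to n are clamped to the lowest or highest score (a choice of this sketch). The function name `quantile_point` and the four test scores are hypothetical.

```python
import math
from fractions import Fraction


def quantile_point(scores, k, m):
    """k-th quantile when the data are split into m equal parts
    (m = 4 quartiles, m = 10 deciles, m = 100 percentiles)."""
    xs = sorted(scores)
    n = len(xs)
    pos = Fraction(k * (n + 1), m)        # position: the (k/m)(n + 1)th score, counting from 1
    if pos <= 1:
        return float(xs[0])               # clamp below the first score
    if pos >= n:
        return float(xs[-1])              # clamp above the last score
    i = math.floor(pos)                   # whole-number part of the position
    b = pos - i                           # decimal part of the position
    lv, hv = xs[i - 1], xs[i]             # LV = ith score, HV = (i + 1)th score
    return float(lv + b * (hv - lv))      # linear interpolation: LV + b(HV - LV)


data = [10, 20, 30, 40]                   # hypothetical scores
print(quantile_point(data, 1, 4))         # Q1: position 1.25
print(quantile_point(data, 75, 100))      # P75: position 3.75
```

For Q1 the position is 1.25, so Q1 = 10 + 0.25(20 − 10) = 12.5; for P75 the position is 3.75, so P75 = 30 + 0.75(40 − 30) = 37.5.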
Percentile
71 88 100 94 87 65 93 72 83 91
85 87 88 91 92 93 94 95 100 100
THINK ABOUT THIS
Is it enough to simply describe a set of data using
averages?
Consider this instance…
Suppose manufacturers of matches claim that there
are 50 matchsticks in every matchbox.
You took a sample of 7 matchboxes of a certain brand, say Rizal, and observed the following numbers of matchsticks:
45 46 47 49 47 48 47
You took a sample of 5 matchboxes of another brand (Fuego) and observed the following:
47 47 45 44 52
Number of matchsticks for a sample of matchboxes:
Rizal: 45 46 47 49 47 48 47
Fuego: 47 47 45 44 52
Mean (R) = $\frac{329}{7} = 47$
Mean (F) = $\frac{235}{5} = 47$
scattering, variation, dispersion, consistency, heterogeneity
Measures of Variability
Range
a crude estimate of variability
R = HS − LS (highest score minus lowest score)
Finding the range of the number of matchsticks
in each set:
Rizal: 45 46 47 49 47 48 47
Fuego: 47 47 45 44 52
Range (R) = 49 − 45 = 4
Range (F) = 52 − 44 = 8
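The range computation above as a tiny sketch. The name `score_range` is hypothetical; it avoids shadowing Python's built-in `range`.

```python
def score_range(scores):
    """Range R = HS - LS (highest score minus lowest score)."""
    return max(scores) - min(scores)


rizal = [45, 46, 47, 49, 47, 48, 47]
fuego = [47, 47, 45, 44, 52]
print(score_range(rizal), score_range(fuego))
```

This prints `4 8`, matching the hand computation.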
How do we identify the deviation of each score ($x$) from the mean ($\bar{x}$)?
In tabular form (Fuego sample)…
Score | Deviation (score − mean)
44 | −3
45 | −2
47 | 0
47 | 0
52 | 5
How do we get a single value that summarizes these deviations?
Answer: we get the average of these deviations.
Since the sum of the deviations from the mean equals zero, we need to remove the negative signs. How? By taking the absolute value of each deviation:
$MAD = \frac{\sum |score - mean|}{n}$
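A direct sketch of the MAD formula; the function name `mean_absolute_deviation` is hypothetical.

```python
def mean_absolute_deviation(scores):
    """MAD = sum(|score - mean|) / n."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


fuego = [47, 47, 45, 44, 52]
print(mean_absolute_deviation(fuego))
```

For Fuego the absolute deviations are 3, 2, 0, 0 and 5, so MAD = 10 / 5 = 2.0 matchsticks.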
But… getting an absolute value is NOT used in higher statistics…
What mathematical operation can we use to eliminate the negative sign?
Answer: square the deviations.
Squaring the deviations and getting the average of these squared deviations…
$MSD = \frac{\sum (score - mean)^2}{n}$
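NOTE:
There are two types of variance:
Biased estimate: $s^2 = \frac{\sum (x - \bar{x})^2}{n}$
Unbiased estimate: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, or equivalently $s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$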
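A sketch of both variance estimates, plus the computational form of the unbiased estimate as a cross-check. The function names `variances` and `unbiased_variance_shortcut` are hypothetical; the data are the Fuego counts.

```python
def variances(scores):
    """Return (biased, unbiased) variance estimates of a sample."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    biased = ss / n                            # divide by n
    unbiased = ss / (n - 1)                    # divide by n - 1
    return biased, unbiased


def unbiased_variance_shortcut(scores):
    """Computational form: [n * sum(x^2) - (sum x)^2] / [n(n - 1)]."""
    n = len(scores)
    return (n * sum(x * x for x in scores) - sum(scores) ** 2) / (n * (n - 1))


fuego = [47, 47, 45, 44, 52]
print(variances(fuego))
print(unbiased_variance_shortcut(fuego))
```

For Fuego the squared deviations sum to 38, giving 7.6 (biased) and 9.5 (unbiased). The shortcut gives the same 9.5: (5 × 11083 − 235²) / (5 × 4) = 190 / 20.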
Taking the square root of the variance gives the standard deviation:
$sd = \sqrt{\frac{\sum (score - mean)^2}{n - 1}}$ (unbiased estimate)
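Therefore…
For the Fuego sample, $\sum (score - mean)^2 = 9 + 4 + 0 + 0 + 25 = 38$, so the variances are $38/5 = 7.6$ (biased estimate) and $38/4 = 9.5$ (unbiased estimate). The standard deviations are:
Biased estimate: $\sqrt{7.6} = 2.76$ matchsticks
Unbiased estimate: $\sqrt{9.5} = 3.08$ matchsticks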
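A sketch that takes the square roots of both variance estimates for the Fuego sample; the function name `standard_deviations` is hypothetical.

```python
import math


def standard_deviations(scores):
    """Square roots of the biased (/n) and unbiased (/(n - 1)) variances."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    return math.sqrt(ss / n), math.sqrt(ss / (n - 1))


fuego = [47, 47, 45, 44, 52]
biased_sd, unbiased_sd = standard_deviations(fuego)
print(round(biased_sd, 2), round(unbiased_sd, 2))
```

This prints `2.76 3.08`, matching the values above.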
We find the measure of
variability of the Rizal
brand using MS Excel…
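As a cross-check outside Excel, the same measures can be obtained with Python's standard `statistics` module. As we understand it, `pvariance` and `pstdev` divide by n (like Excel's VAR.P and STDEV.P), while `variance` and `stdev` divide by n − 1 (like VAR.S and STDEV.S). This is a sketch, not the Excel procedure shown on the slide.

```python
import statistics

rizal = [45, 46, 47, 49, 47, 48, 47]
fuego = [47, 47, 45, 44, 52]

for name, data in (("Rizal", rizal), ("Fuego", fuego)):
    print(
        name,
        max(data) - min(data),                 # range
        round(statistics.pvariance(data), 2),  # biased variance (divide by n)
        round(statistics.variance(data), 2),   # unbiased variance (divide by n - 1)
        round(statistics.pstdev(data), 2),     # biased standard deviation
        round(statistics.stdev(data), 2),      # unbiased standard deviation
    )
```

Both brands average 47 matchsticks, but every measure of variability is smaller for Rizal (for example, an unbiased variance of 10/6, about 1.67, against 9.5 for Fuego), so its counts are more consistent.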
CHARACTERISTICS OF DISTRIBUTIONS
Modality of a Distribution
Unimodal
- a distribution with only one peak
- most common
Bimodal
- a distribution with two peaks
- the peaks may not be of the same height
Relationship of the Three Measures of Central
Tendency