
Ix. Introduction To Statistical Concepts: Frequency Distribution Measures of Central Tendency Measures of Variability

This document introduces several key statistical concepts. It discusses frequency distributions, which organize a set of data into categories and their frequencies. It outlines the process for constructing grouped frequency distributions for quantitative data, including determining class intervals. The document also introduces measures of central tendency, specifically discussing how to calculate the arithmetic mean. The mean is defined as the sum of all values divided by the total number of values. Examples are provided to demonstrate calculating the mean from raw data sets and from grouped frequency distributions.

IX.

INTRODUCTION TO
STATISTICAL CONCEPTS

A. INTRODUCTION
B. FREQUENCY DISTRIBUTION
C. MEASURES OF CENTRAL TENDENCY
D. MEASURES OF VARIABILITY
Statistical Concepts

 Meaning of Statistics
 Basic Terms in Statistics
 Branches of Statistics
Why do we study Statistics?

• “people need to develop a discerning sense of rational thought that will enable them to evaluate data as to make intelligent decisions, inferences and generalizations” (J.T. McClave & F. Dietrich Jr., 1985)
Meaning of Statistics

• entities that comprise numerical information

• the study of rules, techniques or methods used to collect, present, analyze and interpret a set of numerical data

• numerical measures derived from a smaller portion of a data set
What is Statistics?

• Tate (1955) has beautifully summarized the different meanings of statistics…

“It’s all perfectly clear, you compute statistics from statistics by statistics.”
Meaning of Statistics

“It’s all perfectly clear, you compute
statistics (mean, median, mode, etc.)
from statistics (numerical facts)
by statistics (statistical methods).” (Tate, 1955)
As a discipline, statistics is roughly defined as a “science of data”.
• Data means observations or evidence of a characteristic of a population.

• Scientific educational research gathers data by means of standardized research tools or self-designed instruments.

• Data are both qualitative and quantitative in nature.
Basic Terms in Statistics

Population - the totality of all observations or entities under consideration (pictured as the set of all votes)
Sample - a representative portion of a population (pictured as a set of some votes)
Basic Terms in Statistics

Parameter - a number that describes a characteristic of a population (the set of all votes)
Statistic - a number that describes a characteristic of a sample (a set of some votes)
Branches of Statistics

Descriptive Statistics
-methods concerned with
collecting and describing a set of
data
Branches of Statistics

Inferential Statistics
-methods concerned with the analysis
of a sample leading to a
conclusion/generalization/inference
of the entire population
After getting data from the sample, how do
we organize and present them?

Using Frequency
Distribution
IX. INTRODUCTION TO
STATISTICAL CONCEPTS

B. FREQUENCY
DISTRIBUTION
What is a Frequency Distribution (FD)?

• A frequency distribution is a tabular arrangement of data indicating the different classes or categories and the corresponding frequencies.

• A way of organizing a set of data.


What is a Frequency Distribution (FD)?

 Two types of frequency distribution can


arise depending on whether the data
gathered are qualitative or quantitative.

 Qualitative data - variable can vary


only in “quality” or “attribute”
 Quantitative data - the variable can
vary in quantity
How do we construct FD?

For qualitative data, the


construction of the frequency
distribution is straightforward since
the categories are already defined.
Example of FD for qualitative variable marital status
Can we use the same procedure for quantitative data?

• Sometimes, but most of the time the procedure is different, since reporting each score as a category may result in a table with too many categories.

• Instead of reporting each score as a category, we condense the scores into a smaller number of categories (class intervals).
Say for this example…
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 1. Find the range R of the


scores, where:

R = highest score – lowest score.


Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 2. Decide on the number of class intervals or classes. We will denote the number of class intervals by k.
• The ideal number according to some authors is between 10 and 15 class intervals.
• Sturges’ formula: $k = 1 + 3.322 \log_{10} n$
• Or $k = \sqrt{n}$, where n is the number of scores
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 3. Determine the class size or class width of the interval. If c represents the class size, we use the formula
$c = \frac{R}{k}$
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 4. Determine the lower limit LL and the upper limit UL of the lowest class interval.
The lowest class interval should contain the lowest score or value of the data set.
Hence LL is chosen as a score or value that is less than or equal to the lowest score. Once the value of LL has been determined, the value of the upper limit UL is obtained using the equation
UL = LL + (c – 1)
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 4.
Thus,

LL=65 (closest to 67 and a multiple of the class


size c=5)

UL=LL+(c – 1)
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 5. Determine the upper class


intervals by consecutively adding the class
size c to the values of LL and UL of the
lowest class interval until we get a class
interval containing the highest score of
the data set. The interval containing the
highest value is referred to as the
highest class interval.
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Step 6. Make a tally of the scores using


the obtained class intervals.

Step 7. Summarize the tallies by way of a


table similar to Table 2 in Example 1.
Procedure for Constructing a Grouped Frequency
Distribution for Quantitative Data

Summary of Steps
1. Find the range (R = HS – LS).
2. Decide on the number of class intervals ($k = \sqrt{n}$).
3. Identify the class size ($c = \frac{R}{k}$).
4. Find LL and UL: LL = LS (or a convenient value just below LS), UL = LL + (c – 1).
5. Find the frequency (f) of each interval.
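
The summary steps can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the function name `grouped_frequency_distribution`, the rounding up of k and c to whole numbers, and the sample scores are illustrative and do not come from the slides. It takes LL = LS as in step 4 and assumes integer scores.

```python
import math

def grouped_frequency_distribution(scores):
    """Return (LL, UL, f) rows, lowest class first, for integer scores."""
    hs, ls = max(scores), min(scores)
    r = hs - ls                               # Step 1: R = HS - LS
    k = math.ceil(math.sqrt(len(scores)))     # Step 2: k = sqrt(n), rounded up (assumption)
    c = max(1, math.ceil(r / k))              # Step 3: c = R / k, rounded up (assumption)
    rows = []
    ll = ls                                   # Step 4: LL = LS
    while True:
        ul = ll + (c - 1)                     # UL = LL + (c - 1)
        f = sum(1 for s in scores if ll <= s <= ul)   # Step 5: tally the scores
        rows.append((ll, ul, f))
        if ul >= hs:                          # stop at the highest class interval
            break
        ll += c                               # next interval: add c to the limits
    return rows

# Hypothetical scores, NOT the data set shown on the slides.
scores = [67, 70, 72, 75, 75, 78, 80, 81, 83, 84, 85, 86, 88, 90, 93, 98]
rows = grouped_frequency_distribution(scores)
for ll, ul, f in rows:
    print(f"{ll} - {ul}: {f}")
```

Rounding c up guarantees that the intervals reach the highest score; other rounding choices (for example, a class size that is a multiple of 5) are equally acceptable.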

See demonstration on the board


Next slide ( SCORES)
Say for this example…
NOTE…

The two numbers that define a class interval are called apparent limits. To reflect the continuity of the scores, the true limits or class boundaries are indicated. These are obtained by adding to every upper limit, and subtracting from every lower limit, “one-half of the difference between the apparent upper limit of any class interval and the apparent lower limit of the succeeding class interval”.
NOTE…

For instance, 79 is the apparent upper limit of the class interval 75 – 79 and 80 is the apparent lower limit of the succeeding class interval. So, half of the difference between 79 and 80 is 0.5, which is the number to be added to all upper limits and subtracted from all lower limits to obtain the class boundaries of the class intervals (74.5 – 79.5 for the interval 75 – 79).
The class boundaries remove the discontinuity between class intervals.
NOTE…

In addition to the frequency associated with


each class interval, other statistical
information such as the class midpoint or
class mark, the less than cumulative
frequency (<cf), the greater than
cumulative frequency (>cf), and the
relative frequency (rf) may be reflected in
the final table.
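
As an illustration, the extra columns can be derived from the (LL, UL, f) rows as sketched below. The function name `expand_table` and the frequency table are hypothetical, not taken from the slides; the sketch assumes integer apparent limits, so the class boundaries lie 0.5 below and above them.

```python
def expand_table(rows):
    """rows: list of (LL, UL, f) ordered from the lowest to the highest class."""
    n = sum(f for _, _, f in rows)
    table, less_cf = [], 0
    for ll, ul, f in rows:
        less_cf += f                          # running total from the lowest class
        table.append({
            "interval": (ll, ul),
            "boundaries": (ll - 0.5, ul + 0.5),   # true limits
            "midpoint": (ll + ul) / 2,            # class mark
            "f": f,
            "<cf": less_cf,                       # scores at or below this class
            ">cf": n - less_cf + f,               # scores at or above this class
            "rf": f / n,                          # relative frequency
        })
    return table

# Hypothetical frequency distribution (N = 30).
rows = [(70, 74, 3), (75, 79, 4), (80, 84, 10), (85, 89, 8), (90, 94, 5)]
table = expand_table(rows)
for row in table:
    print(row)
```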
IX. INTRODUCTION TO
STATISTICAL CONCEPTS

C. MEASURES OF
CENTRAL TENDENCY
Measures of Central Tendency

In our everyday life, we are more often than not exposed to statements involving the concept of “average” such as:
• the average price of a car,
• the average amount of rainfall in the Philippines, or
• the average salary of nurses abroad.

Somehow, informed individuals generally make decisions based on their understanding of this important idea.
What is an “average”?

 values on the scale where most of


the other scores are clustered

 the most typical score

 can be classified as the mean,


median and mode
How is an “average” computed?

It can be computed or merely inspected from a set of data:

• either in their original form (called raw data), or

• from data that have been organized into a frequency distribution (called grouped data)
Measure of Central Tendency

Mean
• the balancer of the distribution
• if the entire distribution is likened to a “see-saw”, the mean serves as the “fulcrum”

• can be classified as an arithmetic mean, weighted mean, harmonic mean, etc.
Mean

The arithmetic mean is obtained by adding the scores in a distribution and dividing the sum by the number of scores, that is:
$\text{Mean} = \frac{\text{sum of scores}}{\text{number of scores}}$
Mean
Example 1. Compute the average score in
science of the 12 students who obtained the
following scores: 22, 24, 17, 21, 19, 18, 21, 30,
30, 17, 15, and 20.

In research, the rule is for us to round off heavily, which means that we only have to retain the digits that are meaningful.
Mean
Example 2. The following data represent the
net take home pay of five rank and file
employees of a certain company. Determine the
average net take home pay of the employees.
NET TAKE HOME PAY: P4,750; P4,535;
P4,380; P3,895; and P9,307.
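
Both examples can be checked with a few lines of Python. This is a minimal sketch; the helper name `arithmetic_mean` and the variable names are illustrative, while the data are the ones given in Examples 1 and 2.

```python
def arithmetic_mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

science_scores = [22, 24, 17, 21, 19, 18, 21, 30, 30, 17, 15, 20]   # Example 1
take_home_pay = [4750, 4535, 4380, 3895, 9307]                      # Example 2, in pesos

mean_science = arithmetic_mean(science_scores)   # 254 / 12
mean_pay = arithmetic_mean(take_home_pay)        # 26867 / 5

print(round(mean_science, 2))   # 21.17
print(round(mean_pay, 2))       # 5373.4
```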
Mean
Finding the mean of a frequency distribution.
• How do we estimate the mean of the data if the scores are not available?
• The mean of grouped data can be computed by:

$\bar{x} = \frac{\sum fX}{N}$

• where f = class frequency; X = class midpoint; and N is the total number of cases in the frequency distribution
Mean

Example 3. Find the mean of the data as shown in


the frequency distribution below.
Mean
Solution:
• First, we expand the table by constructing the columns for X (midpoints) and fX (the products of corresponding midpoints and class frequencies).
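
The X and fX columns can be sketched in Python as shown below. The frequency table here is hypothetical (the table of Example 3 is not reproduced in this text), and the function name `grouped_mean` is an illustrative choice; the computation is the sum of fX divided by N.

```python
def grouped_mean(rows):
    """rows: list of (LL, UL, f). Returns sum(f * X) / N, with X the class midpoint."""
    total_fx = 0.0
    n = 0
    for ll, ul, f in rows:
        x = (ll + ul) / 2        # class midpoint X
        total_fx += f * x        # accumulate the fX column
        n += f                   # N = total number of cases
    return total_fx / n

# Hypothetical frequency distribution, NOT the one used in Example 3.
rows = [(70, 74, 3), (75, 79, 4), (80, 84, 10), (85, 89, 8), (90, 94, 5)]
estimate = grouped_mean(rows)
print(round(estimate, 2))
```

The result is an estimate because every score in a class is treated as if it were equal to the class midpoint.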
Mean
We rounded off the computed mean to the nearest hundredths because the class intervals actually extend up to the nearest tenths below and above the given apparent limits.
Using the summary values from the table, the mean of the 40 scores is given by $\bar{x} = \frac{\sum fX}{40}$.
Median (Raw Data)

 the score where half of the total number of scores is


found below it and the other half above it.
 the middle score when all the scores have been
ranked.
 no common symbol for the median. (but we use Md
for median)
 an appropriate measure of central tendency when
the variable is measured in at least the ordinal
scale.
Median (Raw Data)

Common application of the median

Suppose you are tasked to put the following words on a backdrop. With which letter of each word will you start so that the layout is symmetrical?
HAPPY
SUMMER
VACATION
Median (Raw Data)
The following are the steps in finding the median
based on raw data:

1. Arrange the data or scores in ascending order


(from lowest to highest);

2. If n is odd, there will be a middle score. This


middle score is the median.

If n is even, there will be two middle scores and the


median is taken as the arithmetic average of
the two middle scores.
Note:

Because the median depends only on the position of the middle score(s) and not on the size of the extreme scores, it is preferred over the mean whenever extreme values occur in a data set.
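
A quick way to see this is to reuse the take-home pay data of Example 2, where P9,307 is an extreme value. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

take_home_pay = [4750, 4535, 4380, 3895, 9307]    # Example 2, in pesos

pay_mean = mean(take_home_pay)        # pulled upward by the extreme value 9307
pay_median = median(take_home_pay)    # middle score of the ranked data

# Dropping the extreme value changes the mean a lot but the median very little.
trimmed = [p for p in take_home_pay if p != 9307]
trimmed_mean = mean(trimmed)
trimmed_median = median(trimmed)

print(pay_mean, pay_median)
print(trimmed_mean, trimmed_median)
```

With the extreme value included, the mean (5373.4) is larger than four of the five salaries, while the median (4535) still describes a typical employee.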
Median (Raw Data)
Example 6. The scores of nine students in a
science test are: 22, 24, 17, 21, 19, 18, 21, 30, 30 .
Find the median score.
Solution: Arranging the scores in ascending order
and identifying the middle score, the median score is 21.
Median (Raw Data)
Example 7. The ages (in years) of six teachers are listed below: 42, 23, 24, 30, 27, and 34. Find the median age of the six teachers.
Solution: Because the number of cases is 6 (even), there will be two middle scores. The ages in ascending order are 23, 24, 27, 30, 34, and 42. From the arranged data, the two middle scores are 27 and 30. Hence, the median is 28.5.
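
The two steps for raw data can be sketched as a small Python function. The name `median_raw` is illustrative; the data are those of Examples 6 and 7.

```python
def median_raw(scores):
    """Median of raw data: sort, then take the middle score(s)."""
    data = sorted(scores)                 # Step 1: ascending order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                        # Step 2: n odd, one middle score
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2    # n even, average the two middle scores

science = [22, 24, 17, 21, 19, 18, 21, 30, 30]    # Example 6
ages = [42, 23, 24, 30, 27, 34]                   # Example 7

print(median_raw(science))   # 21
print(median_raw(ages))      # 28.5
```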
Median (Raw Data)
When n is large, locating the position of the median by
ocular inspection may not be easy. However, based on
the definition given and the parity (oddness or
evenness) of n, we can use the following formulas for
locating the middle score (s).
Median (Raw Data)
Formula if n is odd: the median is the $\left(\frac{n+1}{2}\right)$th score.
Thus, if n = 157 (odd), the median is the $\left(\frac{157+1}{2}\right)$th = 79th score.
Formula if n is even: the middle scores are the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th scores.
Thus, if n = 346 (even), the middle scores are the $\left(\frac{346}{2}\right)$th = 173rd and $\left(\frac{346}{2}+1\right)$th = 174th scores.
Hence, the median is the average of the 173rd and 174th scores.
Median (Grouped Data)

The formula below can be used in finding the median of a frequency distribution:

$Md = LL + \left(\frac{\frac{N}{2} - F_b}{f}\right)c$

where,
• LL = true lower limit or lower class boundary of the median class;
• Fb = the sum of all frequencies below the median class (or the <cf directly below the median class);
• f = frequency corresponding to the median class;
• c = class size; and
• N = total number of cases.
Median (Grouped Data)

Example 8. Find the median of the following


frequency distribution.
Median (Grouped Data)

Solution: We note that N/2 = 50%(30) = 15. Looking at the <cf, 15 is between 7 and 17.
Hence the median class is 80 – 84. With reference to the median class, we have LL = 79.5; Fb = 7; f = 10; and c = 5.
Therefore, the value of the median is given by
$Md = 79.5 + \left(\frac{15 - 7}{10}\right)(5) = 83.5$
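
The computation for Example 8 can be sketched in Python, assuming the usual interpolation formula Md = LL + ((N/2 - Fb) / f) c and taking LL = 79.5 as the lower class boundary of the median class 80 – 84. The function name `grouped_median` is illustrative.

```python
def grouped_median(ll, n, fb, f, c):
    """Median of grouped data by linear interpolation inside the median class.

    ll: lower class boundary of the median class
    n:  total number of cases
    fb: cumulative frequency below the median class
    f:  frequency of the median class
    c:  class size
    """
    return ll + (n / 2 - fb) * c / f

# Values from Example 8: median class 80 - 84, N = 30, Fb = 7, f = 10, c = 5.
md = grouped_median(ll=79.5, n=30, fb=7, f=10, c=5)
print(md)   # 83.5
```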
Mode (Raw Data)

 the value or the score that occurs most


frequently in a collection of scores.

 represented by the tallest column on a


histogram or the highest peak in a
frequency polygon.

 appropriate when the variable is


measured in the nominal scale.
Mode (Raw Data)

Example 10. Find the modal age of the 10 Grade


III pupils whose ages are listed below:
Age x (in years): 10.25, 9.0, 10.25, 9.5, 9.0, 10, 9.0,
9.25, 10.75, 10.
Solution: We first arrange the scores in ascending order and
take note of the frequency of occurrence of each score.
Mode (Raw Data)

Find the mode of each of the following data sets.

• 1, 2, 3, 4, 5        No Mode
• 1, 2, 3, 4, 4        Mode = 4
• 1, 2, 3, 3, 4, 4     Modes = 3 and 4
• 2, 2, 3, 3, 4, 4     No Mode
• 2, 2, 2, 2, 2        Mode = 2
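
The convention used in these examples can be sketched in Python. The function name `modes` is illustrative, and the "no mode" rule coded here (several distinct values that all occur equally often) is inferred from the examples above rather than stated on the slides.

```python
from collections import Counter

def modes(scores):
    """Return the list of modes, or an empty list when there is no mode."""
    counts = Counter(scores)                  # frequency of each distinct score
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Several distinct values, all equally frequent: no mode (inferred convention).
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

ages = [10.25, 9.0, 10.25, 9.5, 9.0, 10, 9.0, 9.25, 10.75, 10]   # Example 10

print(modes(ages))                  # [9.0]
print(modes([1, 2, 3, 4, 5]))       # []
print(modes([1, 2, 3, 3, 4, 4]))    # [3, 4]
```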
Mode (Grouped Data)

For data that are summarized in a frequency


distribution, an approximate value of the
mode called the “crude mode” is defined
as the midpoint of the class interval with the
highest class frequency (called the modal
class), and thus is also easily obtained by
inspection.
Mode (Grouped Data)
A more accurate value of the mode (exact mode) for grouped data is also obtained by linear interpolation.
The resulting formula is given by

$\text{Mode} = LL + \left(\frac{d_1}{d_1 + d_2}\right)c$

• where LL = true lower limit or lower boundary of the modal class;
• d1 = absolute difference between the frequencies of the modal class and the lower class interval (interval just below it);
• d2 = absolute difference between the frequencies of the modal class and the higher class interval (interval just above it);
• c = the class size.
Mode (Grouped Data)
Example 11. Find the exact mode based on the same
frequency distribution given in Example 3.
Solution: The modal class of the frequency distribution
shown below is 85 – 89 since it has the largest value of f.

Mode (Grouped Data)
• Thus, with reference to the modal class, the frequency of the lower class interval is 7 while the frequency of the higher class interval is 8. The values needed to compute the exact mode are:

• LL = 84.5; d1 = |12 – 7| = 5; d2 = |12 – 8| = 4; and c = 5.
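
With these values the exact mode can be computed as sketched below, assuming the usual interpolation formula Mode = LL + (d1 / (d1 + d2)) c. The function name `exact_mode` is illustrative; the frequencies 12, 7 and 8 and the modal class 85 – 89 are those of Example 11.

```python
def exact_mode(ll, f_modal, f_lower, f_higher, c):
    """Exact mode of grouped data by linear interpolation in the modal class."""
    d1 = abs(f_modal - f_lower)     # difference with the class just below
    d2 = abs(f_modal - f_higher)    # difference with the class just above
    return ll + d1 / (d1 + d2) * c

# Example 11: modal class 85 - 89 (f = 12), neighbours with f = 7 and f = 8.
mode_value = exact_mode(ll=84.5, f_modal=12, f_lower=7, f_higher=8, c=5)
crude_mode = (85 + 89) / 2          # midpoint of the modal class

print(round(mode_value, 2))   # 87.28
print(crude_mode)             # 87.0
```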
Mode (Grouped Data)
Note…
There is also a rule in statistics called the “empirical rule” which allows us to approximate the value of the mode of grouped data when the mean and the median are available. This rule is given by the formula
Mode = 3(Median) – 2(Mean)
Mode (Grouped Data)

For instance…
If the mean is 85.38 while the median is 86.17, the value of the mode using the empirical rule would be 3(86.17) – 2(85.38) = 87.75.
Summary of Measures of Central Tendency
IX. INTRODUCTION TO
STATISTICAL CONCEPTS

D. MEASURES OF
RELATIVE POSITION
AND VARIABILITY
What have we learned?
Measures of a Distribution
 Measures of Central Tendency
 Mean
 Arithmetic Mean (Raw Scores and Frequency
Distribution)
 Weighted Mean
 Median

 Raw Scores and Frequency Distribution

 Mode
 Raw Scores and Frequency Distribution
 Relationship of the Three Measures of Central Tendency

 Summary (When to use, advantages and disadvantages)


What to discuss?
 Measures of Relative Position
 Quantiles
 Percentile,
Decile and Quartile
 How to compute Percentile Value (PV)

 How to find Percentile Rank

 Box Plot

 Measures of Variability
Range
Mean Absolute Deviation
Variance and Standard Deviation
Measures of Relative Position
Suppose your score in the LET is the 75th percentile
value, what does this mean? (actual LET 2009
question)

A. You got a score of 75 in the test.


B. You got 75% of the total number of items correct.
C. You are higher than 75% of those who took the test.
D. You are one of the top 75% of all the takers.

Remember your choice as we discuss “QUANTILES”…


What are “quantiles”?

• referred to as measures of relative position

• values on a scale or distribution below which we can find a certain percent of all the other scores

• these values divide the entire distribution into “q” equal parts
For instance, the median is a quantile that divides a distribution into two equal parts, such that 50% of all the scores fall below it.
[Figure: a scale from the lowest score to the highest score, split by the median into parts 1 and 2]
Common Types of Quantiles

Number of Partitions (q)   Type of Quantile   Symbol   Number of Quantiles
2                          Median             M        1
3                          Tertiles           T        2
4                          Quartiles          Q        3
10                         Deciles            D        9
100                        Percentiles        P        99
Deciles

• The values that divide the distribution into 10 equal parts.

Formula:
Position of $D_k$ = $\frac{k}{10}(n + 1)$th score
*Linear Interpolation
$D_k = LV + b(HV - LV)$
Quartiles

• The values that divide the distribution into 4 equal parts.

Formula:
Position of $Q_k$ = $\frac{k}{4}(n + 1)$th score
*Linear Interpolation
$Q_k = LV + b(HV - LV)$
Percentiles

• The values that divide the distribution into 100 equal parts.

Formula:
Position of $P_k$ = $\frac{k}{100}(n + 1)$th score
*Linear Interpolation
$P_k = LV + b(HV - LV)$
Percentile

 The values that divide the distribution


into 100 equal parts are called percentiles.

 If Px denotes the xth percentile value, then

 Px = value on the scale below which we


can find x% of the scores.
Percentile
Thus, P90 or the 90th percentile value is the value in
the distribution below which we can find 90% of all
the other scores.
In a class consisting of 50 pupils, a pupil
whose final grade corresponds to or is greater
than P90 is said to belong to the upper 10% of
the entire pupils in the class.

This also means that his grade is better than


90%(50) = 45 pupils in the class.
Example

Find Q1, Q3 and P99 for the following data:


95 81 59 68 100 92 75 67 85 79

71 88 100 94 87 65 93 72 83 91

Solution: We first arrange the data in ascending order as


follows:
59 65 67 68 71 72 75 79 81 83

85 87 88 91 92 93 94 95 100 100
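
One way to carry out this example is sketched below. It rests on assumptions not spelled out in the slides: LV and HV are read as the two ranked scores on either side of the computed position, b as the fractional part of that position, and positions beyond the last score (as happens for P99 with n = 20) are clamped to the highest score. The function name `quantile_value` is illustrative.

```python
import math

def quantile_value(scores, k, q):
    """k-th of q-quantiles using the (k/q)(n + 1)th position and linear interpolation."""
    data = sorted(scores)
    n = len(data)
    pos = k / q * (n + 1)            # 1-based position in the ranked data
    pos = min(max(pos, 1), n)        # clamp to the available scores (assumption)
    lower = math.floor(pos)
    b = pos - lower                  # fractional part of the position
    lv = data[lower - 1]             # LV: score at the lower position
    hv = data[min(lower, n - 1)]     # HV: next score, or the last one at the end
    return lv + b * (hv - lv)

data = [95, 81, 59, 68, 100, 92, 75, 67, 85, 79,
        71, 88, 100, 94, 87, 65, 93, 72, 83, 91]

q1 = quantile_value(data, 1, 4)       # position 5.25
q3 = quantile_value(data, 3, 4)       # position 15.75
p99 = quantile_value(data, 99, 100)   # position 20.79, clamped to the 20th score

print(q1, q3, p99)
```

Under these assumptions Q1 falls a quarter of the way between the 5th and 6th scores (71 and 72) and Q3 three quarters of the way between the 15th and 16th scores (92 and 93). Software packages use several different quantile conventions, so their results may differ slightly.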
THINK ABOUT THIS 
Is it enough to simply describe a set of data using
averages?
Consider this instance…
Suppose manufacturers of matches claim that there
are 50 matchsticks in every matchbox.
 You took 7 samples of a certain brand, say Rizal, and
observed the following numbers:
45 46 47 49 47 48 47
 You took another brand (Fuego) and observed the
following sample
47 47 45 44 52
 Number of matchsticks for a sample of matchboxes:

 Rizal: 45 46 47 49 47 48 47
 Fuego: 47 47 45 44 52

 What Brand would you more likely select the next


time you buy a match?

 What possible statistics will help you decide on this?


 Finding the average number of matchsticks in each
set:

 Rizal: 45 46 47 49 47 48 47
 Fuego: 47 47 45 44 52

• Mean (R) = 329/7 = 47
• Mean (F) = 235/5 = 47

 In fact, both sets have a median and mode of 47.


NOTE…

• The situation shows two possible sets with the same averages (mean, median and mode) but with different values.

• Generally, a measure of central tendency is not enough to describe a set of data.

• We must also consider how the scores vary from one another.
• A number that describes how the scores in a data set vary is called a measure of variability.

• It can also be called a measure of


 spread

 scattering

 variation

 dispersion

 consistency

 heterogeneity
Measures of Variability

Range
 a crude estimate of variability

 difference between the highest score


and the lowest score

 R=HS – LS
Measures of Variability
 Finding the range of the number of matchsticks
in each set:
Rizal: 45 46 47 49 47 48 47
Fuego: 47 47 45 44 52

 Range (R) = 49 – 45 =4
 Range (F) = 52 – 44 = 8

 Since the range of Fuego is larger than that


of Rizal, the number of matchsticks of the
former is more varied than the latter.
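
The same comparison in Python, as a minimal sketch (the function name `score_range` is illustrative; the data are the matchstick counts above):

```python
def score_range(scores):
    """R = HS - LS: highest score minus lowest score."""
    return max(scores) - min(scores)

rizal = [45, 46, 47, 49, 47, 48, 47]
fuego = [47, 47, 45, 44, 52]

print(score_range(rizal))   # 4
print(score_range(fuego))   # 8
```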
Measures of Variability
 Since the range is just a crude estimate, there are
times that it may not compare different variations
correctly.

For instance, consider the following sets:


A: 1 2 3 4 5 6 7
B: 1 1 1 1 1 1 1 1 1 1 1 8

 What is the range of each set?


 How do you compare the ranges relative to the actual
variation of the set?
How do we solve this problem?

 we consider all the scores in finding a


measure of variation
 this is done by finding first a number that will
serve as reference point and where all the
scores will be assessed relative to this point
 what number can we use?
 It’s the MEAN, why?
 afterwards, we can now identify how far each
score is away from the mean
We take the case of Fuego…

What is the mean of these scores?
[Figure: number line showing the scores 44, 45, 47 and 52, the mean 47, and the deviations -3, -2 and 5]

How do we identify the deviation of each score (x) from the mean ($\bar{x}$)?
In tabular form…

Score   Deviation (score – mean)
44      -3
45      -2
47       0
47       0
52       5

How do we get a single value that summarizes these deviations?
Answer: we get the average of these deviations
Since the sum of the deviations from the mean equals zero, we need to remove the negative sign.
How?
Answer: we get the absolute value of these deviations

Score   Deviation (score – mean)   Absolute Deviation
44      -3                         3
45      -2                         2
47       0                         0
47       0                         0
52       5                         5
We can now find the average of these absolute deviations.

Score   Deviation (score – mean)   Absolute Deviation
44      -3                         3
45      -2                         2
47       0                         0
47       0                         0
52       5                         5

Mean Absolute Deviation or MAD = $\frac{3 + 2 + 0 + 0 + 5}{5} = 2$
Mean Absolute Deviation

Average of the absolute deviations from the mean

$MAD = \frac{\text{sum of the absolute deviations of each score from the mean}}{\text{number of scores}}$

$MAD = \frac{\sum |\text{score} - \text{mean}|}{n}$
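
The MAD of the Fuego sample can be reproduced with a short Python sketch (the function name `mean_absolute_deviation` is illustrative):

```python
def mean_absolute_deviation(scores):
    """Average of the absolute deviations of each score from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n

fuego = [47, 47, 45, 44, 52]
rizal = [45, 46, 47, 49, 47, 48, 47]

mad_fuego = mean_absolute_deviation(fuego)   # (0 + 0 + 2 + 3 + 5) / 5
mad_rizal = mean_absolute_deviation(rizal)   # 6 / 7

print(mad_fuego)             # 2.0
print(round(mad_rizal, 2))   # 0.86
```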
But… Getting an absolute value is NOT used in higher statistics…

Score   Deviation (score – mean)
44      -3
45      -2
47       0
47       0
52       5

What mathematical operation can we use to eliminate the negative sign?
Answer: square the deviations
Squaring the deviations and getting the average of these squared deviations…

Score   Deviation (score – mean)   Squared Deviations
44      -3                         9
45      -2                         4
47       0                         0
47       0                         0
52       5                         25

Mean Squared Deviation or MSD = $\frac{9 + 4 + 0 + 0 + 25}{5} = 7.6$, commonly known as the VARIANCE
Variance
Average of the squared deviations from the mean

$MSD = \frac{\text{sum of the squared deviations of each score from the mean}}{\text{number of scores}}$

$MSD = \frac{\sum (\text{score} - \text{mean})^2}{n}$
NOTE:
There are two types of variance.

Biased Estimate: $s^2 = \frac{\sum (x - \bar{x})^2}{n}$

Unbiased Estimate: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, or equivalently $s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$
Squaring the deviations and getting the average of these squared deviations…

Score   Deviation (score – mean)   Squared Deviations
44      -3                         9
45      -2                         4
47       0                         0
47       0                         0
52       5                         25

Mean Squared Deviation or MSD = $\frac{9 + 4 + 0 + 0 + 25}{5 - 1} = 9.5$, commonly known as the VARIANCE (Unbiased Estimate)
However…

 How do you compare the value of the


variance relative to the deviations?

 What do you observe with the units of


the scores, the mean and the variance?
Answer…
 How do you compare the value of the variance
relative to the deviations?

The variance is extremely different


compared with the deviations.

 What do you observe with the units of the


scores, the mean and the variance?

The unit of the variance is inconsistent


with that of the scores and the mean.
How do we solve the problem?

• We get the square root of the variance and call the value the standard deviation…
• Standard deviation (s/sd) = $\sqrt{\text{variance}}$
• $sd = \sqrt{\frac{\sum (\text{score} - \text{mean})^2}{n}}$ (biased estimate)

• $sd = \sqrt{\frac{\sum (\text{score} - \text{mean})^2}{n - 1}}$ (unbiased estimate)
Therefore…
If the variances are
7.6 (biased estimate)
9.5 (unbiased estimate)

the standard deviations are

Biased Estimate
$\sqrt{7.6} = 2.76$ matchsticks

Unbiased Estimate
$\sqrt{9.5} = 3.08$ matchsticks
We find the measure of
variability of the Rizal
brand using MS Excel…
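
As an alternative to a spreadsheet, the same measures can be obtained with Python's standard `statistics` module. This is a sketch, not the deck's own procedure: `pvariance` and `pstdev` divide by n (the biased estimate), while `variance` and `stdev` divide by n - 1 (the unbiased estimate). The dictionary `results` is an illustrative name.

```python
from statistics import pvariance, variance, pstdev, stdev

samples = {
    "Fuego": [47, 47, 45, 44, 52],
    "Rizal": [45, 46, 47, 49, 47, 48, 47],
}

results = {}
for brand, data in samples.items():
    results[brand] = {
        "biased variance": pvariance(data),    # divides by n
        "unbiased variance": variance(data),   # divides by n - 1
        "biased sd": pstdev(data),
        "unbiased sd": stdev(data),
    }

for brand, measures in results.items():
    print(brand, {name: round(value, 2) for name, value in measures.items()})
```

For Fuego this reproduces the values 7.6, 9.5, 2.76 and 3.08 found above. For Rizal the squared deviations add up to 10, so every measure is smaller, which agrees with the comparison of the ranges.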
CHARACTERISTICS OF DISTRIBUTIONS

The distribution of data derived from


the measurement of a continuous or
interval variable can be further
characterized using the graph of the
distribution.
By studying the graph closely, we can
describe its symmetry, peakedness, as
well as the modality.
1. Symmetry and Skewness

A symmetric distribution arises when a


vertical line drawn from the highest point
of the graph to the base line divides the
graph into two portions which are mirror
images of each other with respect to the
vertical line drawn.
1. Symmetry and Skewness

Skewness is the degree of departure of


the distribution from symmetry.

In general, distributions can either be


positively skewed or negatively skewed.
1. Skewness

A distribution is said to be positively


skewed when it tapers to the right.

Positive skewness can occur when most


scores or values in the distribution are
low and only few are high.
1. Skewness

A distribution is said to be negatively


skewed when it tapers to the left.

Negative skewness can occur when most


scores or values in the distribution are
high and only few are low.
2. Kurtosis

Kurtosis is defined as the peakedness of a


curve (highly peaked or lowly peaked)
Platykurtic Distribution

When the values in a set of data are


highly heterogeneous, the resulting graph
will be low peaked.
Leptokurtic Distribution

When the values in a set of data are


highly homogeneous, the resulting graph
will be high peaked.
Mesokurtic Distribution

A distribution that is neither highly peaked nor low peaked is said to have an average kurtosis.
Modality of a Distribution

Characterized by the number of peaks of a


distribution.
Modality of a Distribution

Unimodal
-a distribution with only one peak

-most common
Modality of a Distribution

Bimodal
-a distribution with two peaks

-the peaks may not be of the same height
Relationship of the Three Measures of Central
Tendency

When the distribution of a set of data is


symmetric, the three measures of central
tendency have the same values
Relationship of the Three Measures of Central
Tendency

For skewed distributions…


“If you can’t quantify
what you are saying,
you don’t know what
you are saying”
Lord Kelvin
