Module 1A - Review of Elementary Statistics
CONTENTS:
A. Descriptive Statistics
1. Classification of Data
2. Levels of Measurement
3. Measures of Central Tendency
4. Measures of Position
5. Exploratory Data Analysis
OBJECTIVES:
1. Identify the classification of data in a given set of data;
2. Determine the level of measurement of given data;
3. Compute and interpret measures of central tendency and position for ungrouped data;
4. Use appropriate statistical measures in analyzing and interpreting statistical data;
5. Identify the different parts of a boxplot;
6. Construct and interpret a boxplot.
DEFINITION OF STATISTICS
In its plural sense, statistics refers to a set of numerical data. These are numerical facts
and figures collected in a systematic manner with a definite purpose in any field of study.
In this sense, statistics are also aggregates of facts expressed in numerical
form (e.g., vital statistics in a beauty contest, monthly sales of a company, daily
peso-dollar exchange rate).
In its singular sense, Statistics is the branch of science which deals with the collection,
presentation, analysis, and interpretation of data.
FIELDS OF STATISTICS
Descriptive Statistics
- consists of methods concerned with the collection, organization, summarization, and
presentation of a set of data without drawing conclusions or inferences about a larger
set
- in descriptive statistics, the statistician only tries to describe a situation or the sample
- conclusions apply only to the data on hand
Inferential Statistics
- consists of generalizing from samples to populations, performing estimations and
hypothesis tests, and determining relationships among variables
- the main concern is not only to describe but to actually predict and make inferences
based on the information gathered
- conclusions are applicable to a larger set of data, of which the data on hand is only a
subset
Data – the values (or observations) that the variables can assume.
CLASSIFICATION OF VARIABLES
Qualitative variables – variables that can be placed into distinct categories according to
some characteristic or attribute.
Example:
marital status (single, married, widowed), gender (male or female), religion, occupation
Discrete variables – assume values that can be counted, usually measured by counting
or numeration.
Continuous variables – can assume an infinite number of values between any two specific
values; they are usually obtained by measuring and often include fractions and decimals.
LEVELS OF MEASUREMENT
1. Nominal Level - The nominal level is the weakest level of measurement. Numbers or
symbols are used simply for categorizing subjects into different groups.
2. Ordinal Level - contains the properties of the nominal level. In addition, it classifies data
into categories that can be ranked. However, precise differences between the ranks do not
exist.
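3. Interval Level - contains the properties of the ordinal level. In addition, precise
differences between values exist, but there is no true zero point.
A true zero point means that when a variable has a value of 0, the variable
does not exist.
At the interval level, there is no true zero point because a value of 0
does not mean “nothingness”.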
4. Ratio Level – this is the highest level of measurement. It possesses all the properties of
interval level, plus there exists a true zero.
Summary:
[Diagram with the labels: Numerical Data, Qualitative, Quantitative]
MEASURES OF CENTRAL TENDENCY
1. MEAN - the sum of the values divided by the total number of values. This is also known
as the arithmetic average.
Sample mean: $\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$
Where:
$\bar{X}$ – sample mean
n – sample size (number of values in the sample)
Population mean: $\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$
Where:
μ – population mean
N – population size (number of values in the population)
2. MEDIAN
- middlemost value in an ordered array of data (midpoint)
- unaffected by extreme values/outliers*
- measure of position
*Outlier - an extremely high or an extremely low data value when compared to the rest of
data values.
Example: in a 100-item quiz, most of the students got scores such as 80, 85, 83, 90, and 87;
however, a few students got scores of 5 and 3. The scores 5 and 3 are outliers.
3. MODE
- value that occurs most often or the value that has the highest frequency.
- lowest level of central tendency
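The three measures can be compared with a short Python sketch. It reuses the quiz scores from the outlier example (80, 85, 83, 90, 87, 5, 3); the cutoff used to drop the outliers and the `answers` list are hypothetical and serve only as illustration.

```python
from statistics import mean, median, multimode

# Quiz scores from the outlier example; 5 and 3 are the outliers.
scores = [80, 85, 83, 90, 87, 5, 3]

# Same scores with the outliers dropped (the cutoff of 10 is an arbitrary assumption).
typical = [s for s in scores if s > 10]

mean_all = mean(scores)            # sum of the values divided by their count
median_all = median(scores)        # middlemost value of the ordered scores
mean_typical = mean(typical)
median_typical = median(typical)

# Hypothetical survey answers, used only to illustrate the mode.
answers = [1, 2, 2, 3, 2, 1, 4]
modes = multimode(answers)         # list of the most frequent value(s)

print(f"with outliers:    mean = {mean_all:.2f}, median = {median_all}")
print(f"without outliers: mean = {mean_typical:.2f}, median = {median_typical}")
print(f"mode(s) of answers: {modes}")
```

With the outliers included, the mean drops to about 61.86 while the median stays at 83, which illustrates why the median is described as unaffected by extreme values.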
1. Thirty people were asked the question, “How many people do you consider your best
friend?”. The graph below shows the result of the survey.
2. The mean age of 10 full time guidance counselors is 35 years old. Two new full time
guidance counselors, aged 28 and 30, are hired. Five years from now, what would be the
average age of these twelve guidance counselors?
4. For the Senior High School dance competition, there is a debate going on among students
regarding the team/school color that will be featured prominently. Votes were sent by
students via SMS, and the results are as follows:
MEASURES OF POSITION
These are used to locate the relative position of a data value in a data set.
A. PERCENTILE
- A measure that pinpoints a location that divides a distribution into 100 equal parts.
- Denoted by P1, P2, P3, …, P99
Examples:
a. P2 is the 2nd percentile. This is the value for which at least 2% of the observations are less
than or equal to it. At the same time, 98% of the observations are greater than or equal
to it.
b. If you are in the 88th percentile in your Stat exam, it means that 12% of your
classmates scored higher than you while 88% of your classmates scored lower than
you.
PERCENTILE FORMULA
Example 1:
A teacher gives a 20-point test to 10 students. The scores are shown below. Find the
percentile rank of a score of 12.
Example 1 – Solution
3. Thus, a student whose score was 12 did better than 65% of the class.
Example 2:
Using the data in Example 1, find the percentile rank for a score of 6.
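The percentile formula and the table of scores did not survive in this copy of the module, so the following Python sketch rests on two assumptions. First, it assumes the common textbook definition: percentile rank = (number of values below X + 0.5) / (total number of values) × 100. Second, the `scores` list is hypothetical; it was chosen only because it is consistent with the results quoted in the examples (a score of 12 ranks at the 65th percentile).

```python
def percentile_rank(data, x):
    """Percentile rank of x in data.

    Assumed formula (a common textbook definition, not confirmed by the module):
    (number of values below x + 0.5) / (total number of values) * 100
    """
    below = sum(1 for value in data if value < x)
    return (below + 0.5) * 100 / len(data)


# Hypothetical scores for the 10 students (the original table is not available).
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

rank_12 = percentile_rank(scores, 12)
rank_6 = percentile_rank(scores, 6)
print(f"score 12 -> percentile rank {rank_12:.0f}")
print(f"score 6  -> percentile rank {rank_6:.0f}")
```

Under these assumptions, a score of 12 lands at the 65th percentile, in agreement with the solution to Example 1, and a score of 6 lands at the 35th percentile.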
To find the data value corresponding to a given percentile, use the formula
$c = \frac{n \cdot p}{100}$
where n is the total number of values and p is the percentile.
Steps:
1. Arrange the data in order from lowest to highest.
2. Substitute the values in the formula: $c = \frac{n \cdot p}{100}$
3. If c is not a whole number, round UP to the next whole number. Starting at the lowest
value, count over to the number that corresponds to the rounded-up value.
4. If c is a whole number, use the value halfway between the cth and the (c+1)th values
when counting up from the lowest value.
Example 3:
Using the same set of scores in Example 1, find the value corresponding to (a) the 25th
percentile and (b) the 60th percentile.
3. Since the value of c is not a whole number, round it up to the next whole number, which
is 3.
4. Start at the lowest value and count over to the third value, which is 5.
5. Therefore, the value 5 corresponds to the 25th percentile.
$c = \frac{10 \cdot 60}{100} = 6$
3. Since the value of c is a whole number, use the value halfway between the cth and the
(c+1)th values. In this case, these are the 6th and the 7th values.
$\frac{10 + 12}{2} = 11$
Therefore, the value 11 corresponds to the 60th percentile.
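The procedure for finding the value at a given percentile can be sketched in Python as follows. The function name `value_at_percentile` is illustrative, and the `scores` list is hypothetical: the original table is not available, so the values were chosen to be consistent with the results quoted in Example 3 (third value 5; sixth and seventh values 10 and 12). The percentile `p` is assumed to be a whole number from 1 to 99.

```python
def value_at_percentile(data, p):
    """Data value at the p-th percentile, using c = n * p / 100."""
    ordered = sorted(data)                  # arrange from lowest to highest
    n = len(ordered)
    c, remainder = divmod(n * p, 100)       # integer part of c and what is left over
    if remainder:
        # c is not a whole number: round up and take that value.
        # The rounded-up position is c + 1, which is 0-based index c.
        return ordered[c]
    # c is a whole number: halfway between the c-th and (c+1)-th values.
    return (ordered[c - 1] + ordered[c]) / 2


# Hypothetical scores for the 10 students (assumption, see the note above).
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

p25 = value_at_percentile(scores, 25)
p60 = value_at_percentile(scores, 60)
print(p25, p60)
```

Keeping `c` in integer arithmetic with `divmod` avoids floating-point rounding when deciding whether `c` is a whole number.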
B. QUARTILE
- Quartiles divide the distribution into four equal parts
- separated by Q1, Q2, Q3
Steps:
1. Arrange the data in order from lowest to highest.
2. Find the median of the data values. This is the value for Q2.
3. Find the median of the data values that fall below Q2. This is the value for Q1.
4. Find the median of the data values that fall above Q2. This is the value for Q3.
Example 4
Find Q1, Q2 and Q3 for the data set
Example 4 – Solution
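1. Arrange the data values from lowest to highest:
5, 6, 12, 13, 15, 18, 22, 50
2. Find the median of the data values:
$MD = \frac{13 + 15}{2} = 14$ (this is Q2)
3. Find the median of the data values that fall below Q2:
5, 6, 12, 13
$MD = \frac{6 + 12}{2} = 9$ (this is Q1)
4. Find the median of the data values that fall above Q2:
15, 18, 22, 50
$MD = \frac{18 + 22}{2} = 20$ (this is Q3)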
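The quartile steps can be checked with a minimal Python sketch on the ordered data from Example 4. The function name `quartiles` is illustrative. One assumption is labeled in the code: when the number of values is odd, the middle value is left out of both halves.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ordered = sorted(data)             # arrange from lowest to highest
    n = len(ordered)
    q2 = median(ordered)               # median of all the values
    # Assumption: for an odd n, the middle value belongs to neither half.
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2:]     # values above the median position
    return median(lower), q2, median(upper)


q1, q2, q3 = quartiles([5, 6, 12, 13, 15, 18, 22, 50])
print(q1, q2, q3)  # 9.0 14.0 20.0
```

Other quartile conventions exist (for example, interpolation-based methods in statistical software), so results from other tools may differ slightly from this hand method.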
C. DECILE
- Deciles divide the distribution into ten equal groups
- They are denoted by D1, D2, D3, …, D9
A boxplot can be used to graphically represent the data set. These plots involve five
specific values: the minimum data value, Q1, the median (Q2), Q3, and the maximum data value.
Box-and-whisker plot
A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum
data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and
drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the
box passing through the median or Q2.
La Salle Green Hills, Senior High School Department
Statistics and Probability Learning Module
Alvarez | Estonilo | Garcia
[Figure: boxplot with its parts labeled from left to right: Minimum Value, 1st Quartile, 2nd Quartile / Median, 3rd Quartile, Maximum Value]
2. If the median falls to the left of the center of the box, the
distribution is positively skewed.
3. If the median falls to the right of the center of the box, the
distribution is negatively skewed.
1. Find the five-number summary for the data values, that is the maximum and minimum
data values, Q1 and Q3, and the median.
2. Draw a horizontal axis with a scale such that it includes the maximum and minimum
data values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through
the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the
maximum data value to the right of the box.
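Step 1 and the median rules for skewness can be sketched in Python. This is only an illustration under stated assumptions: the data set is borrowed from Example 4, the function names are hypothetical, the quartiles use the median-of-halves method, and the "approximately symmetric" case for a centered median is an assumption because that rule is not shown in this copy of the module.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[: n // 2])          # median of the lower half
    q3 = median(ordered[(n + 1) // 2:])     # median of the upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


def skew_from_median(q1, med, q3):
    """Interpret where the median sits relative to the center of the box."""
    center = (q1 + q3) / 2
    if med < center:
        return "positively skewed"          # median left of the box center
    if med > center:
        return "negatively skewed"          # median right of the box center
    return "approximately symmetric"        # assumption: rule not shown in the module


# Data borrowed from Example 4, used here only as an illustration.
summary = five_number_summary([5, 6, 12, 13, 15, 18, 22, 50])
minimum, q1, med, q3, maximum = summary
shape = skew_from_median(q1, med, q3)
print(summary, shape)
```

For this data the box runs from 9 to 20 with its center at 14.5, and the median of 14 falls slightly to the left of center, so the sketch reports a positively skewed distribution.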
Practice Exercise: Draw a boxplot for each problem and interpret it.
2. A dietitian is interested in comparing the sodium content of real cheese with the
sodium content of a cheese substitute. The data for two random samples are
shown. Compare the distributions, using boxplots.