Module 1A - Review of Elementary Statistics


UNIT 1
Review of Elementary Statistics

CONTENTS:
A. Descriptive Statistics
1. Classification of Data
2. Levels of Measurement
3. Measures of Central Tendency
4. Measures of Position
5. Exploratory Data Analysis

OBJECTIVES:
1. Identify the classification of data of a given set of data;
2. Determine the level of measurement of given data;
3. Compute and interpret measures of central tendency and position for ungrouped data;
4. Use appropriate statistical measures in analyzing and interpreting statistical data;
5. Identify the different parts of a boxplot;
6. Construct and interpret a boxplot.

DEFINITION OF STATISTICS
• In its plural sense, statistics refers to a set of numerical data. These are numerical facts
and figures collected in a systematic manner with a definite purpose in any field of study.
In this sense, statistics are also aggregates of facts expressed in numerical
form (e.g., vital statistics in a beauty contest, monthly sales of a company, daily
peso-dollar exchange rate).

• In its singular sense, Statistics is the branch of science which deals with the collection,
presentation, analysis, and interpretation of data.

FIELDS OF STATISTICS
• Descriptive Statistics
- consists of methods concerned with the collection, organization, summarization, and
presentation of a set of data without drawing conclusions or inferences about a larger
set
- in descriptive statistics, the statistician only tries to describe a situation or the sample
- conclusions apply only to the data on hand
• Inferential Statistics
- consists of generalizing from samples to populations, performing estimations and
hypothesis tests, and determining relationships among variables
- the main concern is not only to describe but also to predict and make inferences
based on the information gathered
- conclusions are applicable to a larger set of data, of which the data on hand is only a
subset

Hence, it is important to differentiate population and sample:

A population consists of all elements being considered in a statistical study.

A sample is a group of subjects selected from a population.

La Salle Green Hills Statistics and Probability Learning Module
Senior High School Department Alvarez | Estonilo | Garcia

VARIABLES AND TYPES OF DATA

Variable – a characteristic or attribute which can assume different values or labels.

Measurement – the process of determining the value or label of a particular variable.

Data – the values (or observations) that the variables can assume.

CLASSIFICATION OF VARIABLES
• Qualitative variables – variables that can be placed into distinct categories according to
some characteristic or attribute.

Example: marital status (single, married, widowed), gender (male or female), religion, occupation

• Quantitative variables – variables that take on numerical values representing an
amount or quantity.

Example: age, weight, height, body temperature

Quantitative variables can be further classified into discrete and continuous.

• Discrete variables – assume values that can be counted, usually obtained by counting
or enumeration.

Example: number of students in a classroom, number of subjects taken on a certain day,
number of family members

• Continuous variables – can assume an infinite number of values between any two specific
values; they are usually obtained by measuring and often include fractions and decimals.

Example: weight, temperature, amount of liquid in a container

LEVELS OF MEASUREMENT
1. Nominal Level - The nominal level is the weakest level of measurement. Numbers or
symbols are used simply for categorizing subjects into different groups.

Examples: Jersey number, student ID number, gender, race, marital status

2. Ordinal Level - contains the properties of the nominal level. In addition, it classifies data
into categories that can be ranked. However, precise differences between the ranks do not
exist.

Examples: position in the company, socio-economic status, difficulty of an exam, rank in
a contest

3. Interval Level - has the properties of the nominal and ordinal levels, and precise
differences between units of measure do exist; however, there is no meaningful zero or
true zero point.

But what is a “true zero point”?

• A true zero point means that a value of 0 indicates the complete absence of the quantity
being measured.

• In the case of the interval level, there is no true zero point because a value of 0
does not mean “nothingness”.

Example: temperature (in °C or °F) and IQ score

4. Ratio Level – this is the highest level of measurement. It possesses all the properties of
interval level, plus there exists a true zero.

Example: distance, salary, age

Summary:
Data are either qualitative or quantitative.
- Qualitative data cannot be classified as discrete or continuous.
- Quantitative data are either discrete or continuous.

PRACTICE EXERCISE: Complete the table below by classifying each variable according to the
category indicated in each column.

Variable | Quantitative or Qualitative? | Discrete or Continuous? | Level of Measurement?
1. Monthly sales in a school canteen | | |
2. Zip code | | |
3. Scores in statistics quiz | | |
4. Socio-economic status | | |
5. Names of fast food chains in the Philippines | | |
6. Number of class periods per day | | |
7. Time required to solve a math problem | | |
8. Telephone and internet bills | | |
9. Hourly Celsius temperature report in Metro Manila | | |
10. Number of sets of uniform owned | | |

MEASURES OF CENTRAL TENDENCY

• A measure of central tendency provides a convenient way of describing a set of scores with a
single number that summarizes the performance of the group.
• It is the single value that is used to describe the “center” of the data.

1. MEAN - the sum of the values divided by the total number of values. This is also known
as the arithmetic average.

Sample mean:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$

Where:
$\bar{X}$ – sample mean
n – sample size (number of values in the sample)

Population mean:
$$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{\sum X}{N}$$

Where:
μ – population mean
N – population size (number of values in the population)
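
As a quick check of the formula, the mean can be computed in a few lines of Python. This is only a minimal sketch: the function name `mean` is an illustrative choice, and the scores are the ten test scores that this module uses later in Example 1.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Ten scores on a 20-point test (the data set used in Example 1 of this module).
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
sample_mean = mean(scores)  # the sum is 99 and n is 10
print(sample_mean)          # prints 9.9
```

The same function applies to a population mean; only the interpretation of the data (all elements versus a subset) changes.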

2. MEDIAN
- middlemost value in an ordered array of data (midpoint)
- unaffected by extreme values/outliers*
- measure of position

*Outlier - an extremely high or an extremely low data value when compared to the rest of
the data values.
Example: in a 100-item quiz, most of the students got scores such as 80, 85, 83, 90, and 87;
however, a few students got scores of 5 and 3. The scores 5 and 3 are outliers.

How to compute the median?

A. Arrange the values from least to greatest.
B. Find the middlemost value (if there is an even number of elements in the set, then
get the average of the two middlemost values).
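
Steps A and B can be sketched in Python as follows. The function name `median` and the seven-student class are assumptions made for illustration; the scores themselves are the ones mentioned in the outlier example above.

```python
def median(values):
    """Return the middlemost value of the data after sorting."""
    ordered = sorted(values)  # Step A: arrange from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is exactly one middle value.
        return ordered[middle]
    # Even number of values: average the two middlemost values (Step B).
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical class of seven, built from the quiz scores in the outlier example.
quiz = [80, 85, 83, 90, 87, 5, 3]
print(median(quiz))           # prints 83
print(sum(quiz) / len(quiz))  # the mean, about 61.9
```

The two very low scores pull the mean down to about 61.9, while the median stays at 83, which illustrates why the median is described as unaffected by outliers.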

3. MODE
- value that occurs most often or the value that has the highest frequency.
- the most basic measure of central tendency; it can be used even for nominal-level data
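
One possible way to find the mode in Python is to count how often each value occurs. The helper name `modes` and the sample answers below are hypothetical; the sketch returns every value tied for the highest frequency, since a data set may have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Keep all values whose count equals the highest frequency.
    return [value for value, count in counts.items() if count == highest]


# Hypothetical survey answers; the mode works even for category labels.
colors = ["green", "white", "green", "blue", "white", "green"]
print(modes(colors))        # prints ['green']
print(modes([1, 1, 2, 2]))  # prints [1, 2] (two modes)
```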

Guide in choosing the most appropriate measure of central tendency

PRACTICE EXERCISE: Solve for what is being asked in each problem.

1. Thirty people were asked the question, “How many people do you consider your best
friend?” The graph below shows the results of the survey.

a. What variable was used in the problem?
b. What is the level of measurement for this variable?
c. What is the best measure of central tendency to use?

2. The mean age of 10 full-time guidance counselors is 35 years. Two new full-time
guidance counselors, aged 28 and 30, are hired. Five years from now, what would be the
average age of these twelve guidance counselors?

3. Houses in a certain area in a big city have a mean price of PhP4,000,000 but a median
price of only PhP2,500,000. How might you best explain this?

4. For the Senior High School dance competition, there is a debate going on among students
regarding the team/school color that will be featured prominently. Votes were sent by
students via SMS, and the results are as follows:

a. What variable was used in the given problem?
b. What kind of variable is this and what is its level of measurement?
c. Is it possible to compute the mean, median, and mode? If yes, give the values.
d. Which measure of central tendency is best to use in this problem?

MEASURES OF POSITION
• These are used to locate the relative position of a data value in a data set.

A. PERCENTILE
- Percentiles are measures that divide the distribution into 100 equal parts.
- Denoted by P1, P2, P3, …, P99

Examples:
a. P2 is the 2nd percentile. This is the value for which at least 2% of the observations are less
than or equal to it. At the same time, at least 98% of the observations are greater than or equal
to it.
b. If you are in the 88th percentile in your Stat exam, it means that 12% of your
classmates scored higher than you while 88% of your classmates scored lower than
you.

PERCENTILE FORMULA

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$$

Example 1:
A teacher gives a 20-point test to 10 students. The scores are shown below. Find the
percentile rank of a score of 12.

18, 15, 12, 6, 8, 2, 3, 5, 20, 10

Example 1 – Solution

1. Arrange the data from lowest to highest

2, 3, 5, 6, 8, 10, 12, 15, 18, 20


2. Use the percentile formula. Six values are below 12, so

$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100 = 65 \quad (\text{65th percentile})$$

3. Thus, a student whose score was 12 did better than 65% of the class.

Example 2:
Using the data in Example 1, find the percentile rank for a score of 6.

Answer: 35th percentile
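
The percentile formula can be checked against Examples 1 and 2 with a short Python sketch. The function name `percentile_rank` is an illustrative assumption; the multiplication by 100 is done before the division, which is algebraically the same as the formula.

```python
def percentile_rank(values, x):
    """Percentile rank of x: ((number of values below x) + 0.5) / n * 100."""
    if len(values) == 0:
        raise ValueError("the data set must not be empty")
    below = sum(1 for v in values if v < x)  # count the values below x
    return (below + 0.5) * 100 / len(values)


# The ten test scores from Example 1.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # prints 65.0
print(percentile_rank(scores, 6))   # prints 35.0
```

Sorting is not needed here because the formula only counts how many values fall below the given score.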

FINDING A VALUE CORRESPONDING TO A GIVEN PERCENTILE

$$c = \frac{n \cdot p}{100}$$

Where: n = total number of values
p = percentile

Steps:
1. Arrange the data in order from lowest to highest.
2. Substitute the values in the formula: $c = \frac{n \cdot p}{100}$
3. If c is not a whole number, round UP to the next whole number. Starting at the lowest
value, count over to the number that corresponds to the rounded-up value.
4. If c is a whole number, use the value halfway between the cth and the (c+1)th values
when counting up from the lowest value.

Example 3:
Using the same set of scores in Example 1, find the value corresponding to (a) the 25th
percentile and (b) the 60th percentile.

Example 3 – Solution (part a)

1. Arrange the data from lowest to highest

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

2. Substitute the values in the formula:

$$c = \frac{10 \cdot 25}{100} = 2.5$$

3. Since the value of c is not a whole number, round it up to the next whole number, which
is 3.
4. Start at the lowest value and count over to the third value, which is 5.
5. Therefore, the value 5 corresponds to the 25th percentile.

Example 3 – Solution (part b)

1. Arrange the data from lowest to highest

2, 3, 5, 6, 8, 10, 12, 15, 18, 20

2. Substitute the values in the formula:

$$c = \frac{10 \cdot 60}{100} = 6$$

3. Since the value of c is a whole number, use the value halfway between the cth and the
(c+1)th values. In this case, these are the 6th and the 7th values.

4. 6th value = 10; 7th value = 12 (get the average)

$$\frac{10 + 12}{2} = 11$$

5. Therefore, the value 11 corresponds to the 60th percentile.
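
The steps for finding the value at a given percentile can be sketched in Python as shown below. The function name `value_at_percentile` is hypothetical, and the sketch assumes p is a whole number from 1 to 99 so that integer arithmetic can decide exactly whether c is a whole number.

```python
def value_at_percentile(values, p):
    """Return the data value at the p-th percentile (p is a whole number, 1 to 99)."""
    if not (isinstance(p, int) and 1 <= p <= 99):
        raise ValueError("p must be a whole number from 1 to 99")
    if len(values) == 0:
        raise ValueError("the data set must not be empty")
    ordered = sorted(values)               # Step 1: lowest to highest
    n = len(ordered)
    whole, remainder = divmod(n * p, 100)  # Step 2: c = n * p / 100
    if remainder != 0:
        # Step 3: c is not whole, so round up to position whole + 1.
        # Position whole + 1 (counting from 1) is list index `whole`.
        return ordered[whole]
    # Step 4: c is whole, so average the c-th and (c+1)-th values.
    return (ordered[whole - 1] + ordered[whole]) / 2


# The ten test scores from Example 1.
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # prints 5
print(value_at_percentile(scores, 60))  # prints 11.0
```

Both results agree with parts (a) and (b) of Example 3.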

B. QUARTILE
- Quartiles divide the distribution into four equal parts
- separated by Q1, Q2, Q3

Q1 corresponds to the 25th percentile
Q2 corresponds to the 50th percentile (median)
Q3 corresponds to the 75th percentile

FINDING DATA VALUES CORRESPONDING TO Q1, Q2 AND Q3

Steps:
1. Arrange the data in order from lowest to highest.
2. Find the median of the data values. This is the value for Q2.
3. Find the median of the data values that fall below Q2. This is the value for Q1.
4. Find the median of the data values that fall above Q2. This is the value for Q3.

Example 4
Find Q1, Q2 and Q3 for the data set

15, 13, 6, 5, 12, 50, 22, 18

Example 4 – Solution
1. Arrange the data values from lowest to highest
5, 6, 12, 13, 15, 18, 22, 50

2. Find the median (Q2)

$$MD = \frac{13 + 15}{2} = 14$$

3. Find the median of the values less than 14.

5, 6, 12, 13

$$MD = \frac{6 + 12}{2} = 9 \quad (\text{this is } Q_1)$$

4. Find the median of the values greater than 14.

15, 18, 22, 50

$$MD = \frac{18 + 22}{2} = 20 \quad (\text{this is } Q_3)$$

Hence, Q1 = 9, Q2 = 14, Q3 = 20.
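
The quartile steps can be sketched in Python using the standard library's `statistics.median`. The function name `quartiles` is an illustrative choice, and the sketch assumes that "values below Q2" means the lower half of the sorted list by position (the middle value itself is left out when the number of values is odd).

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps."""
    ordered = sorted(values)    # Step 1: lowest to highest
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are needed")
    half = n // 2
    lower = ordered[:half]      # values positioned below the median
    upper = ordered[n - half:]  # values positioned above the median
    return median(lower), median(ordered), median(upper)


# The eight sorted values from the Example 4 solution.
print(quartiles([5, 6, 12, 13, 15, 18, 22, 50]))  # prints (9.0, 14.0, 20.0)
```

Other quartile conventions exist (statistical software often interpolates), so results from other tools may differ slightly from this method.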

C. DECILE
- Deciles divide the distribution into ten equal groups
- They are denoted by D1, D2, D3, …, D9

EXPLORATORY DATA ANALYSIS


- The measure of central tendency used in Exploratory Data Analysis (EDA) is the
median.
- The measure of variation used is interquartile range (Q3-Q1).
- The boxplot (sometimes called a box-and-whisker plot) is used to graphically represent
the data.
- Purpose: to examine the data to find out what information can be discovered about the
data such as the center and the spread of the distribution.

Five-number summary of the data set:

A boxplot can be used to graphically represent the data set. These plots involve five
specific values:

1. The lowest value of the data set (minimum)
2. Q1 (1st Quartile)
3. Median/Q2 (2nd Quartile)
4. Q3 (3rd Quartile)
5. The highest value of the data set (maximum)
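
A small Python sketch can collect these five values together with the interquartile range. The function name `five_number_summary` and the dictionary keys are hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data, as in the quartile steps of this module.

```python
from statistics import median


def five_number_summary(values):
    """Return the minimum, Q1, median, Q3, and maximum of the data."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are needed")
    half = n // 2
    return {
        "minimum": ordered[0],
        "Q1": median(ordered[:half]),      # median of the lower half
        "median": median(ordered),
        "Q3": median(ordered[n - half:]),  # median of the upper half
        "maximum": ordered[-1],
    }


data = [5, 6, 12, 13, 15, 18, 22, 50]  # the values used in Example 4
summary = five_number_summary(data)
iqr = summary["Q3"] - summary["Q1"]    # interquartile range, Q3 - Q1
print(summary)
print(iqr)                             # prints 11.0
```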

Box-and-whisker plot

A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum
data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and
drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the
box passing through the median or Q2.

Parts of a box-and-whisker plot

[Figure: a boxplot with its parts labeled from left to right: Minimum Value, 1st Quartile, 2nd Quartile / Median, 3rd Quartile, Maximum Value]

Positively Skewed vs. Negatively Skewed Distribution

Positively Skewed Distribution

A distribution is positively skewed, sometimes called right-skewed, if most of the
scores/values fall toward the lower side of the scale and there are very few high scores.

An example of this kind of distribution is the one on the right. Most U.S. household
incomes are on the lower side of the scale.

Negatively Skewed Distribution

A distribution is negatively skewed, sometimes called left-skewed, if most of the
scores/values fall toward the higher side of the scale and there are very few low scores.

An example of this kind of distribution is the one on the right, which shows the
London 2012 Men’s Long Jump Qualifying Round results.

Determining the Skewness of the Distribution Using the Median

1. If the median is near the center of the box, the distribution is approximately symmetric.

2. If the median falls to the left of the center of the box, the distribution is positively
skewed.

3. If the median falls to the right of the center of the box, the distribution is negatively
skewed.
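
The three rules above can be turned into a rough Python check. This is only a sketch: the function name `skew_from_box` and the `tolerance` cutoff (how close to the center counts as "near") are assumptions, since the module does not give a numerical threshold.

```python
def skew_from_box(q1, med, q3, tolerance=0.1):
    """Classify skewness from where the median sits inside the box.

    tolerance is an assumed cutoff: the median counts as "near the center"
    when it lies within this fraction of the box width from the midpoint.
    """
    width = q3 - q1
    if width <= 0:
        raise ValueError("Q3 must be greater than Q1")
    # Signed distance of the median from the box center, relative to box width.
    offset = (med - (q1 + q3) / 2) / width
    if abs(offset) <= tolerance:
        return "approximately symmetric"
    if offset < 0:
        return "positively skewed"  # median falls left of the center
    return "negatively skewed"      # median falls right of the center


print(skew_from_box(9, 14, 20))   # prints approximately symmetric
print(skew_from_box(10, 12, 20))  # prints positively skewed
print(skew_from_box(10, 18, 20))  # prints negatively skewed
```

The first call uses the quartiles from Example 4; the other two use made-up quartiles chosen to show each rule.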

Procedures in constructing a boxplot

1. Find the five-number summary for the data values, that is, the minimum and maximum
data values, Q1 and Q3, and the median.
2. Draw a horizontal axis with a scale that includes the minimum and maximum
data values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through
the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the
maximum data value to the right side of the box.

Practice Exercise: Construct a boxplot for each problem and interpret it.

1. Number of Meteorites Found: The number of meteorites found in 10 states of
the United States is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot
for the data.
a. What is the interquartile range (IQR) of the data?
b. Based on the boxplot, what can we say about the distribution of meteorites in
the U.S.?

2. A dietitian is interested in comparing the sodium content of real cheese with the
sodium content of a cheese substitute. The data for two random samples are
shown. Compare the distributions, using boxplots.

Real cheese: 310, 420, 45, 40, 220, 240, 180, 90
Cheese substitute: 270, 180, 250, 290, 130, 260, 340, 310

https://www.emathzone.com/tutorials/basic-statistics/meanings-of-statistics.html
