Statistical Analysis
INTRODUCTION TO THE STATISTICAL

Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about

• Determine the level of measurement of a variable.

Let’s break this definition into four parts. The first part states that statistics involves the collection of information. The second refers to the organization and summarization of information. The third states that the information is analyzed to draw conclusions or answer specific questions. The fourth part states that results should be reported using some measure that represents how convinced we are that our conclusions reflect reality.
Suppose you wanted to use this scenario as a gauge of the morality of students at your school by determining the percent of students who would return the money. How might you do this? You could attempt to present the scenario to every student at the school, but this would be difficult or impossible if the student body is large. A second possibility is to present the scenario to 50 students and use the results to make a statement about all the students at the school.

In the PHP100 study presented, the population is all the students at the school. Each student is an individual. The sample is the 50 students selected to participate in the study.

Suppose 39 of the 50 students stated that they would return the money to the owner. We could present this result by saying that the percent of students in the survey who would return the money to the owner is 78%. This is an example of a descriptive statistic because it describes the results of the sample without making any general conclusions about the population. So 78% is a statistic because it is a numerical summary based on a sample. Descriptive statistics make it easier to get an overview of what the data are telling us.

If we extend the results of our sample to the population, we are performing inferential statistics. The generalization contains uncertainty because a sample cannot tell us everything about a population. Therefore, inferential statistics includes a level of confidence in the results. So rather than saying that 78% of all students would return the money, we might say that we are 95% confident that between 74% and 82% of all students would return the money. Notice how this inferential statement includes a level of confidence (measure of reliability) in our results. It also includes a range of values to account for the variability in our results. One goal of inferential statistics is to use statistics to estimate parameters.

PROCESS OF STATISTICS

1. Identify the research objective.

A researcher must determine the question(s) he or she wants answered. The question(s) must clearly identify the population that is to be studied.

2. Collect the information needed to answer the questions.

Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection.

Example:

A research objective is presented. For each research objective, identify the population and sample in the study.

1. The Philippine Mental Health Association contacts 1,028 teenagers who are 13 to 17 years of age and live in Antipolo City and asks whether or not they have been prescribed medications for any mental disorders, such as depression or anxiety.

Population: Teenagers 13 to 17 years of age who live in Antipolo City

Sample: 1,028 teenagers 13 to 17 years of age who live in Antipolo City
Sample: 100 selected soybean crop

3. Organize and summarize the information.

Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use.

4. Draw conclusion from the information.

In this step the information collected from the sample is generalized to the population. Inferential statistics uses methods that take results obtained from a sample, extend them to the population, and measure the reliability of the result.

Take Note!

If the entire population is studied, then inferential statistics is not necessary, because descriptive statistics will provide all the information that we need regarding the population.

Example:

For each of the following statements, decide whether it belongs to the field of descriptive statistics or inferential statistics.

1. A badminton player wants to know his average score for the past 10 games. (Descriptive Statistics)

2. A car manufacturer wishes to estimate the average lifetime of batteries by testing a

4. A shipping company wishes to estimate the number of passengers traveling via their ships next year using their data on the number of passengers in the past three years. (Inferential Statistics)

5. A politician wants to determine the total number of votes his rival obtained in the past election based on his copies of the tally sheet of electoral returns. (Descriptive Statistics)

DISTINCTION BETWEEN QUALITATIVE AND QUANTITATIVE VARIABLES

Variables are the characteristics of the individuals within the population. For example, recently my mother and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My mom noted that the tomatoes had different weights even though they came from the same plant. She discovered that variables such as weight may vary.

If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability so that we can learn to grow plants that yield the best tomatoes.
It is helpful to divide variables into different types, as different statistical methods are applicable to each. The main division is into qualitative (or categorical) and quantitative (or numerical) variables.

Variables can be classified into two groups:

1. Qualitative variables (Categorical) - A qualitative variable is a variable that yields categorical responses. It is a word or a code that represents a class or category.

2. Quantitative variables (Numeric) - A quantitative variable takes on numerical values representing an amount or quantity.

Example:

Determine whether the following variables are qualitative or quantitative.

possible values. If you count to get the value of a quantitative variable, it is discrete.

2. A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable. If you measure to get the value of a quantitative variable, it is continuous.

Example:

Determine whether the following quantitative variables are discrete or continuous.

1. The number of heads obtained after flipping a coin five times. (Discrete)

2. The number of cars that arrive at a McDonald’s drive-through between 12:00 P.M. and 1:00 P.M. (Discrete)
It is important to know which type of scale is represented by your data since different statistics are appropriate for different scales of measurement. A characteristic may be measured using nominal, ordinal, interval and ratio scales.

1. Nominal Level - They are sometimes called categorical scales or categorical data. Such a scale classifies persons or objects into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.

Example:
- Method of payment (cash, check, debit card, credit card)
- Type of school (public vs. private)
- Eye Color (Blue, Green, Brown)

2. Ordinal Level - This involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. An ordinal scale not only classifies subjects but also ranks them in terms of the degree to which they possess a characteristic of interest. In other words, an ordinal scale puts the subjects in order from highest to lowest, from most to least. Although ordinal scales indicate that some subjects are higher or lower than others, they do not indicate how much higher or how much better.

Example:
- Food Preferences
- Stage of Disease
- Social Economic Class (First, Middle, Lower)
- Severity of Pain

3. Interval Level - This is a measurement level that not only classifies and orders the measurements, but also specifies that the distances between each interval on the scale are equivalent along the scale from low interval to high interval. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.

Example:
- Temperature on Fahrenheit/Celsius Thermometer
- Trait anxiety (e.g., high anxious vs. low anxious)
- IQ (e.g., high IQ vs. average IQ vs. low IQ)

4. Ratio Level - A ratio scale represents the highest, most precise, level of measurement. It has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

Example:
- Height and weight
- Time
- Time until death

Operations that make sense for variables of different scales.
Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes.

Without proper planning for data collection, a number of problems can occur. If the data collection steps and processes are not properly planned, the research project can ultimately end up with a data set that does not serve the purpose for which it was intended. For example, if more than one person is involved in the data collection, but data collectors do not follow consistent data collection practices, they can end up with data with different units, collection processes, and variable names.

Consequences from Improperly Collected Data

• Inability to answer research questions accurately.
• Inability to repeat and validate the study.
• Distorted findings resulting in wasted resources.
• Misleading other researchers to pursue fruitless avenues of investigation.
• Compromising decisions for public policy.
• Causing harm to human participants and animal subjects.

Steps in Data Gathering

1. Set the objectives for collecting data.
2. Determine the data needed based on the set objectives.
3. Determine the method to be used in data gathering and define the comprehensive data collection points.
4. Design data gathering forms to be used.
5. Collect data.

Choosing the Method of Data Collection

Decision-makers need information that is relevant, timely, accurate and usable. The cost of obtaining, processing and analyzing these data is high. The challenge is to find ways that lead to information that is cost-effective, relevant, timely and important for immediate use. Some methods pay attention to timeliness and reduction in cost. Others pay attention to accuracy and the strength of the method in using scientific approaches.

The statistical data may be classified under two categories, depending upon the sources: Primary Data and Secondary Data.

SOURCES OF DATA

Whether conducting research in the social sciences, humanities, arts, or natural sciences, the ability to distinguish between primary and secondary sources is essential.

Primary Sources - Provide a first-hand account of an event or time period and are considered to be authoritative. They represent original thinking, reports on discoveries or events, or they can share new information. Often these sources are created at the time the events occurred but they can also include sources that are created later. They are usually the first formal appearance of original research.
Primary Data - are data documented by the primary source. The data collectors documented the data themselves.

The first-hand information obtained by the investigator is more reliable and accurate since the investigator can extract the correct information by removing doubts, if any, in the minds of the respondents regarding certain questions. High response rates might be obtained since the answers to various questions are obtained on the spot. It permits explanation of questions concerning difficult subject matter.

Secondary Sources - offer an analysis, interpretation or a restatement of primary sources and are considered to be persuasive. They often involve generalisation, synthesis, interpretation, commentary or evaluation in an attempt to convince the reader of the creator's argument. They often attempt to describe or explain primary sources.

Secondary Data - are data documented by a secondary source. The data collectors had the data documented by other sources.

Secondary data are primary data for the agency that collected them, and become secondary for someone else who uses these data for his own purposes.

Secondary data are less expensive to collect both in money and time. These data can also be better utilized and sometimes the quality of such data may be better because these might have been collected by persons who were specially trained for that purpose.

On the other hand, such data must be used with great care, because such data may also be full of errors due to the fact that the purpose of the collection of the data by the primary agency may have been different from the purpose of the user of these secondary data. Secondly, there may have been bias introduced, the size of the sample may have been inadequate, or there may have been arithmetic or definition errors; hence, it is necessary to critically investigate the validity of the secondary data.

The primary data can be collected by the following five methods:

1. Direct personal interviews - The researcher has direct contact with the interviewee. The researcher gathers information by asking questions to the interviewee.

2. Indirect/Questionnaire Method - This method of data collection involves sourcing and accessing existing data that were originally collected for the purpose of the study.

Designing good “questioning tools” forms an important and time consuming phase in the development of most research proposals. Once the decision has been made to use these techniques, the following questions should be considered before designing our tools:

• What exactly do we want to know, according to the objectives and variables we identified earlier? Is questioning the right technique to obtain all answers, or do we need additional techniques, such as observations or analysis of records?

• Of whom will we ask questions and what techniques will we use? Do we understand the topic sufficiently to design a questionnaire, or do we need some loosely structured interviews with key informants or a focus group discussion first to orient ourselves?
7. Write special instructions for interviewers or respondents.

9. Always test your questions before taking the survey. (Pre-test)

An open-ended question is a type of question that does not include response categories. The respondent is not given any possible answers to choose from. This type of question is usually appropriate for collecting subjective data. It permits free responses that should be recorded in the respondent’s own words.

Question wording and question order have a large effect on the responses obtained.

Two surveys were taken in late 1993/early 1994 about Elvis Presley.

One survey asked: “In the past few years, there have been a lot of rumors and stories about whether Elvis Presley is really dead. How do you feel about this? Do you think there is any possibility that these rumors are true and that Elvis Presley is still alive, or don’t you think so?”
It gives relatively more accurate data on behavior and activities, but it is subject to the investigator's or observer’s own biases, prejudices, and desires, and it needs more resources and skilled human power when high-level machines are used.

The secondary data can be collected by the following five methods:

1. Published reports in newspapers and periodicals.
2. Financial data reported in annual reports.
3. Records maintained by the institution.
4. Internal reports of the government departments.
5. Information from official publications.

Take Note!

• Always investigate the validity and reliability of the data by examining the collection method employed by your source.

size can produce accuracy of results. Moreover, the results from the small sample size will be questionable. A sample size that is too large will result in wasting money and time because a sufficient sample will normally give an accurate result.

The sample size is typically denoted by n and it is always a positive integer. No exact sample size can be mentioned here and it can vary in different research settings. However, all else being equal, a larger sample leads to increased precision in estimates of various properties of the population.

Take Note!

- Representativeness, not size, is the more important consideration.
- Use no less than 30 subjects if possible.
- If you use complex statistics, you may need a minimum of 100 or more in your sample (varies with method).
SAMPLE SIZE
Desired Confidence Level    Z-Score
80%                         1.28
85%                         1.44
90%                         1.65
95%                         1.96
99%                         2.58
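The z-scores in the table can be reproduced from the standard normal distribution. The sketch below is only an illustration: the helper name `z_score` is hypothetical, and it assumes a two-sided confidence level, which is how such tables are normally built.

```python
from statistics import NormalDist

def z_score(confidence_level: float) -> float:
    """Two-sided critical value Z for a confidence level such as 0.95."""
    # Half of the leftover probability goes into each tail.
    upper_tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.80, 0.85, 0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_score(level):.3f}")
```

The 90% value comes out near 1.645, which the table lists as 1.65.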
Three criteria need to be specified to determine the appropriate sample size:

1. Level of Precision

Also called sampling error, the level of precision is the range in which the true value of the population is estimated to be.

3. Degree of Variability

$$n \ge \left(\frac{Z\sigma}{e}\right)^2$$

where:
Z is the z-score corresponding to level of confidence.
• Estimating Proportion (Infinite Population)

The sample size required to obtain a confidence interval for p with specified margin of error e is given by

$$n \ge \left(\frac{Z}{e}\right)^2 p(1 - p)$$

The conservative formula using the strong law of large number.

$$n \ge \frac{1}{4}\left(\frac{Z}{e}\right)^2 \approx 385$$

Where:
Confidence level is 95%.

$$n \ge \frac{N}{1 + Ne^2}$$

Where:
N is the population size and e is the margin of error.
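The sample size formulas for a proportion can be checked numerically. This is a minimal sketch, not a prescribed procedure: the function names are hypothetical, results are rounded up to the next whole subject, and the values Z = 1.96, e = 0.05 and N = 1000 are assumptions chosen for illustration.

```python
import math

def sample_size_proportion(z: float, e: float, p: float = 0.5) -> int:
    """n >= (Z/e)^2 * p(1 - p); p = 0.5 gives the conservative (largest) n."""
    return math.ceil((z / e) ** 2 * p * (1 - p))

def sample_size_finite(N: int, e: float) -> int:
    """n >= N / (1 + N * e^2) for a population of size N."""
    return math.ceil(N / (1 + N * e ** 2))

# Assumed inputs: 95% confidence (Z = 1.96) and a 5% margin of error.
print(sample_size_proportion(1.96, 0.05))  # conservative case, p = 0.5
print(sample_size_finite(1000, 0.05))      # hypothetical population of 1000
```

With p = 0.5 the product p(1 − p) reaches its maximum of 1/4, which is why the conservative formula carries the factor 1/4.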
Example:
The researcher needs to survey 286 BS stat students.

- It is important that the individuals included in a sample represent a cross section of individuals in the population.
- If a sample is not representative, it is biased. You cannot generalize to the population from your statistical data.

Some definitions are needed to make the notion of a good sample more precise.

• Finite Population Correction

If the population is small, then the sample size can be reduced slightly:

$$n \ge \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}$$
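A short numerical sketch of the finite population correction follows. The function name and the inputs (an initial sample size n0 = 385 and a population of N = 1000) are assumptions for illustration only.

```python
import math

def finite_population_correction(n0: int, N: int) -> int:
    """Adjusted sample size n >= n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))

# Hypothetical case: n0 = 385 from the infinite-population formula, N = 1000.
print(finite_population_correction(385, 1000))
```

As N grows, the correction term (n0 − 1)/N shrinks toward zero and the adjusted size returns to n0.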
- They require the use of a complete listing of the elements of the universe called the sampling frame.
- The probabilities of selection are known.

- Most basic method of drawing a probability sample.
- Assigns equal probabilities of selection to each possible sample.
Sampling Procedure

$$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$$
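The sampling interval k can be used to draw a systematic sample. The sketch below assumes the usual textbook procedure (a random start within the first interval, then every k-th element), which this section does not spell out; the function name and the population of 500 numbered units are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th element after a random start, with k = N // n."""
    N = len(population)
    k = N // n                      # sampling interval, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)        # random start inside the first interval
    return [population[start + j * k] for j in range(n)]

units = list(range(1, 501))         # hypothetical frame of N = 500 units
sample = systematic_sample(units, 50, seed=1)
print(len(sample), sample[:5])
```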
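Solution:

Given: N = 500, n = 50, N_1 = 200, N_2 = 300

$$n_1 = N_1\left(\frac{n}{N}\right) = 200\left(\frac{50}{500}\right) = 20$$

$$n_2 = N_2\left(\frac{n}{N}\right) = 300\left(\frac{50}{500}\right) = 30$$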
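The same proportional allocation can be written as a small function. This is a sketch with hypothetical names; it uses the strata sizes 200 and 300 and the total sample size 50 from the solution.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate n_h = N_h * (n / N) to each stratum."""
    N = sum(strata_sizes.values())
    # Rounding may make the parts differ slightly from n in other data sets.
    return {name: round(N_h * n / N) for name, N_h in strata_sizes.items()}

allocation = proportional_allocation({"stratum 1": 200, "stratum 2": 300}, 50)
print(allocation)
```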
Example:
Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households distantly apart.

1. Organize the sampling process into stages where the unit of analysis is systematically grouped.

Example:

Use probability sampling if the main objective of the sample survey is making inferences about the characteristics of the population under study.

• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. It is used to determine the target population of those who will be taken for the study.
1. that the population standard deviation is σ completed and returned at the end of the
= 1.3 and precision level is 0.05. program.
School                               Population per School   Sample
Antipolo National High School        3,360
Bagong Nayon National High School    2,540
Dela Paz National High School        2,122
Sta. Cruz National High School       1,290
Tubigan National High School         1,367
Total                                10,679
Data Presentation
Data are usually collected in a raw format and thus
the inherent information is difficult to understand.
Therefore, raw data need to be summarized,
processed, and analyzed to usefully derive
information from them. However, no matter how well
manipulated, the information derived from the raw
data should be presented in an effective format,
otherwise, it would be a great loss for both authors
and readers. Planning how the data will be presented
is essential before appropriately processing raw data.
Polytechnic University of the Philippines
College of Science
Department of Mathematics and Statistics
Presentation of Data
Presentation of data refers to an exhibition
or putting up data in an attractive and useful
manner such that it can be easily interpreted.
The three main forms of presentation of data
are:
Textual Presentation
Tabular Presentation
Graphical Presentation
Textual Presentation
• All the data is presented in the form of text,
phrases, or paragraphs.
• It involves enumerating important
characteristics, emphasizing significant figures
and identifying important features of data.
• Text is the principal method for explaining
findings, outlining trends, and providing
contextual information.
Example:
A researcher is asked to present the performance of a section in
the statistics test. The following are the test scores:
34 42 20 50 17 9 34 43
50 18 35 43 50 23 23 35
37 38 38 39 39 38 38 39
24 29 25 26 28 27 44 44
49 48 46 45 45 46 45 46
The data presented in textual form would be like this:
In the statistics class of 40 students, 3 obtained the perfect
score of 50. Sixteen students got a score of 40 and above,
while only 3 got 19 and below. Generally, the students
performed well in the test with 23 or 57.5% getting a passing
score of 38 and above.
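The figures quoted in a textual presentation should be checked against the raw data. The following sketch recounts them from the 40 test scores listed in the example; the variable names are illustrative only.

```python
scores = [34, 42, 20, 50, 17, 9, 34, 43,
          50, 18, 35, 43, 50, 23, 23, 35,
          37, 38, 38, 39, 39, 38, 38, 39,
          24, 29, 25, 26, 28, 27, 44, 44,
          49, 48, 46, 45, 45, 46, 45, 46]

n = len(scores)
perfect = sum(s == 50 for s in scores)        # perfect scores
forty_up = sum(s >= 40 for s in scores)       # 40 and above
nineteen_down = sum(s <= 19 for s in scores)  # 19 and below
passing = sum(s >= 38 for s in scores)        # passing score of 38 and above
passing_pct = 100 * passing / n

print(n, perfect, forty_up, nineteen_down, passing, passing_pct)
# prints: 40 3 16 3 23 57.5
```

Note that 23 out of 40 students works out to 57.5%.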
Remember!
✦ Keep your paragraphs simple and short.
Tabular Presentation:
• It is a systematic and logical arrangement of
data in the form of Rows and Columns with
respect to the characteristics of data.
• A table is best suited for representing individual
information and represents both quantitative
and qualitative information.
Advantage of Tabular
Presentation
✦ More information may be presented.
✦ Exact values can be read from a table to
retain precision.
✦ Flexibility is maintained without
distortion of data.
✦ Less work and less cost are required in
the preparation.
Preparing Tables
The making of a compact table itself is an art. This should
contain all the information needed within the smallest possible
space. What the purpose of tabulation is and how the tabulated
information is to be used are the main points to be kept in mind
while preparing for a statistical table. An ideal table should
consist of the following main parts:
A. Title: The title must tell as simply as possible what is in the
table. It should answer the questions:
✦ Who? White females with breast cancer, black males with
lung cancer.
✦ What are the data? Counts, percentage distributions, rates.
https://byjus.com/commerce/tabular-presentation-of-data/
Solution:
To answer this question we need to construct a frequency
distribution to determine how many female and male
respondents participated in the study.
Procedure in Constructing
Frequency Table
✦ If the data is in the form of qualitative data
To construct the frequency distribution using
excel use the command:
=frequency(data_array,bins_array)
Then press Ctrl + Shift + Enter to enter it as an array formula:
{=frequency(data_array,bins_array)}
Final Output
Procedure in Constructing
Frequency Table
✦If the data is in the form of quantitative data
Steps
1. Set an interval or range for your data. It is
needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar and Click
“DATA ANALYSIS” on the tool bar
3. The dialog box “DATA ANALYSIS” will appear
and choose “HISTOGRAM” on the dialog box
then click OK.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight your intervals (bins) for the “BIN RANGE”.
6. Click the box of “LABELS IN FIRST ROW”
then click “OK”.
7. The result will appear on the new worksheet of
the excel file. Get the Percentage and total.
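As a rough cross-check outside Excel, a frequency table with percentages and a total can also be sketched in Python. The class width of 10 and the grouping by lower class limit are assumptions made for illustration (Excel's histogram tool treats each bin value as an upper limit), and the data are the 40 test scores from the textual presentation example.

```python
from collections import Counter

scores = [34, 42, 20, 50, 17, 9, 34, 43,
          50, 18, 35, 43, 50, 23, 23, 35,
          37, 38, 38, 39, 39, 38, 38, 39,
          24, 29, 25, 26, 28, 27, 44, 44,
          49, 48, 46, 45, 45, 46, 45, 46]

width = 10  # assumed class width
# Key each score by the lower limit of its class: 0-9, 10-19, 20-29, ...
freq = Counter((s // width) * width for s in scores)
total = sum(freq.values())

for lower in sorted(freq):
    count = freq[lower]
    label = f"{lower} - {lower + width - 1}"
    print(f"{label:<9} {count:>3} {100 * count / total:6.1f}%")
print(f"{'Total':<9} {total:>3} {100.0:6.1f}%")
```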
Final Output
Answer:
✦ Useless Information – Don’t show decimals if they are not
needed.
✦ Poor Alignment – Make sure alignment makes sense.
• Don’t center numbers, always right justify – try to align
decimal points.
• Consider the appropriate placement of row titles.
✦ Difficult to Read – Use commas when the number exceeds
a thousand.
Graphical Presentation
✦ A graph is a very effective visual tool as it displays data at
a glance, facilitates comparison, and can reveal trends and
relationships within the data such as changes over time,
and correlation or relative share of a whole.
✦ It is considered an important medium of communication
because we are able to create a pictorial representation of
the numerical figures.
✦ It is suited when we need to show the results of the study to
nonprofessionals and/or people who dislike numbers and
lengthy texts.
Bar Graph
✦ It is constructed by labeling each category
of data on either the horizontal or vertical
axis and the frequency or relative frequency
of the category on the other axis. Rectangles
of equal width are drawn for each category.
The height of each rectangle represents the
category’s frequency or relative frequency.
✦ It is used to organize discrete data.
Remember!
• Bar graphs may also be drawn with horizontal
bars. Horizontal bars are preferable when
category names are lengthy.
• In bar graphs, the order of the categories does
not usually matter. However, bar graphs that
have categories arranged in decreasing order
of frequency help prioritize categories for
decision-making purposes in areas such as
quality control, human resources, and
marketing.
Histogram
✦ It is constructed by drawing rectangles for each class of
data. The height of each rectangle is the frequency or
relative frequency of the class. The width of each rectangle
is the same and the rectangles touch each other.
✦ It is a graph used to present quantitative data and is similar to
the bar graph.
✦ It is used to organize continuous data.
Pie Chart
✦ It is a circle divided into sectors. Each sector represents a
category of data. The area of each sector is proportional to
the frequency of the category.
✦ Pie charts are typically used to present the relative
frequency of qualitative data. In most cases the data are
nominal, but ordinal data can also be displayed in a pie
chart.
Line Graph
✦ A graph that shows information that is
connected in some way (such as change over
time)
✦ Line segments are then drawn connecting the
points. It is used to organize continuous data.
✦ Very useful in identifying trends in the data
over time.
✦ It is rigidly defined.
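Sample Mean

Ungrouped data:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where:
x_i = data values
n = no. of sample observations

Grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{r} f x_i}{n}$$
where:
x_i = data values
f = frequency
n = no. of sample observations

Population Mean

Ungrouped data:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where:
x_i = data values
N = no. of observations

Grouped data:
$$\mu = \frac{\sum_{i=1}^{r} f x_i}{N}$$
where:
x_i = data values
f = frequency
N = no. of observations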
1. Arrange the data from lowest to highest (or highest to lowest).

2. For an odd number of data, the median of a data set is the “middle observation”. When the number of data is even, the median is the “average of the two middle scores”.

$$\tilde{x} = LB + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i$$

where:
LB = lower boundary of the median class
i = class width
n = no. of observations
<cf = less than cumulative frequency of the class preceding the median class
f = frequency of the median class
1. Obtain a frequency distribution of the distinct values of the data.

2. The mode is the most frequently occurring data (if there is one).

$$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$$

where:
LB = lower boundary of the modal class
i = class width
d_1 = difference between the frequency of the modal class and the class preceding it
d_2 = difference between the frequency of the modal class and the class following it
Remember!
• Whenever you hear the word average, be aware that
the word may not always be referring to the mean.
One average could be used to support one position,
while another average could be used to support a
different position.
• Mode is not always present in the data sets unlike
mean and median.
Notice how the mean of the second data set has been
influenced by the presence of an unusual case/outlier in the
data set. If we were to say the mean is equal to 132.5 for the
second data set and it represents a typical case, this will not
make much sense because the majority of data values are less
than 120. Therefore, the mean should not be used when
unusual, or outlying, data values are present in the data set, as
the mean tends to be extremely sensitive to the unusual
values. Rather, the median should be reported in this case.
Example:
The data given below is the age of the residents in
Barangay 634, Sta. Mesa, Manila. Compute mean,
median and mode.
Class Interval Frequency
55 - 59 55
50 - 54 23
45 - 49 37
40 - 44 37
35 - 39 48
30 - 34 42
25 - 29 27
Solution:
To compute mean of grouped data, first you need to
fill out this table.
Class Interval   Frequency (f)   x    fx
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =                  $\sum_{i=1}^{7} f x_i$ =

x is the midpoint of every class interval. To compute this, add the lower class limit (LC) and the upper class limit (UC) and divide by 2:

$$x = \frac{LC + UC}{2}$$

Ex:
$$x = \frac{55 + 59}{2} = 57$$
$$x = \frac{50 + 54}{2} = 52$$
Solution:
Class Interval   Frequency (f)   x    fx
55 - 59          3               57   171
50 - 54          6               52   312
45 - 49          7               47   329
40 - 44          9               42   378
35 - 39          6               37   222
30 - 34          4               32   128
25 - 29          5               27   135
Total            n = 40
$\sum_{i=1}^{7} f x_i = 1,675$
$\bar{x} = \frac{\sum_{i=1}^{7} f x_i}{n} = \frac{1,675}{40} = 41.88$
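The grouped-mean computation can be sketched in a few lines of Python. This is a minimal illustration, assuming the class limits and frequencies of the worked example; the names `classes` and `grouped_mean` are hypothetical and not part of the original material.

```python
# Classes from the worked example: (lower limit, upper limit, frequency).
classes = [
    (25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
    (45, 49, 7), (50, 54, 6), (55, 59, 3),
]

def grouped_mean(classes):
    """Return sum(f * x) / n, where x is the midpoint of each class."""
    n = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / n

mean = grouped_mean(classes)
print(f"n = {sum(f for _, _, f in classes)}, mean = {mean:.2f}")
```

Because each observation is replaced by its class midpoint, the result is an approximation of the mean of the raw data.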
Solution:
To compute the median and mode of grouped data, first fill out this table.
Class Interval   f   LB   <cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =
To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5
Solution:
First, compute $\frac{n}{2}$; it will help us determine the median class and the <cf.
$\frac{n}{2} = \frac{40}{2} = 20$
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40
The median class is the class containing the 20th item. Hence, the median class is 40 - 44.
$\tilde{x} = LB + \frac{\left(\frac{n}{2} - {<}cf\right) i}{f}$
$\tilde{x} = 39.5 + \frac{(20 - 15)5}{9} = 42.28$
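The median steps above can be sketched as a short Python function. It is a minimal illustration that assumes integer class limits (so the lower boundary is the lower limit minus 0.5) and equal class widths; `grouped_median` and `age_classes` are hypothetical names.

```python
# (lower class limit, frequency), listed from the lowest class upward.
age_classes = [(25, 5), (30, 4), (35, 6), (40, 9), (45, 7), (50, 6), (55, 3)]

def grouped_median(classes, width):
    """Median of grouped data: LB + ((n/2 - <cf) * i) / f."""
    n = sum(f for _, f in classes)
    target = n / 2
    cum_before = 0  # <cf of the class preceding the current one
    for lower_limit, f in classes:
        if f > 0 and cum_before + f >= target:
            lower_boundary = lower_limit - 0.5
            return lower_boundary + (target - cum_before) * width / f
        cum_before += f
    raise ValueError("no observations")

median = grouped_median(age_classes, width=5)
print(f"median = {median:.2f}")
```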
Solution:
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
The modal class is the class interval with the highest frequency. The modal class is 40 - 44.
If two class intervals share the highest frequency, always choose the higher class interval.
$d_1 = 9 - 6 = 3$
$d_2 = 9 - 7 = 2$
$\hat{x} = LB + \left(\frac{d_1}{d_1 + d_2}\right) i$
$\hat{x} = 39.5 + \left(\frac{3}{3 + 2}\right) 5 = 42.5$
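As a rough sketch, the grouped-mode formula can be coded as follows. The function name `grouped_mode` and the treatment of a missing neighbouring class as frequency 0 are assumptions made for this illustration; the tie rule (take the higher class) follows the note above.

```python
# (lower class limit, frequency), listed from the lowest class upward.
mode_classes = [(25, 5), (30, 4), (35, 6), (40, 9), (45, 7), (50, 6), (55, 3)]

def grouped_mode(classes, width):
    """Mode of grouped data: LB + (d1 / (d1 + d2)) * i."""
    freqs = [f for _, f in classes]
    peak = max(freqs)
    # On a tie, pick the highest class interval.
    idx = max(i for i, f in enumerate(freqs) if f == peak)
    # Assumption: a class with no neighbour is compared against frequency 0.
    prev_f = freqs[idx - 1] if idx > 0 else 0
    next_f = freqs[idx + 1] if idx < len(freqs) - 1 else 0
    d1 = peak - prev_f
    d2 = peak - next_f
    lower_boundary = classes[idx][0] - 0.5
    return lower_boundary + d1 / (d1 + d2) * width

print(f"mode = {grouped_mode(mode_classes, width=5):.2f}")
```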
Quartiles - split the ordered data into four quarters.
Percentiles - split the ordered data into 100 equal parts.
For ungrouped data:
1. Arrange the data from lowest to highest. Then use this formula:
$Q_{class} = \frac{nk}{4} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the quartile. If not, use interpolation.
For grouped data:
$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$
where:
LB = lower boundary of the quartile class
i = class width
n = no. of observations
k = quartile position
<cf = less than cumulative frequency of the class preceding the quartile class
f = frequency of the quartile class
For ungrouped data:
1. Arrange the data from lowest to highest. Then use this formula:
$D_{class} = \frac{nk}{10} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the decile. If not, use interpolation.
For grouped data:
$D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f}$
where:
LB = lower boundary of the decile class
i = class width
n = no. of observations
k = decile position
<cf = less than cumulative frequency of the class preceding the decile class
f = frequency of the decile class
For ungrouped data:
1. Arrange the data from lowest to highest. Then use this formula:
$P_{class} = \frac{nk}{100} + 0.5$
2. If the resulting positioning point is an integer, the particular numerical observation corresponding to that point is chosen for the percentile. If not, use interpolation.
For grouped data:
$P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f}$
where:
LB = lower boundary of the percentile class
i = class width
n = no. of observations
k = percentile position
<cf = less than cumulative frequency of the class preceding the percentile class
f = frequency of the percentile class
Example 1:
The data given below are the total number of hours
lost due to tardiness and absences of employees in a
company in a given year. Find Q3, D4 and P55.
Month       Hours Lost (x)
January     55
February    23
March       37
April       37
May         48
June        42
July        27
August      20
September   30
October     32
November    24
December    40
$Q_{class} = \frac{(12)(3)}{4} + 0.5 = 9.5$
2. Use interpolation since the computed Qclass is not an integer.
Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:   1   2   3   4   5   6   7   8   9   10  11  12
$Q_3 = 40 + 0.5(42 - 40) = 41$
$D_4 = 30 + 0.3(32 - 30) = 30.6$
$P_{class} = \frac{(12)(55)}{100} + 0.5 = 7.1$
2. Use interpolation since the computed Pclass is not an integer.
Value:      20  23  24  27  30  32  37  37  40  42  48  55
Position:   1   2   3   4   5   6   7   8   9   10  11  12
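The positioning-point rule for ungrouped data can be sketched as one helper that serves quartiles, deciles, and percentiles. This is only an illustration: `quantile_by_position` is a hypothetical name, and clamping positions that fall outside 1..n to the smallest or largest value is an assumption the slides do not address.

```python
import math

hours_lost = [55, 23, 37, 37, 48, 42, 27, 20, 30, 32, 24, 40]

def quantile_by_position(data, k, parts):
    """Value at position n*k/parts + 0.5 (1-based), interpolating if needed."""
    xs = sorted(data)
    n = len(xs)
    pos = n * k / parts + 0.5
    # Assumption: positions outside 1..n are clamped to the extremes.
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    lower = math.floor(pos)   # position of the observation just below
    frac = pos - lower        # fractional part used for interpolation
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

q3 = quantile_by_position(hours_lost, 3, 4)      # third quartile
d4 = quantile_by_position(hours_lost, 4, 10)     # fourth decile
p55 = quantile_by_position(hours_lost, 55, 100)  # 55th percentile
print(q3, round(d4, 2), round(p55, 2))
```

Note that other positioning conventions exist, so statistical software may report slightly different quantiles for the same data.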
Example 2:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute Q1, D7, and
P10.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute Q1, D7, and P10 of grouped data, first fill out this table.
Class Interval   f   LB   <cf
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =
To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
55 − 0.5 = 54.5
50 − 0.5 = 49.5
45 − 0.5 = 44.5
Solution:
First, compute $\frac{nk}{4}$; it will help us determine the quartile class and the <cf.
$\frac{nk}{4} = \frac{(40)(1)}{4} = 10$
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40
The quartile class is the class containing the 10th item. Hence, the quartile class is 35 - 39.
$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$
$Q_1 = 34.5 + \frac{(10 - 9)5}{6} = 35.33$
Solution:
First, compute $\frac{nk}{10}$; it will help us determine the decile class and the <cf.
$\frac{nk}{10} = \frac{(40)(7)}{10} = 28$
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40
The decile class is the class containing the 28th item. Hence, the decile class is 45 - 49.
$D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f}$
$D_7 = 44.5 + \frac{(28 - 24)5}{7} = 47.36$
Solution:
First, compute $\frac{nk}{100}$; it will help us determine the percentile class and the <cf.
$\frac{nk}{100} = \frac{(40)(10)}{100} = 4$
Class Interval   f   LB     <cf
55 - 59          3   54.5   40
50 - 54          6   49.5   37
45 - 49          7   44.5   31
40 - 44          9   39.5   24
35 - 39          6   34.5   15
30 - 34          4   29.5   9
25 - 29          5   24.5   5
Total            n = 40
The percentile class is the class containing the 4th item. Hence, the percentile class is 25 - 29. Since no class precedes it, <cf = 0.
$P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f}$
$P_{10} = 24.5 + \frac{(4 - 0)5}{5} = 28.5$
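The grouped quartile, decile, and percentile formulas differ only in the divisor (4, 10, or 100), so they can be sketched with one function. This is a minimal illustration assuming integer class limits and equal widths; `grouped_quantile` and `resident_classes` are hypothetical names.

```python
# (lower class limit, frequency), listed from the lowest class upward.
resident_classes = [(25, 5), (30, 4), (35, 6), (40, 9), (45, 7), (50, 6), (55, 3)]

def grouped_quantile(classes, width, k, parts):
    """LB + ((n*k/parts - <cf) * i) / f for the class containing item n*k/parts."""
    n = sum(f for _, f in classes)
    target = n * k / parts
    cum_before = 0  # <cf: cumulative frequency of the preceding classes
    for lower_limit, f in classes:
        if f > 0 and cum_before + f >= target:
            return (lower_limit - 0.5) + (target - cum_before) * width / f
        cum_before += f
    raise ValueError("no observations")

q1 = grouped_quantile(resident_classes, 5, k=1, parts=4)
d7 = grouped_quantile(resident_classes, 5, k=7, parts=10)
p10 = grouped_quantile(resident_classes, 5, k=10, parts=100)
print(round(q1, 2), round(d7, 2), round(p10, 2))
```

For the lowest class, `cum_before` is 0 because no class precedes it.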
Example 3:
The ages of the townspeople in a certain community
are as follows:
Class Interval Frequency
18 - 24 28
25 - 31 54
32 - 38 38
39 - 45 20
46 - 52 17
53 - 59 3
Solution:
To compute Q2, D5, and P50 of grouped data, first fill out this table.
Class Interval   f    LB   <cf
18 - 24          28
25 - 31          54
32 - 38          38
39 - 45          20
46 - 52          17
53 - 59          3
Total            n =
To compute the lower boundary (LB), always subtract 0.5 from the lower class limit (LC).
Ex:
18 − 0.5 = 17.5
25 − 0.5 = 24.5
32 − 0.5 = 31.5
Solution:
First, compute $\frac{nk}{4}$; it will help us determine the quartile class and the <cf.
$\frac{nk}{4} = \frac{(160)(2)}{4} = 80$
Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160
The quartile class is the class containing the 80th item. Hence, the quartile class is 25 - 31.
$Q_k = LB + \frac{\left(\frac{nk}{4} - {<}cf\right) i}{f}$
$Q_2 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Solution:
First, compute $\frac{nk}{10}$; it will help us determine the decile class and the <cf.
$\frac{nk}{10} = \frac{(160)(5)}{10} = 80$
Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160
The decile class is the class containing the 80th item. Hence, the decile class is 25 - 31.
$D_k = LB + \frac{\left(\frac{nk}{10} - {<}cf\right) i}{f}$
$D_5 = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Solution:
First, compute $\frac{nk}{100}$; it will help us determine the percentile class and the <cf.
$\frac{nk}{100} = \frac{(160)(50)}{100} = 80$
Class Interval   f    LB     <cf
18 - 24          28   17.5   28
25 - 31          54   24.5   82
32 - 38          38   31.5   120
39 - 45          20   38.5   140
46 - 52          17   45.5   157
53 - 59          3    52.5   160
Total            n = 160
The percentile class is the class containing the 80th item. Hence, the percentile class is 25 - 31.
$P_k = LB + \frac{\left(\frac{nk}{100} - {<}cf\right) i}{f}$
$P_{50} = 24.5 + \frac{(80 - 28)7}{54} = 31.24$
Sample Interpretation:
1. Jennifer just received the results of her SAT exam. Her
SAT Mathematics score of 600 is in the 74th percentile. What
does this mean?
A percentile rank of 74% means that 74% of SAT
Mathematics scores are less than or equal to 600 and 26%
of the scores are greater. So 26% of the students who took
the exam scored better than Jennifer.
Measures of Dispersion/Variability
Based on the figures below, determine which of the
two scatter diagrams illustrates larger variability.
Figure 1 Figure 2
Measures of Dispersion/Variability:
RANGE
It is the difference between the largest and the smallest
observations or items in a set of data.
$R = X_{max} - X_{min}$
Measures of Dispersion/Variability:
STANDARD DEVIATION
• It is a measure of how far away items in a data set are from
the mean.
• The larger the standard deviation, the more variation there
is in the data set.
• The standard deviation can never be a negative number,
due to the way it’s calculated and the fact that it measures a
distance (distances are never negative numbers).
• The smallest possible value for the standard deviation is 0,
and that happens only in contrived situations where every
single number in the data set is exactly the same (no
deviation).
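Sample Standard Deviation
For ungrouped data:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
n = no. of sample observations
For grouped data:
$s = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \bar{x})^2}{n - 1}}$
where:
xi = data values
x̄ = mean
f = frequency
n = no. of sample observations
Population Standard Deviation
For ungrouped data:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
N = no. of observations
For grouped data:
$\sigma = \sqrt{\frac{\sum_{i=1}^{r} f(x_i - \mu)^2}{N}}$
where:
xi = data values
μ = mean
f = frequency
N = no. of observations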
Measures of Dispersion/Variability:
VARIANCE
It takes all data points in a set into account and is calculated
by averaging the squared deviations of the data points from the mean.
Example 1:
The data given below are the ages of the residents in
Barangay 634, Sta. Mesa, Manila. Compute the sample
standard deviation and sample variance.
Class Interval   Frequency
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Solution:
To compute the SD and Var of grouped data, first fill out this table.
Class Interval   f   x   fx   (xi − x̄)²   f(xi − x̄)²
55 - 59          3
50 - 54          6
45 - 49          7
40 - 44          9
35 - 39          6
30 - 34          4
25 - 29          5
Total            n =
$\sum_{i=1}^{7} f x_i =$
$\sum_{i=1}^{7} f(x_i - \bar{x})^2 =$
Solution:
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61
50 - 54          6   52   312   102.41
45 - 49          7   47   329   26.21
40 - 44          9   42   378   0.01
35 - 39          6   37   222   23.81
30 - 34          4   32   128   97.61
25 - 29          5   27   135   221.41
Total            n = 40
$\sum_{i=1}^{7} f x_i = 1,675$
$\sum_{i=1}^{7} f(x_i - \bar{x})^2 =$
Solution:
Class Interval   f   x    fx    (xi − x̄)²   f(xi − x̄)²
55 - 59          3   57   171   228.61      685.83
50 - 54          6   52   312   102.41      614.46
45 - 49          7   47   329   26.21       183.47
40 - 44          9   42   378   0.01        0.09
35 - 39          6   37   222   23.81       142.86
30 - 34          4   32   128   97.61       390.44
25 - 29          5   27   135   221.41      1107.05
Total            n = 40
$\sum_{i=1}^{7} f x_i = 1,675$
$\sum_{i=1}^{7} f(x_i - \bar{x})^2 = 3,124.20$
$f(x_1 - \bar{x})^2 = 3(228.61) = 685.83$
$f(x_2 - \bar{x})^2 = 6(102.41) = 614.46$
$f(x_3 - \bar{x})^2 = 7(26.21) = 183.47$
Solution:
Class Interval   (xi − x̄)²   f(xi − x̄)²
55 - 59          228.61      685.83
50 - 54          102.41      614.46
45 - 49          26.21       183.47
40 - 44          0.01        0.09
35 - 39          23.81       142.86
30 - 34          97.61       390.44
25 - 29          221.41      1107.05
Total
$\sum_{i=1}^{7} f(x_i - \bar{x})^2 = 3,124.20$
$s = \sqrt{\frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3,124.20}{40 - 1}} = 8.95$
$s^2 = \frac{\sum_{i=1}^{7} f(x_i - \bar{x})^2}{n - 1} = \frac{3,124.20}{40 - 1} = 80.11$
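The same grouped variance and standard deviation can be sketched in Python. This is a minimal illustration with hypothetical names (`grouped_variance`, `sd_classes`); it keeps the mean unrounded, so intermediate values differ slightly from the table, which rounds the mean to 41.88 before squaring.

```python
import math

# (lower limit, upper limit, frequency) from the worked example.
sd_classes = [
    (25, 29, 5), (30, 34, 4), (35, 39, 6), (40, 44, 9),
    (45, 49, 7), (50, 54, 6), (55, 59, 3),
]

def grouped_variance(classes):
    """Sample variance of grouped data: sum(f * (x - mean)^2) / (n - 1)."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(f * x for x, f in midpoints) / n
    sum_sq = sum(f * (x - mean) ** 2 for x, f in midpoints)
    return sum_sq / (n - 1)

variance = grouped_variance(sd_classes)
std_dev = math.sqrt(variance)
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```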
Shape of Distribution
These two statistics give you insights into the shape of
the distribution.
✦ Skewness is the degree of distortion from the
symmetrical bell curve or the normal distribution. It
measures the lack of symmetry in data distribution.
✦ Kurtosis is a measure of the combined sizes of the
two tails. It is often described as how tall and sharp the
central peak is relative to a standard bell curve, but it is
driven mainly by the tails.
Skewness
A symmetrical distribution will have a skewness of 0.
So, a normal distribution will have a skewness of 0.
In a symmetrical distribution, the Mean, Median and
Mode are equal to each other and the ordinate at
mean divides the distribution into two equal parts.
$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$
where:
x̄ is the mean
x̃ is the median
s is the sample standard deviation
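As a small sketch, the skewness coefficient above can be evaluated directly. Plugging in the mean, median, and standard deviation of the grouped age data from the earlier examples is an illustration added here, not a computation from the original slides; `pearson_skewness` is a hypothetical name.

```python
def pearson_skewness(mean, median, std_dev):
    """Sk = 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev

# Rounded values from the grouped age example: mean, median, sample SD.
sk = pearson_skewness(41.88, 42.28, 8.95)
print(f"Sk = {sk:.3f}")
```

A negative value indicates that the mean lies below the median (a longer left tail); a value near 0 indicates an approximately symmetrical distribution.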
Kurtosis
It is actually the measure of outliers present in the
distribution. The outliers in a sample, therefore, have
even more effect on the kurtosis than they do on the
skewness.
Higher kurtosis means more of the variance is the
result of infrequent extreme deviations, as opposed to
frequent modestly sized deviations. In other words, it’s
the tails that mostly account for kurtosis, not the
central peak.
The kurtosis decreases as the tails become lighter. It
increases as the tails become heavier.
Normal Distribution
✦ The normal distribution is sometimes called the bell curve
because the graph of its probability density looks like a
bell.
[Figure: Normal Curve]
The red curve is a model called the normal curve,
which is used to describe continuous random variables
that are said to be normally distributed.
A continuous random variable is normally distributed,
or has a normal probability distribution, if its relative
frequency histogram has the shape of a normal curve.
and μ + σ.
Mean:
✦ Changing the mean shifts the entire
curve left or right on the X-axis.
[Figure: two normal curves with μ1 < μ2, σ1 = σ2]
Standard Deviation:
✦ Changing the standard deviation
either tightens or spreads out the
width of the distribution along the X-axis.
Larger standard deviations produce distributions that are more
spread out.
Remember!
Positive values of z-score indicate how far above
the mean a score falls and negative values
indicate how far below the mean a score falls.
[Figure: area between z1 and z2 shown as the difference of two areas under the normal curve]
Using Table 1
D. Area to the right of a positive z value or to the left of a
negative z value.
[Figure: the required area shown as a difference of two areas under the normal curve; diagram labels: Area = 1, Area = 0.50]
Using Table 2
A. Area to the right of a positive z value or to the left of a
negative z value.
Use Table 2 directly
[Figure: shaded tail areas, to the left of a negative z1 and to the right of a positive z1]
B. Area between z values on same side of 0.
[Figure: areas between z1 and z2 shown as a difference or a sum of two areas; diagram labels: 0.50 − Area]
Using Table 2
D. Area to the right of a negative z value or to the left of a
positive z value.
[Figure: the required area shown as a sum of two areas; diagram labels: 0.50 − Area, Area = 0.50]
E. Area between a given z value and 0.
[Figure: area between 0 and z1 shown as a difference of two areas; diagram label: Area = 0.50]
Example 1:
Scores on a standardized college entrance examination (CEE)
are normally distributed with mean 510 and standard
deviation 60. A selective university considers for admission
only applicants with CEE scores over 560. Find the proportion of
all individuals who took the CEE who meet the university's
CEE requirement for consideration for admission.
Solution:
Given: μ = 510, σ = 60 and x = 560
Area = P(X > 560)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with X-axis ticks at 450, 510, 570; area to the right of 560 shaded]
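The slides read such areas from a table; as a cross-check, the same right-tail area can be sketched with Python's standard library. Using `statistics.NormalDist` here is a substitute for the table lookup, not the method shown in the original material.

```python
from statistics import NormalDist

mu, sigma, x = 510, 60, 560

# Standardize the score: z = (x - mu) / sigma.
z = (x - mu) / sigma

# Area to the right of x is 1 minus the cumulative area to the left.
p_over_560 = 1 - NormalDist(mu=mu, sigma=sigma).cdf(x)

print(f"z = {z:.2f}, P(X > 560) = {p_over_560:.4f}")
```

A table lookup with z rounded to two decimals can differ from this value in the third or fourth decimal place.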
Example 2:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the proportion of the three-year-old females that
have a height less than 35 inches.
Solution:
Given: μ = 38.72, σ = 3.17 and x = 35
Area = P(X < 35)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with X-axis ticks at 35.55, 38.72, 41.89; area to the left of 35 shaded; z = −1.17]
Use “TRUE” for cumulative since we want the area under the normal curve.
Example 3:
A pediatrician obtains the heights of her three-year-old female
patients. The heights are approximately normally distributed,
with mean 38.72 inches and standard deviation 3.17 inches.
Determine the probability that a randomly selected three-year-
old girl is between 35 and 40 inches tall, inclusive.
Solution:
Given: μ = 38.72, σ = 3.17, and 35 ≤ X ≤ 40
Area = P(35 ≤ X ≤ 40)
Step 1: Draw a normal curve and shade the desired area.
[Figure: normal curve with X-axis ticks at 35.55, 38.72, 41.89; area between 35 and 40 shaded]
[Figure: standard normal curve with ticks at −2, −1, 0, 1, 2; area between z = −1.17 and z = 0.40 shaded]
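An area between two values is the difference of two cumulative areas. The sketch below illustrates this for the height example, again using `statistics.NormalDist` as a stand-in for the table lookup; the variable names are hypothetical.

```python
from statistics import NormalDist

heights = NormalDist(mu=38.72, sigma=3.17)

# z-scores of the two endpoints.
z_low = (35 - heights.mean) / heights.stdev
z_high = (40 - heights.mean) / heights.stdev

# P(35 <= X <= 40) = P(X <= 40) - P(X <= 35).
p_between = heights.cdf(40) - heights.cdf(35)

print(f"z = {z_low:.2f} to {z_high:.2f}, P = {p_between:.4f}")
```

For a continuous variable, including or excluding the endpoints does not change the probability.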
ACTIVITIES/ASSESSMENTS:
2. What features of the ‘Good Presentation’ make it better than the ‘Bad Presentation’?
A.
B.
ACTIVITIES/ASSESSMENTS:
3. Review the table and consider questions such as the
following.
Origin / Rating   Poor   Needs Improvement   Satisfactory   V Good   Excellent   Total
External          0%     2%                  12%            19%      9%          41%
Internal          4%     8%                  15%            23%      9%          59%
Grand Total       4%     10%                 27%            41%      17%         100%
1. What percentage of the employees originated from within the
organization?
2. What percentage of the employees are both internal and rated
‘Very Good’?
3. What percentage of the employees received ‘Needs Improvement’
or ‘Poor’?
4. What category contains the greatest number of employees?
5. Do you see any notable differences in the percentage by category?
ACTIVITIES/ASSESSMENTS:
4. Consider the following frequency distribution of
salaries.
Salary Frequency Percentage
41,000 - 50,000 1 1%
51,000 - 60,000 20 13%
61,000 - 70,000 53 35%
71,000 - 80,000 43 29%
81,000 - 90,000 26 17%
91,000 - 100,000 6 4%
101,000 - 110,000 1 1%
Total 150 100%
1. What percentage of the employees earn less than or
equal to 80,000?
2. What is the range of the salary values?
3. Which salary categories have a percentage less than 5?
4. Which salary category includes the most employees?
ACTIVITIES/ASSESSMENTS:
5. The length of life of an instrument produced by a machine has a normal
distribution with a mean of 12 months and standard deviation of 2 months.
Find the probability that an instrument produced by this machine will last
A. less than 7 months.
B. between 7 and 12 months.
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
6. The lengths of human pregnancies are approximately normally distributed,
with mean μ = 266 days and standard deviation σ = 16 days.
A. What proportion of pregnancies lasts more than 270 days?
B. What proportion of pregnancies lasts less than 250 days?
C. What proportion of pregnancies lasts between 240 and 280 days?
D. What is the probability that a randomly selected pregnancy
lasts more than 280 days?
Be sure to draw a normal curve with the area corresponding to the
probability shaded.
ACTIVITIES/ASSESSMENTS:
7. Construct a frequency distribution table based on the
scores of 75 randomly selected students.
37 46 37 26 30 41 28 49 29 34 46 50 38 35 42
35 46 45 27 41 26 45 39 43 46 36 32 46 36 48
49 47 30 43 31 34 38 41 39 45 28 43 37 39 26
38 30 29 38 26 31 42 44 48 43 37 46 38 27 50
42 33 42 42 43 39 39 31 46 46 48 48 50 45 31
Scores Frequency Percentage (%)
26 to 30
31 to 35
36 to 40
41 to 45
46 to 50
Total
ACTIVITIES/ASSESSMENTS:
A. Based on the frequency distribution, compute measures of
central tendency, measures of variation, Q1, D9, P10 , Skewness
and kurtosis.
B. Based on the raw data, compute measures of central
tendency, measures of variation, Skewness and kurtosis using
Excel.
C. Compute the skewness and kurtosis of the grouped and ungrouped
data. Make sure to describe the shape of the distribution.
D. Do you think that the computed values for the grouped and
ungrouped data are the same?
8. Begin with the following set of data, call it Data Set I.
5, −2, 6, 14, −3, 0, 1, 4, 3, 2, 5
A. Compute the sample standard deviation and sample mean of
Data Set I.
ACTIVITIES/ASSESSMENTS:
B. Form a new data set, Data Set II, by adding 3 to each
number in Data Set I. Calculate the sample standard deviation
and sample mean of Data Set II.
C. Form a new data set, Data Set III, by subtracting 6 from
each number in Data Set I. Calculate the sample standard
deviation and sample mean of Data Set III.
D. Comparing the answers to parts A, B, and C, can you
guess the pattern? State the general principle that you expect
to be true.
References
https://prezi.com/rirrca9ckuiz/textual-presentation-of-data/
https://www.toppr.com/guides/economics/presentation-of-data/textual-and-tabular-presentation-of-data/
Sullivan, Michael, III. Statistics: Informed Decisions Using Data. Fifth Edition.
What is HYPOTHESIS?
•A statement or claim regarding a characteristic of
one or more populations.
•A preconceived idea, assumed to be true but has to
be tested for its truth or falsity.
Reminders:
If you are conducting a research study and you want
to use a hypothesis test to support your claim, the
claim must be stated in such a way that it becomes
the alternative hypothesis, so it cannot contain the
condition of equality.
✦ Right tailed
Example:
H0: The defendant is innocent.
Ha: The defendant is not innocent.
Answer:
A type I error is like putting an innocent person in
jail.
A type II error is like letting a guilty person go free.
Reminders:
It is important to note that we want to set
α before we start our study because the
Type I error is the more ‘grievous’ error to
make.
The smaller α is, the smaller the region
of rejection.
Decision Rule:
✦ Using Confidence Interval
Traditional Approach
The rejection region, or critical region, is the set of all values of
the test statistic that lead to the rejection of H0.
The acceptance region is the set of all values of
the test statistic that lead the researcher to retain H0.
[Figure: two-tailed test, Ha : μ1 ≠ μ2, with a rejection region in each tail]
Reminders:
Graphical methods are typically not
very useful when the sample size is
small.
STEP 1:
Rearrange the data in ascending order.
Use the "=DEVSQ( )" function in Excel.
STEP 3: Calculate b as follows:
$b = \sum_{i=1}^{m} a_i (x_{n+1-i} - x_i)$
n is the number of observations.
If n is even:
$m = \frac{n}{2}$
If n is odd:
$m = \frac{n - 1}{2}$
Since n is even in this example, m = 8. That’s why we used a1 to a8.
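The calculation of b can be sketched as below. Several things here are assumptions rather than content from the slides: the function name `shapiro_wilk_w`, the standard definition W = b² / SS (with SS the sum of squared deviations, which is what Excel's DEVSQ returns), the toy data, and the n = 4 coefficients, which are quoted from memory of published Shapiro-Wilk tables and should be checked against the table you use.

```python
def shapiro_wilk_w(data, coeffs):
    """Return (b, W) given the tabulated coefficients a_1..a_m for this n."""
    xs = sorted(data)               # STEP 1: ascending order
    n = len(xs)
    m = n // 2                      # n/2 if n is even, (n - 1)/2 if n is odd
    if len(coeffs) < m:
        raise ValueError("need at least m coefficients")
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # same quantity as DEVSQ
    # b = sum of a_i * (x_(n+1-i) - x_i), written with 0-based indices.
    b = sum(coeffs[i] * (xs[n - 1 - i] - xs[i]) for i in range(m))
    return b, b * b / ss            # assumes the data are not all equal

# Hypothetical toy data; coefficients assumed for n = 4.
toy_data = [3, 1, 4, 2]
a_n4 = [0.6872, 0.1677]
b, w = shapiro_wilk_w(toy_data, a_n4)
print(f"b = {b:.4f}, W = {w:.4f}")
```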
STEP 5:
Find the value in the Shapiro-Wilk table (for a
given value of n) that is closest to W, interpolating if
necessary. This is the p-value for the test.
We choose this interval in the Shapiro-Wilk table
because our n = 16 and our test statistic
(W = 0.955) is within this interval.
Result
Inferential Statistics
1. Parametric Tests
✦ Assume underlying statistical distributions in the data.
Therefore, several conditions of validity must be met
so that the result of a parametric test is reliable.
✦ Apply to data in ratio scale, and some apply to data in
interval scale.
2. Non Parametric Test
✦ Refer to a statistical method in which the data is not
Example:
Determine whether the sample is independent or dependent.
1. An urban economist believes that commute times to
work in the South are less than commute times to work
in the Midwest. He randomly selects 40 employed
individuals in the south and 45 employed individuals in
the Midwest and determines their commute times.
Answer: Independent
2. In an experiment conducted in biology class, Prof.
Rhea measured the time required for 12 students to
catch a falling meter stick using their dominant hand
and nondominant hand. The goal of the study was to
determine whether the reaction time in an individual’s
dominant hand is different from the reaction time in
the nondominant hand.
Answer: Dependent
Example:
Determine whether the sample is independent or
dependent.
3. A researcher wants to know if the mean
length of stay in for-profit hospitals is different
from the mean length of stay in not-for-profit
hospitals. He randomly selected 20 individuals in
the for-profit hospital and matched them with 20
individuals in the not-for-profit by diagnosis.
Answer:
Dependent
Assumptions
1. Your dependent variable should be measured at
the interval or ratio level (i.e., it is
continuous).
2. Your independent variable should consist of two
categorical, "related groups" or "matched pairs".
3. There should be no significant outliers in the
differences between the two related groups.
4. The distribution of the differences in the
dependent variable between the two related
groups should be approximately normally
distributed.
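When these assumptions hold, the paired test works on the differences within each pair. The sketch below computes the usual paired t statistic, t = d̄ / (s_d / √n); the scores are hypothetical values invented for illustration and the function name `paired_t_statistic` is not from the original material.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (t, df) for the differences second - first of matched pairs."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # stdev is the sample standard deviation (n - 1 in the denominator).
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical scores for five subjects measured twice.
test_1 = [10, 12, 9, 14, 11]
test_2 = [12, 15, 10, 15, 14]
t_stat, df = paired_t_statistic(test_1, test_2)
print(f"t = {t_stat:.4f}, df = {df}")
```

The statistic is then compared with the critical t value for the chosen α and df, or converted to a p-value.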
Example:
A teacher wants to know whether a new learning program
helps increase the number of correctly remembered
words. Ten subjects learn a list of 50 words. Learning
performance is measured using a recall test.
After the first test, all subjects are instructed how to use the
learning program and then learn a second list of 50 words.
Learning performance is again measured with the recall test. In
the following table, the numbers of correctly remembered words
are listed for both tests.
Reject Ho
6. Draw Conclusion
There is sufficient evidence to support the claim that the new
learning program helps increase the number of
correctly remembered words.
Exercises:
Apply the procedure in testing the hypothesis.
Professor Rhea measured the time (in seconds) required to
catch a falling meter stick for 10 randomly selected
students' dominant hand and non-dominant hand. Professor
Rhea claims that the reaction time in an individual's
dominant hand is less than the reaction time in
their non-dominant hand.
Test the claim at the level
of significance. The data
obtained are presented:
Result
Assumptions
1. Your dependent variable should be measured on a
continuous scale (i.e., it is measured at the interval or
ratio level).
2. Your independent variable should consist of two
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each group of the independent
variable.
6. There needs to be homogeneity of variances.
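Under the equal-variance assumption, the two sample variances are pooled before the t statistic is formed. The sketch below shows that computation; the group scores are hypothetical values invented for illustration, and `pooled_t_statistic` is not a name from the original material.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group_1, group_2):
    """Return (t, df) for two independent samples assuming equal variances."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group_1)
                  + (n2 - 1) * variance(group_2)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / std_error, df

# Hypothetical exam scores for two small groups.
visual = [5, 7, 9]
textual = [2, 4, 6]
t_value, t_df = pooled_t_statistic(visual, textual)
print(f"t = {t_value:.4f}, df = {t_df}")
```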
Example:
Researchers wanted to know whether there was a difference in
comprehension among students learning a computer program
based on the style of the text. They randomly divided 18
students into two groups of 9 each. The researchers verified
that the 18 students were similar in terms of educational level,
age, and so on. Group 1 individuals learned the software using
a visual manual (multimodal instruction), while Group 2
individuals learned the software using a textual manual (unimodal
instruction). The following data represent the scores the students
received on an exam given to them after
they studied from the manuals.
Determine if the
variances are equal
or not equal.
Failed to
Reject Ho
Since we failed to reject Ho, we will proceed to t-test: Two
Sample Assuming Equal Variances.
Result
Failed to Reject Ho
6. Draw Conclusion
There is not enough evidence to support the claim that
there is a difference in comprehension among
students learning a computer program based on
the style of the text.
Proper Presentation of Results
Exercises:
Apply the procedure in testing the hypothesis.
Twenty participants were given a list of 20 words to
process. The 20 participants were randomly assigned to
one of two treatment conditions. Half were instructed to
count the number of vowels in each word (shallow
processing). Half were instructed to judge whether the
object described by each word would be useful if one
were stranded on a desert island (deep processing).
After a brief distractor task, all subjects were given a
surprise free recall task. Did the instruction affect the
level of recall? The number of words correctly recalled
was recorded for each subject. Here are the data:
Result
Assumptions
1. Your dependent variable should be measured at the
interval or ratio level (i.e., it is continuous).
2. Your independent variable should consist of two or more
categorical, independent groups.
3. You should have independence of observations, which
means that there is no relationship between the
observations in each group or between the groups
themselves.
4. There should be no significant outliers.
5. Your dependent variable should be approximately
normally distributed for each category of the independent
variable.
6. There needs to be homogeneity of variances.
Example:
Researchers wanted to compare math test scores of
students at the end of secondary school from various cities.
Eight randomly selected students from each of Makati,
Manila, and Quezon City were administered the same exam;
the results are presented in the following table. Can the
researchers conclude that the distribution of exam scores
is different for each city at the level of significance?
Determine if the
variances are equal
or not equal.
Failed to reject Ho: equal variances assumed.
Result
Reject Ho
6. Draw Conclusion
There is enough evidence to support that the
distribution of exam scores of students in
mathematics is different for each city.
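Comparing three or more independent groups under the assumptions listed earlier is typically done with a one-way analysis of variance (ANOVA). Assuming that is the test behind the result above, the sketch below shows how its F statistic is built from the between-group and within-group sums of squares. The function name and the scores are hypothetical and are not the city data from the example; the decision to reject Ho still requires a critical value or p-value from a table or software.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (illustration only).
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(scores)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the variation within groups would suggest.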
Exercises:
Apply the procedure in testing the hypothesis.
A teacher is concerned about the level of
knowledge possessed by PUP students regarding
Philippine history. Students completed a high
school senior level standardized history exam.
Academic major of the students was also recorded.
Data in terms of percent correct is recorded below
for 24 students. Is there a significant difference
between the levels of knowledge possessed by PUP
students regarding Philippine history when
grouped according to their academic major?
Result
Features of r
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative
linear relationship.
• The closer to 1, the stronger the positive
linear relationship.
• The closer to 0, the weaker the linear
relationship.
[Figure: scatter plots of Y against X illustrating r = -1, r = -0.6, r = 0, r = 0.6, and r = 1]
Reminders:
• Correlation does not imply causation.
• Watch out for hidden (lurking) variables.
Lurking Variable
• A variable that is not included as an explanatory
or response variable in the analysis but can affect
the interpretation of relationships between
variables.
• Can falsely identify a strong relationship between
variables or it can hide the true relationship.
Assumptions
1. Your two variables should be measured at the
interval or ratio level (i.e., they are
continuous).
2. There is a linear relationship between your
two variables.
3. There should be no significant outliers.
4. Your variables should be approximately
normally distributed.
Test Statistic:
$$t = r\sqrt{\frac{df}{1 - r^2}}$$
where:
df = degrees of freedom
r = Pearson correlation coefficient
Note:
df = n − 2
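The test statistic above can be worked through numerically. The following Python sketch computes Pearson's r and then t = r·sqrt(df / (1 − r²)) with df = n − 2. The function names and the paired values are hypothetical, used only to show the arithmetic.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """Test statistic t = r * sqrt(df / (1 - r^2)) with df = n - 2."""
    df = n - 2
    return r * sqrt(df / (1 - r ** 2)), df


# Hypothetical paired measurements (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
t_stat, df = correlation_t(r, len(x))
print(f"r = {r:.4f}, t = {t_stat:.4f}, df = {df}")
```

Note that the formula is undefined when r is exactly 1 or −1, because the denominator 1 − r² becomes zero.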
Example:
A dietetics student wanted to look at the
relationship between calcium intake and
knowledge about calcium in sports
science students. The table shows the data
she collected. Is there a relationship
between calcium intake and knowledge
about calcium in sports science
students?
$$t = r\sqrt{\frac{df}{1 - r^2}}$$
df = n − 2
Result
6. Draw Conclusion
There is sufficient evidence to conclude that there
is a significant relationship between calcium
intake and knowledge about calcium in sports
science students.
Proper Presentation of Results
Exercises:
Apply the procedure in testing the hypothesis.
Result
Chi-Square Distribution
Definition:
The chi-square distribution is
written as the χ² distribution.
The symbol χ is the Greek letter
“chi”, pronounced “ki”.
Assumptions
1. There are 2 variables, and both are measured as
categories, usually at the nominal level.
2. The two variables should consist of two or more
categorical, independent groups.
3. The data in the cells should be frequencies, or counts
of cases rather than percentages or some other
transformation of the data.
4. For a 2 by 2 table, all expected frequencies > 5.
5. For a larger table, all expected frequencies > 1 and
no more than 20% of all cells may have expected
frequencies < 5.
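Assumptions 4 and 5 can only be checked after the expected frequencies are computed. The sketch below, with hypothetical function names and a made-up 2 by 2 table of counts, derives each expected frequency as (row total × column total) / grand total, computes the chi-square statistic, and applies the two expected-frequency rules as stated in the list above.

```python
def expected_frequencies(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    expected = expected_frequencies(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


def meets_expected_count_rule(expected):
    """Apply assumption 4 (2 by 2 tables) or assumption 5 (larger tables)."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e > 5 for e in cells)
    small = sum(1 for e in cells if e < 5)
    return all(e > 1 for e in cells) and small <= 0.2 * len(cells)


# Hypothetical observed counts (illustration only).
observed = [[10, 20], [30, 40]]
stat, df, expected = chi_square(observed)
ok = meets_expected_count_rule(expected)
print(f"chi-square = {stat:.4f}, df = {df}, assumptions met: {ok}")
```

The input must be raw counts, as assumption 3 requires; percentages would give a meaningless statistic.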
Example:
1. A doctor who knows that hypertension depends
on smoking habits can tell his smoking patients what
they should do.
2. If the traffic condition (light, moderate, heavy,
standstill) is found to be dependent on vehicle plate
numbers (odd, even), a traffic officer may decide to
revise traffic law enforcement.
Reminders:
The word contingency refers to
dependence, but this is only a
statistical dependence and cannot be
used to establish a direct cause-and-
effect link between the two variables in
question.
Example:
Educators are always looking for novel ways in
which to teach statistics to undergraduates as part
of a non-statistics degree course (e.g., psychology).
With current technology, it is possible to present
how-to guides for statistical programs online
instead of in a book. However, different people
learn in different ways. An educator would like to
know whether gender (male/female) is associated
with the preferred type of learning medium (online
vs. books). Use “Data_Example and Exercises file”.
$$\text{Expected frequency} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$
6. Draw Conclusion
There is sufficient evidence to conclude that
gender is associated with the preferred type of
learning medium.
Proper Presentation of Results
Exercises:
Apply the procedure in testing the hypothesis.
A survey was conducted at a community college of 102
randomly selected students who dropped a course in the
current semester to learn why students drop courses.
Personal drop reasons include financial, transportation,
family issues, health issues, and lack of child care. Course
drop reasons include reducing one's load, being unprepared
for the course, the course was not what was expected,
dissatisfaction with teaching, and not getting the desired
grade. Work drop reasons include an increase in hours, a
change in shift, and obtaining full-time employment. Test
whether gender is independent of drop reason at the 1%
level of significance. Use “Data_Example and Exercises
file”.
Result
ACTIVITIES/ASSESSMENTS:
Determine whether the sampling is dependent or independent.
________1. A researcher wishes to compare academic
aptitudes of married mathematicians and their spouses. She
obtains a random sample of 287 such couples who take an
academic aptitude test and determines each spouse's academic
aptitude.
________2. A political scientist wants to know how a random
sample of 18- to 25-year-olds feel about Democrats and
Republicans in Congress. She obtains a random sample of
1030 registered voters 18 to 25 years of age and asks, “Do you
have a favorable or unfavorable opinion of the Democratic/
Republican party?” Each individual was asked to disclose his
or her opinion about each party.
________3. An educator wants to determine whether a new
curriculum significantly improves standardized test scores for third
grade students. She randomly divides 80 third-graders into two
groups. Group 1 is taught using the new curriculum, while group 2 is
taught using the traditional curriculum. At the end of the school year,
both groups are given the standardized test and the mean scores are
compared.
________4. A stock analyst wants to know if there is a difference
between the mean rate of return from energy stocks and that from
financial stocks. He randomly selects 13 energy stocks and computes
the rate of return for the past year. He randomly selects 13 financial
stocks and computes the rate of return for the past year.
________5. An urban economist believes that commute times to work
in the South are less than commute times to work in the Midwest. He
randomly selects 40 employed individuals in the South and 45
employed individuals in the Midwest and determines their commute
times.
Solve the following problems. Make sure to follow the six-step
procedure.
1. A study is designed to test whether there is a difference in mean daily
calcium intake in adults with normal bone density, adults with
osteopenia (a low bone density which may lead to osteoporosis) and
adults with osteoporosis. Adults 60 years of age with normal bone
density, osteopenia and osteoporosis are selected at random from
hospital records and invited to participate in the study. Each
participant's daily calcium intake is measured based on reported food
intake and supplements. The data are shown below. Is there a
statistically significant difference in mean calcium intake in patients
with normal bone density as compared to patients with osteopenia and
osteoporosis?

Normal Bone Density   Osteopenia   Osteoporosis
1200                  1000         890
1000                  1100         650
980                   700          1100
900                   800          900
750                   500          400
800                   700          350
2. Some studies have shown that in the United States, men spend more
than women buying gifts and cards on Valentine’s Day. Suppose a
researcher wants to test this hypothesis by randomly sampling nine men
and 10 women with comparable demographic characteristics from various
large cities across the United States to be in a study. Each study
participant is asked to keep a log beginning one month before
Valentine’s Day and record all purchases made for Valentine’s Day
during that one-month period. The resulting data are shown below. Use
these data and a 1% level of significance to determine whether, on
average, men actually do spend significantly more than women on
Valentine’s Day. Assume that such spending is normally distributed in
the population and that the population variances are equal.

Men (in $)   Women (in $)
107.48       125.98
143.61       45.53
90.19        56.35
125.53       80.62
70.7         46.37
83           44.34
129.63       75.21
154.22       68.48
93.8         85.82
             126.11
3. A researcher is interested in whether a training course increases
the teaching performance of the teachers who attended the
training course. Test at the 10% level of significance. The data are
shown below:

Case   Before   After      Case   Before   After
1      85       95         11     89       97
2      84       98         12     87       98
3      86       97         13     82       95
4      87       92         14     81       95
5      89       96         15     86       92
6      82       93         16     89       91
7      80       94         17     89       94
8      84       95         18     84       95
9      86       90         19     85       96
10     82       82         20     88       97
4. A pediatrician wants to determine the relation that may exist
between a child’s height and head circumference. She randomly selects
eleven 3-year-old children from her practice, measures their heights
and head circumference, and obtains the data shown in the table below.

Height (inches)   Head Circumference (inches)
27.75             17.5
24.5              17.1
25.5              17.1
26                17.3
25                16.9
27.75             17.6
26.5              17.3
27                17.5
26.75             17.3
26.75             17.5
27.5              17.5
5. The following data represent the smoking status from a
random sample of 1054 U.S. residents 18 years or older by
level of education.

No. of Years      Smoking Status
of Education      Current   Former   Never
Less than 12      178       88       208
12                137       69       143
13 - 15           44        25       44
16 or more        34        33       51