INTRODUCTION TO STATISTICS
1. Origin and Development of Statistics
2. Definition
3. Uses
4. Branches
5. Constants and Variables
6. Data and Information
7. Population and Sample
8. Census and Sampling Techniques
History of Statistics - Timeline
Time | Contributor | Contribution
Ancient Greece | Philosophers | Ideas only, no quantitative analyses
17th Century | Graunt, Petty | Studied affairs of state and vital statistics of populations
17th Century | Pascal, Bernoulli | Studied probability through games of chance and gambling
18th Century | Laplace, Gauss | Normal curve; regression through the study of astronomy
19th Century | Quetelet | Astronomer who first applied statistical analyses to human biology
19th Century | Galton | Studied genetic variation in humans (used regression and correlation)
20th Century (early) | Pearson | Studied natural selection using correlation; formed the first academic department of statistics; founded the Biometrika journal; helped develop the chi-square analysis
20th Century (early) | Gosset ("Student") | Studied the process of brewing; alerted the statistics community to problems with small sample sizes; developed Student's t-test
20th Century (early) | Fisher | Evolutionary biologist who developed ANOVA and stressed the importance of experimental design
20th Century (early) | Wilcoxon | Biochemist who studied pesticides; developed a non-parametric equivalent of the two-sample test
20th Century (early) | Kruskal, Wallis | Statistician (Kruskal) and economist (Wallis) who developed a non-parametric equivalent of the ANOVA
20th Century (later) | Spearman | Psychologist who developed a non-parametric equivalent of the correlation coefficient
20th Century (later) | Kendall | Statistician who developed another non-parametric equivalent of the correlation coefficient
20th Century (later) | Tukey | Statistician who developed a multiple comparisons procedure
20th Century (later) | Dunnett | Biochemist who studied pesticides; developed a multiple comparisons procedure for control groups
20th Century (later) | Keuls | Agronomist who developed a multiple comparisons procedure
20th Century (later) | Computer Technology | Provided many advantages over calculation by hand or by calculator; stimulated the growth of investigation into new techniques
Statistics
The art or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
Statistics
It is defined as a branch of science dealing with the methods of collecting, organizing, presenting, analyzing, and interpreting quantitative data.
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data for the purpose of assisting in making more effective decisions.
Art, on the other hand, refers to the skill of handling facts so as to achieve a given objective. It is concerned with ways and means of presenting and handling data, making inferences logically, and drawing relevant conclusions.
USES
MISUSES
• Bad Samples
• Small Samples
• Misleading Graphs
• Pictographs
• Distorted Percentages
• Loaded Questions
• Order of Questions
• Refusals
• Correlation & Causality
• Self-Interest Study
• Precise Numbers
• Partial Pictures
• Deliberate Distortions
Applications of Statistics
• Education
• Medicine
• Business
• Psychology
• Agriculture
Branches of Statistics
DESCRIPTIVE STATISTICS
• Those methods involving summarization, presentation, computation, and interpretation of data in order to describe the various features of the set of data properly.
Descriptive Statistics: collection, presentation, and description of sample data.
Concerned with:
Percentage distribution of dependents
Average or typical characteristics of the group
Homogeneity and heterogeneity of characteristics
Degree of relationship among group characteristics
Tools commonly used:
Measures of central tendency, location, and variability
Examples
1. At least 5% of all fires reported last year in Metro Manila were deliberately set by arsonists.
2. Of all babies who received the BCG vaccine at the Malolos Provincial Hospital in the first quarter of 2010, 75% did not develop fever.
3. Of the 50 students in statistics who took the final examination, only 5% did not make it.
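The descriptive tools listed above can be tried on a small data set. The sketch below uses Python's standard `statistics` module on a hypothetical list of exam scores and an assumed passing mark of 75; both are illustrative and do not come from the examples above.

```python
import statistics

# Hypothetical final-exam scores for a class of 10 students (illustrative data only).
scores = [78, 85, 92, 64, 85, 71, 88, 95, 59, 85]
passing_mark = 75  # assumed passing score for this sketch

mean = statistics.mean(scores)        # measure of central tendency: arithmetic average
median = statistics.median(scores)    # measure of location: middle of the sorted scores
mode = statistics.mode(scores)        # most frequent score
spread = statistics.pstdev(scores)    # variability: population standard deviation of the class

# Percentage of the group that did not make it (a percentage distribution).
failed = sum(1 for s in scores if s < passing_mark)
pct_failed = 100 * failed / len(scores)

print(f"mean={mean}, median={median}, mode={mode}, sd={spread:.2f}")
print(f"{pct_failed:.0f}% did not make it")
```

All of these numbers only describe the 10 students in the list. Nothing is inferred about any larger group, which is what makes this descriptive rather than inferential.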
INFERENTIAL STATISTICS
• Those methods that aim to draw inferences or implications regarding the characteristics of the population by studying a representative sample of it.
Inferential Statistics: making decisions and drawing conclusions from the data collected.
Concerned with:
Testing for significant differences between, and independence of, two or more variables.
An assertion or hypothesis about the population is made and is then rejected or retained depending on the result of a test based on the available samples.
Tools commonly used:
Normal distribution
Estimation
Sampling distribution
Hypothesis testing
Probability
Examples
1. As a result of the recent survey, President Aquino’s popularity has gone down dramatically.
2. As a result of the recent implementation of the E-VAT law, we can expect prices to go higher during the holiday season.
3. Based on the results of the annual population census from 2005 to 2010, the Philippine population is expected to increase by 35% by the year 2015.
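Estimation is one of the inferential tools listed above. As a sketch, the code below builds a normal-approximation (Wald) 95% confidence interval for a population proportion from a hypothetical survey result of 540 approvals out of 1,200 respondents. The function name `proportion_ci` and the survey numbers are illustrative assumptions. `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n                             # sample statistic (sample proportion)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)    # about 1.96 for 95% confidence
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # z times the standard error
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 540 of 1,200 respondents approve of an official.
low, high = proportion_ci(540, 1200)
print(f"sample approval 45.0%, 95% CI {low:.1%} to {high:.1%}")
```

The sample gives 45%, but the conclusion is stated about the whole population as a range, roughly 42% to 48%. That move from sample to population is the step that makes this inferential.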
POPULATION
The totality of objects, individuals, or things under consideration.
SAMPLE
The part of the population that is selected for analysis.
Parameter
A numerical value that describes a characteristic of a population.
Statistic
Any numerical value computed from a sample.
Examples
1. Suppose you draw conclusions about the ages of 10,000 students in your school by examining only 200 students selected from the population. After collecting data, you observe that most of the 200 students are 17 years old. The statistic here is 17 years old.
2. After a nationwide survey, the National Statistics Office (NSO) reported that the average size of a Filipino family is six persons. Here, the parameter is 6 persons.
Constants and Variables
Constants - fundamental quantities that do not change in value.
Variables – quantities whose values can vary from one entity to another.
"Age" is a variable. It can take on many
different values, such as 18, 49, 72, and so on.
"Gender" is a variable. It can take on two
different values, either male or female.
"Place" (in a race) is another variable. It can
take on values such as 1st place, 2nd place,
3rd place, and so on.
Kinds of Variables
Quantitative Variable
A variable measured numerically.
1. Discrete Variable - a quantitative variable whose values can be counted (often a finite set of values).
For example, imagine you rolled a six-sided die four times and counted how many times you rolled an even number. What are your possible outcomes? {0, 1, 2, 3, 4}
2. Continuous Variable - a quantitative variable that can take any value within an interval, so it has infinitely many possible values.
For example, temperature can take on an infinite number of values, such as 80 degrees, 80.01 degrees, or 80.0050592359 degrees.
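The die example for a discrete variable can be checked by brute force. This sketch lists every possible sequence of four rolls (6^4 = 1,296 sequences) and counts the even faces in each. The output shows that the variable takes only the values {0, 1, 2, 3, 4}, and how often each value occurs.

```python
from itertools import product
from collections import Counter

# Every possible sequence of four rolls of a six-sided die (faces 1 to 6).
sequences = product(range(1, 7), repeat=4)

# For each sequence, count how many rolls were even (2, 4 or 6).
even_counts = Counter(sum(1 for face in seq if face % 2 == 0) for seq in sequences)

print(sorted(even_counts))                 # possible values of the discrete variable
print(dict(sorted(even_counts.items())))   # number of sequences giving each value
```

Expected output: `[0, 1, 2, 3, 4]` and `{0: 81, 1: 324, 2: 486, 3: 324, 4: 81}`. The values can be listed one by one with nothing in between, which is what makes the variable discrete.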
Qualitative/Categorical Variable
A variable that describes a characteristic or category rather than a numerical quantity.
1. Dichotomous - a qualitative variable that takes one of two values.
“Male” or “Female”
2. Trichotomous - a qualitative variable that takes one of three values.
“For”, “Against” or “Undecided”
3. Multinomous (polytomous) - a qualitative variable that takes one of many values.
“Always”, “Often”, “Seldom” or “Never”
Independent and Dependent Variables
Independent Variable - any variable that
is being manipulated.
Dependent Variable - any variable that is
being measured.
Imagine that researchers want to test the
effectiveness of a new weight loss medication.
They split participants into three groups: one group
gets a 0mg dosage (control), one group gets a 50mg
dosage, and the last group gets a 100mg dosage.
After six months, the participants’ weights are
measured.
What are the independent and dependent variables in
this experiment?
The independent variable would be dosage, because
dosage is being manipulated.
The dependent variable would be weight, because
weight is being measured.
Data and Information
Data – refers to facts about things, such as people’s status in life, the defectiveness of objects, or the effect of an event on society.
Information – a set of data that have
been processed and presented in a form
suitable for human interpretation, for the
purpose of revealing trends or patterns
about the population.
Measurement Scales
1. Nominal data (also known as
qualitative/categorical data) is data that is
split into categories.
For example: what kind of data would you
collect for the variable "Color"? You would end
up with information such as "red", "green", "blue",
and so on. This qualitative information is called
nominal data.
2. Ordinal data is data where order matters,
but distance between values does not.
For example: imagine three people in a race.
One finishes in 1st place, one in 2nd place,
and the last in 3rd place. This data can be
placed in order, but we can’t necessarily
measure the distance between values (maybe
1st place finished four seconds ahead of 2nd
place, and 2nd place finished nineteen
seconds ahead of 3rd place).
3. Interval data is data where order matters,
and distances between values are equal
and meaningful, and a natural zero is not
present.
For example: temperature (in Fahrenheit or
Celsius) is interval data. The difference between
10 degrees and 20 degrees is 10 degrees. The
difference between 80 degrees and 90 degrees is
10 degrees. The scale at any given point is
constant, while a measurement of 0 degrees does
not reflect a true "lack of temperature".
4. Ratio data is data where order matters,
distances between values are equal and
meaningful, and a natural, meaningful
zero is present.
For example: mass is ratio data. The difference
between 140 grams and 155 grams is 15
grams. The difference between 280 grams and
295 grams is 15 grams. The scale at any given
point is constant, and a measurement of 0
reflects a complete lack of mass.
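The difference between interval and ratio data shows up when you take ratios. This sketch compares 10 °C and 20 °C with the same temperatures in Kelvin, then repeats the comparison with mass. The Kelvin conversion (add 273.15) is standard physics but is not part of the text above. It is used here only because Kelvin has a true zero.

```python
# Interval scale: Celsius has an arbitrary zero point.
c_low, c_high = 10.0, 20.0
celsius_difference = c_high - c_low        # 10 degrees: differences are meaningful
celsius_ratio = c_high / c_low             # 2.0, yet 20 °C is not "twice as hot" as 10 °C

# Kelvin starts at absolute zero, so it behaves as a ratio scale.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_difference = k_high - k_low         # about 10: the same interval as in Celsius
kelvin_ratio = k_high / k_low              # about 1.035: the physically meaningful ratio

# Ratio scale: mass has a natural zero, so 280 g really is twice 140 g.
mass_ratio = 280 / 140                     # 2.0

print(f"Celsius ratio: {celsius_ratio:.3f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Mass ratio:    {mass_ratio:.3f}")
```

Differences agree on both temperature scales, but the Celsius ratio of 2.0 has no physical meaning. On interval data, compare differences. On ratio data, differences and ratios are both meaningful.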
Sources of Data
1. Primary source – first-hand information, usually obtained through personal interviews or actual observation.
2. Secondary source – information taken from published works, reports, and readings.
Methods of Collecting Data
• Direct or Interview Method – a person-to-person interaction between interviewer and interviewee, either tape-recorded or written, to obtain exact information.
Advantage: Precise and consistent answers can be obtained by modifying or rephrasing the questions, especially for illiterate respondents or children under study.
Disadvantage: Consumes time, money, and effort, and is applicable only to small populations (except when conducting a census).
• Indirect or Questionnaire Method – written responses are obtained by distributing questionnaires.
Advantage: Less time, money, and effort are consumed.
Disadvantage: Many responses may be inconsistent due to poor construction of the questionnaire, and respondents may interpret the questions differently. Inconsistent responses can no longer be corrected, which reduces the number of valid responses.
• Fixed-alternative questions – limit the subject’s responses to a set of given choices.
Example:
How often do you go to the library?
Once a day
Every day
Once a week
When there are assignments
Every other day
Never at all
• Open-ended questions – permit free responses from the subjects.
Example:
If you won the lotto, what would you do with your prize?
What are the common problems you encounter in studying statistics?
• Registration Method – enforced by both private and public organizations for recording purposes.
Advantage: Organized data from an institution can serve as a ready reference for future study or for people’s personal claims on their records.
Disadvantage: Problems arise when an agency does not have a Management Information System, or when the registration system or process is not implemented well.
• Observation Method – a scientific method of investigation that makes use of all the senses to measure or obtain outcomes/responses from the object of study.
Advantage: Usually applied to respondents who cannot be asked or need not speak, especially when the behavior of persons, the culture of an organization, or the performance outcomes of employees/students are to be considered.
Disadvantage: Subjectivity of the information sought cannot be avoided.
• Experimentation – used when the objective is to
determine the cause and effect of a certain
phenomenon under some controlled conditions.
Advantage: There is objectivity of information
since a scientific method of inquiry is used. An
equal number of respondents with relatively
similar characteristics are being examined to
obtain the different effects of something applied to
the experimental group.
Disadvantage: It is difficult to find respondents with nearly identical characteristics. The whole method must be repeated if the desired outcome is not reached.
Population and Sample
Population: A collection, or set, of individuals or objects
or events whose properties are to be analyzed.
Two kinds of populations: finite or infinite.
Sample: A subset of the population. A selected group of
information taken from the population.
Let’s say you want to find the average GPA of a student at
your university. Your university has 20,000 students, and you
randomly select 100 students and ask them their GPAs.
Which is your population and which is your sample?
Your population is the group you’re interested in studying (the 20,000 students), and your sample is a small group or subset (100 students) you’ve taken from the population.
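The GPA example can be simulated to show a parameter and a statistic side by side. In this sketch the 20,000 GPAs are synthetic, drawn uniformly between 1.0 and 4.0 purely for illustration. `random.sample` picks 100 distinct students, which is a simple random sample without replacement.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: GPAs of all 20,000 students (synthetic, illustrative only).
population = [round(random.uniform(1.0, 4.0), 2) for _ in range(20_000)]

# Simple random sample of 100 students, drawn without replacement.
sample = random.sample(population, k=100)

parameter = statistics.mean(population)   # population mean (usually unknown in practice)
statistic = statistics.mean(sample)       # sample mean, used to estimate the parameter

print(f"N={len(population)}, n={len(sample)}")
print(f"parameter={parameter:.3f}, statistic={statistic:.3f}")
```

In a real study only the sample mean would be available. The simulation is useful because it lets you see how close the statistic lands to the parameter.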
Slovin’s Formula
Used to calculate the sample size (n) given the population size (N) and a margin of error (e). It is a formula for estimating the sample size for random sampling.
It is computed as
$$n = \frac{N}{1 + Ne^{2}}$$
where:
n = no. of samples (sample size)
N = total population
e = error margin / margin of error
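Slovin’s formula translates directly into a short function. This is a minimal sketch. The name `slovin_sample_size` is hypothetical, and rounding the result up to a whole respondent is an assumed convention, since a fractional person cannot be sampled.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole respondent."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # rounding up is a common convention, assumed here

# Example: N = 20,000 students with a 5% margin of error.
print(slovin_sample_size(20_000, 0.05))  # 20000 / 51 is about 392.16, rounded up to 393
```

With a 5% margin of error, Slovin’s formula would call for about 393 of the 20,000 students, more than the 100 used in the GPA example. A smaller e gives a larger n.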
Let’s say you want to find the average
GPA of a student at your university.
Your university has 20,000 students,
and you select 100 students and ask
them their GPAs.
What are N and n in this example?
N, the size of your population, is 20,000
n, the size of your sample, is 100
Sampling
• A process of choosing a representative subset (a sample) from a population.
Census and Sampling Techniques
Census (Complete Enumeration) – collection of data from a whole population rather than just a sample.
Example:
Doing a survey of the travel time of MAEd students:
Asking everyone at school is a census (of the school).
But asking only 50 randomly chosen people is a sample.
PROBABILITY SAMPLING
Is a sampling procedure where every element of the population is given a known, nonzero chance of being included in the sample.
Simple Random Sampling
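Every member of the population (N) has an equal chance of being selected for your sample (n).
It is often regarded as the ideal sampling method, because every member has the same chance of selection and the sample therefore tends to be representative of the population. However, it is often impractical, because it requires a complete list of the population and can be costly for large or scattered populations.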
• Fish bowl sample
Sampling without replacement – a drawn slip of paper is not returned to the container.
Sampling with replacement – a drawn slip of paper is returned to the container before the next draw.
• Table of random numbers
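A computer can stand in for the fish bowl. In this sketch a hypothetical class of 30 students is numbered 1 to 30. `random.sample` draws without replacement, so no number repeats. `random.choices` draws with replacement, so the same number may come up more than once.

```python
import random

random.seed(7)  # reproducible draws for the sketch

# Hypothetical class list: students numbered 1..30, one "slip" per student.
bowl = list(range(1, 31))

# Without replacement: a drawn slip stays out, so no student can be picked twice.
without_replacement = random.sample(bowl, k=5)

# With replacement: each slip goes back, so the same student may be picked again.
with_replacement = random.choices(bowl, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

A seeded pseudo-random generator plays the same role as a table of random numbers. It gives every slip an equal chance on each draw, and anyone with the same seed can reproduce the draws.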
Stratified Sampling
With this method, the population (N) is split into non-overlapping groups ("strata"), then simple random sampling is done on each group to form a sample (n).
Example: Splitting a population of students into men and women, then sampling from each of the two groups. This may allow us to collect the same amount of information as simple random sampling while using fewer people.
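The slide does not say how many people to take from each stratum. This sketch assumes proportional allocation, where each stratum’s share of the sample matches its share of the population, and runs a simple random sample inside each stratum. The population of 600 women and 400 men and the function name `stratified_sample` are hypothetical.

```python
import random

random.seed(1)

# Hypothetical population of 1,000 students: 600 women ("F") and 400 men ("M").
population = [("F", i) for i in range(600)] + [("M", i) for i in range(400)]

def stratified_sample(units, stratum_of, n):
    """Proportional stratified sample: a simple random sample inside each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))   # proportional allocation (assumed)
        sample.extend(random.sample(members, k))   # simple random sample within the stratum
    return sample

sample = stratified_sample(population, lambda u: u[0], n=50)
women = sum(1 for u in sample if u[0] == "F")
print(len(sample), "students sampled,", women, "women")
```

With n = 50 this takes 30 women and 20 men, mirroring the 60/40 split in the population. With other sizes, the rounded stratum counts may not add up exactly to n.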
Systematic Sampling
In this method, every kth individual from the population (N) is placed in the sample (n), where k is the sampling interval (roughly N divided by n).
For example, if you add every 7th individual
to walk out of a supermarket to your sample,
you are performing systematic sampling.
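Systematic sampling can be written in a few lines. This sketch sets the interval to k = N // n, picks a random starting point within the first k individuals, and then takes every kth individual. The line of 700 shoppers and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(units, n):
    """Take every k-th unit after a random start, where k = N // n."""
    k = len(units) // n              # sampling interval
    start = random.randrange(k)      # random starting point in the first interval
    return units[start::k][:n]

random.seed(3)
shoppers = list(range(1, 701))               # hypothetical: 700 shoppers, in exit order
sample = systematic_sample(shoppers, n=100)  # k = 7, so every 7th shopper
print(sample[:5], "...", len(sample), "shoppers")
```

The random start gives every shopper a 1-in-7 chance of selection. Without it, the first shopper would never be chosen.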
Convenience Sampling
Easily obtained individuals from the population (N) are placed in the sample (n).
Pick the easiest way of getting your sample.
This type of sampling is sometimes called voluntary response sampling, because individuals often choose to be part of the sample. This can be a problem, because people who choose to participate may differ from people who do not.
Cluster Sampling
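Is a suitable procedure if the population is spread out over a wide geographical area. The population is divided into groups (clusters), such as towns or schools; a random sample of whole clusters is selected, and the members of the chosen clusters make up the sample.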
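This sketch shows one-stage cluster sampling: whole clusters are chosen at random, and every member of a chosen cluster is included. The 12 towns with 50 households each are hypothetical data made up for the illustration.

```python
import random

random.seed(5)

# Hypothetical: 12 towns (clusters), each with 50 household IDs.
clusters = {f"town_{t}": [f"town_{t}-hh{h}" for h in range(50)] for t in range(1, 13)}

# One-stage cluster sampling: randomly pick whole clusters...
chosen_towns = random.sample(sorted(clusters), k=3)

# ...then include every household in the chosen clusters.
sample = [hh for town in chosen_towns for hh in clusters[town]]

print(chosen_towns, len(sample), "households")
```

Only the 3 chosen towns need to be visited instead of all 12. That saving is what makes the method suitable for a widely spread population.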
NON-PROBABILITY SAMPLING
Is a sampling procedure where not all of the elements of the population are given a known, nonzero chance of being chosen for the sample.
Quota Sampling
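A sampling technique popular in the field of opinion research. The researcher sets a quota (a required number of respondents) for each subgroup, and interviewers select people until every quota is filled.
Example:
A researcher wishes to interview thrilled, avid fans of the PBA during a championship game.
- "Do you mind being asked that question?"
Disadvantage: Interviewers choose whom they like and may therefore select a biased sample; it is impossible to estimate the accuracy of the results.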
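As a sketch of how quotas are filled, the code below accepts people in the order the interviewer meets them until each group’s quota is reached. The fan list, team labels, and the function name `quota_sample` are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in the order they come until each group's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:     # still room for this person's group
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):     # every quota is met, so stop interviewing
            break
    return sample

# Hypothetical fans, in the order the interviewer meets them: (name, team supported).
fans = [("Ana", "A"), ("Ben", "A"), ("Cris", "A"), ("Dan", "B"), ("Eli", "A"), ("Fe", "B")]
sample = quota_sample(fans, lambda f: f[1], {"A": 2, "B": 2})
print(sample)
```

Expected output: `[('Ana', 'A'), ('Ben', 'A'), ('Dan', 'B'), ('Fe', 'B')]`. Cris and Eli are left out only because they arrived after team A’s quota was full. Selection depends on who shows up first rather than on chance, which is why the accuracy of a quota sample cannot be estimated.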
Purposive Sampling
Purposive sampling is done when the subjects satisfy the criteria laid down by the researcher.
Example:
A researcher wants to find out how drug addicts were able to get through the treatment they experienced inside a rehabilitation center.
Incidental/ accidental Sampling
This method of drawing a sample, which takes whoever happens to be available at the time, is very popular in market research.
Example:
A company that produces a brand of cheese curls wants to monitor how well buying customers accept its product.