Submitted by:
HIMANI KALRA
MBA (GENERAL)
35
SUBMITTED TO:
DR. SIMMI
UNIVERSITY SCHOOL OF MGT. ,
K.U.K
Chapter 11
Sampling
Concept of sampling
Aims of Sampling
Merits and demerits of sampling
Types of sampling methods
Sampling errors
Sampling Distributions
Probability Distributions
The Central Limit Theorem
The method of selecting a sample out of a given
population is called sampling.
Sampling method has three main
stages:
To select a sample.
To collect information from it.
To make inferences regarding the
characteristics of population.
Reduces time & cost of researcher (e.g.
political polls)
Generalize about a larger population
(e.g., benefits of sampling city
neighborhood)
In some cases (e.g. industrial
production) analysis may be
destructive, so sampling is needed
Merits of sampling:
Saving of time & money
Intensive study
Organizational convenience
More reliable results
More scientific methods
Demerits of sampling:
Less accurate
Wrong conclusions
Less reliable
Need for specialized knowledge
(A) Probability sampling methods
(1) Simple random sampling
(2) Stratified random sampling
(3) Systematic random sampling
(4) Cluster sampling
(B) Non-probability sampling methods
(1) Convenience sampling
(2) Quota sampling
(3) Judgment sampling
(4) Snowball sampling
Every subset of a specified size n from the population has an equal
chance of being selected
Math
Alliance
Project
1. Get a list or sampling frame
a. This is the hard part. It must not
systematically exclude anyone.
2. Generate random numbers.
3. Select one person per random number.
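The three steps above can be sketched in Python. This is a minimal illustration, assuming a made-up sampling frame of 100 numbered people; the function name `simple_random_sample` and the seed are hypothetical choices, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # seeded only so the sketch is repeatable
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: a complete list of the population.
frame = [f"person_{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=42)
print(sample)
```

The quality of the result depends on the frame: if the list leaves some people out, no amount of randomness in the draw can bring them back.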
The population is divided into two or more groups called strata,
according to some criterion, such as geographic location, grade level,
age, or income, and subsamples are randomly selected from each
stratum.
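A short sketch of stratified random sampling, assuming a hypothetical population of 200 students with grade level as the stratifying criterion. The helper name `stratified_sample` and the equal allocation of 5 per stratum are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random subsample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))   # a stratum may be smaller than the quota
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 200 students spread over grades 9 to 12.
people = [{"id": i, "grade": 9 + i % 4} for i in range(200)]
by_grade = stratified_sample(people, lambda p: p["grade"], 5, seed=1)
for grade in sorted(by_grade):
    print(grade, [p["id"] for p in by_grade[grade]])
```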
1. Select a random number, which will be known as k.
2. Get a list of people, or observe a flow of people (pedestrians
on a corner).
3. Select every kth person.
a. Be careful that there is no systematic rhythm to the flow or list of
people.
b. If every 4th person on the list is, say, rich or sincere, or follows some
other consistent pattern, avoid this method.
Every kth member (for example, every 10th person) is
selected from a list of all population members.
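Systematic sampling can be sketched as below. This follows one common variant, in which k is the sampling interval and the starting position is chosen at random within the first k entries; that random start, the function name, and the numbered frame are assumptions added for illustration.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every kth unit, beginning at a random position in the first k units."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random start between 0 and k - 1
    return frame[start::k]     # every kth element from there on

# Hypothetical list of 100 population members, sampled with k = 10.
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

If the list has a period that matches k (for example, every 10th entry is a team leader), every selected unit shares that trait, which is the rhythm problem described above.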
The population is divided into subgroups (clusters)
like families. A simple random sample is taken of
the subgroups and then all members of the cluster
selected are surveyed.
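A minimal sketch of cluster sampling, assuming 20 hypothetical families of three members each. The names `cluster_sample` and `families` are made up for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then survey every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical population: 20 families with 3 members each.
families = {
    f"family_{i}": [f"f{i}_member_{j}" for j in range(1, 4)]
    for i in range(1, 21)
}
chosen, surveyed = cluster_sample(families, 4, seed=3)
print(chosen)
print(len(surveyed), "people surveyed")
```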
Selection of whichever individuals are
easiest to reach
It is done at the convenience of the
researcher
In judgment sampling, selection is based on
personal judgement that the element is
representative of the population under
study.
It is used primarily when only a
limited number of people have
expertise in the area being researched.
In snowball sampling, the selection of additional respondents
is based on referrals from the initial
respondents.
Quota sampling:
Determine what the population looks
like in terms of specific qualities.
Create quotas based on those qualities.
Select people for each quota.
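The quota steps can be sketched as follows. The quotas, the age groups, and the stream of passers-by are all invented for illustration. People are taken in the order they arrive, which is why quota sampling is a non-probability method.

```python
def quota_sample(stream, group_of, quotas):
    """Accept people in arrival order until every group's quota is filled."""
    filled = {group: [] for group in quotas}
    for person in stream:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break   # every quota is full, stop recruiting
    return filled

# Hypothetical quotas: 3 people under 30 and 2 people aged 30 or over.
quotas = {"under_30": 3, "30_plus": 2}
passers_by = [{"name": f"p{i}", "age": 18 + (i * 7) % 50} for i in range(40)]
result = quota_sample(
    passers_by,
    lambda p: "under_30" if p["age"] < 30 else "30_plus",
    quotas,
)
print({group: [p["name"] for p in people] for group, people in result.items()})
```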
Sampling errors are those which arise due to the method of
sampling. They occur due to:
1. Faulty selection of sampling methods.
2. Substituting one sample for another because of difficulties
in collecting the samples.
3. Faulty demarcation of sampling units.
4. Variability of the population, which has different characteristics.
Non-sampling errors are those which creep in due to human factors, which always
vary from one investigator to another. These errors arise due to:
1. Faulty planning.
2. Faulty selection of the sample units.
3. Lack of trained staff.
4. Negligence on the part of respondents.
5. Errors in compilation.
6. Errors due to wrong statistical measures.
7. Framing of wrong questionnaires.
8. Incomplete investigation of the sample surveys.
Population - Entire group of items/individuals we want
information about.
Sample - The part of the population we actually examine in
order to gather information.
A parameter is a number that describes the population. It is
fixed, but we don't know its value.
A statistic is a number that describes a sample. Its value is
known, but it varies from sample to sample.
We often use statistics to estimate the unknown parameter.
Statistical inference draws conclusions about a population on the basis
of data from a sample.
It also provides us with a statement of how much confidence we can
place in our conclusions.
We are in many cases interested in the mean value a variable takes in
the population.
Individual scores are random draws from a population
The sample mean is a guess about the true population mean
But how accurate (or efficient) is the sample mean?
Or, I could say, what is the standard deviation of the sample mean?
I want to estimate the SD of the mean of n observations, i.e., how
much the mean is expected to vary from sample to sample
But I only get to observe one sample
Imagine that you could draw a sample and calculate a mean or
median or SD or whatever statistic again and again from a
population.
What would that distribution of this statistic look like?
You're conceptualizing a sampling distribution.
What is its expected value and standard deviation?
If you know this, you can answer how likely it is that a sample
with a given mean (or median or SD) was drawn from a
population with known mean (or median or SD)
A sampling distribution:
is a distribution of sample statistics (means, medians, etc.)
is a theoretical distribution that describes all possible means,
medians, etc., and the probability of obtaining each value.
can be visualized using simulations, but must be imagined when
collecting real data.
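A simulation along these lines might look like the sketch below. It assumes a hypothetical normal population with mean 100 and standard deviation 15, draws 5000 samples of size 25, and records the mean of each; the function name and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def sampling_distribution(draw, n, statistic, reps, seed=0):
    """Repeatedly draw samples of size n and record the chosen statistic of each."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(reps)]

def draw_score(rng):
    # Assumed population: normal with mean 100 and standard deviation 15.
    return rng.gauss(100, 15)

means = sampling_distribution(draw_score, n=25, statistic=statistics.mean, reps=5000)

# Centre and spread of the simulated sampling distribution of the mean.
print("mean of sample means:", round(statistics.mean(means), 2))
print("SD of sample means:  ", round(statistics.stdev(means), 2))
```

Passing `statistics.median` instead of `statistics.mean` gives the simulated sampling distribution of the median in the same way.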
1. They are approximately normal
When data in the population are normally
distributed, and even if they are not,
assuming large n
2. They are centered at the mean $\mu$ of the
population they are drawn from
The sample mean is unbiased
3. Their standard deviation equals the
standard deviation of the individual
scores divided by the square root of
the sample size (standard error of
the mean, SEM):
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
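Assume IQ: $\mu = 100$; $\sigma = 15$
Sampling Distribution of Sample Means if n = 25
Normal: $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{25}} = 3$
Sampling Distribution of Sample Means if n = 100
Normal: $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{100}} = 1.5$
Sampling Distribution of Sample Means if n = 400
Normal: $E(\bar{X}) = 100$; $\sigma_{\bar{X}} = \frac{15}{\sqrt{400}} = 0.75$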
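n = 25: $z_{103} = \frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{103 - 100}{3} = 1.00$, p = .1587
n = 100: $z_{103} = \frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{103 - 100}{1.5} = 2.00$, p = .0228
n = 400: $z_{103} = \frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{103 - 100}{.75} = 4.00$, p < .0001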
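The IQ figures above can be reproduced with a few lines of Python. This is a sketch using the standard library's `NormalDist`; the function names are hypothetical, and p is taken to be the upper-tail probability of observing a sample mean of 103 or more.

```python
import math
from statistics import NormalDist

MU, SIGMA = 100, 15   # population mean and SD from the IQ example

def sem(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

def z_and_upper_tail(sample_mean, mu, sigma, n):
    """z-score of a sample mean and the probability of a mean at least that large."""
    z = (sample_mean - mu) / sem(sigma, n)
    p = 1 - NormalDist().cdf(z)
    return z, p

for n in (25, 100, 400):
    z, p = z_and_upper_tail(103, MU, SIGMA, n)
    print(f"n = {n}: SEM = {sem(SIGMA, n):.2f}, z = {z:.2f}, p = {p:.4f}")
```

The same 3-point difference from the population mean becomes less and less plausible as n grows, because the standard error shrinks.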
P-value= 0.05 level
[Figure: sampling distributions of sample means for n = 25, n = 100, and n = 400. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability]
What Z score in a normal distribution separates the most extreme 5%
of the scores from the middle-most 95% of the scores?
1.96
n = 25; standard error of the mean = 3.00
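$\frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{3} = \pm 1.96$
$\bar{X} = 100 - 1.96 \times 3.00 = 94.12$
$\bar{X} = 100 + 1.96 \times 3.00 = 105.88$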
[Figure: sampling distribution of sample means for n = 25. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability]
What Z score in a normal distribution separates the most extreme 5%
of the scores from the middle-most 95% of the scores?
1.96
n = 100, standard error of the Mean = 1.50
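$\frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{1.50} = \pm 1.96$
$\bar{X} = 100 - 1.96 \times 1.50 = 97.06$
$\bar{X} = 100 + 1.96 \times 1.50 = 102.94$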
[Figure: sampling distribution of sample means for n = 100. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability]
What Z score in a normal distribution separates the most extreme 5%
of the scores from the middle-most 95% of the scores?
1.96
n = 400 , standard error of the Mean = 0.75
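$\frac{\bar{X} - 100}{\sigma_{\bar{X}}} = \frac{\bar{X} - 100}{.75} = \pm 1.96$
$\bar{X} = 100 - 1.96 \times .75 = 98.53$
$\bar{X} = 100 + 1.96 \times .75 = 101.47$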
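The cut-off sample means for the three sample sizes can be computed together. The sketch below assumes the same IQ population (mean 100, SD 15) and obtains the critical z from `NormalDist().inv_cdf` rather than hard-coding 1.96; the function name `middle_interval` is made up.

```python
import math
from statistics import NormalDist

def middle_interval(mu, sigma, n, coverage=0.95):
    """Range of sample means that contains the middle `coverage` share of samples."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return mu - half_width, mu + half_width

for n in (25, 100, 400):
    low, high = middle_interval(100, 15, n)
    print(f"n = {n}: {low:.2f} to {high:.2f}")
```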
[Figure: sampling distribution of sample means for n = 400. x-axis: Sample Means (90.0 to 110.0); y-axis: Probability]
Deriving the standard error of the mean
This section is for your own edification regarding why
SEM = SD/sqrt(n).
You will not be tested on it.
If X is a random variable, var(X) is its variance
Sum of two variables X1 and X2 = X1 + X2
Variance sum law (for independent X1 and X2):
Var(X1 + X2) = var(X1) + var(X2)
Var(X1 - X2) = var(X1) + var(X2)
Constant multiplication rule:
If I multiply a random variable X by 2, I get 2X
Var(2X) = 2^2 * var(X)
Var(aX) = a^2 * var(X)
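Both rules can be checked numerically. The sketch below assumes two independent normal variables with variances 4 and 9 (values chosen only for illustration) and compares the simulated variances with what the rules predict.

```python
import random
import statistics

rng = random.Random(0)
N = 20_000

# Two independent variables: var(X1) = 2^2 = 4, var(X2) = 3^2 = 9 (assumed values).
x1 = [rng.gauss(0, 2) for _ in range(N)]
x2 = [rng.gauss(0, 3) for _ in range(N)]

var_sum = statistics.pvariance([a + b for a, b in zip(x1, x2)])
var_diff = statistics.pvariance([a - b for a, b in zip(x1, x2)])
var_scaled = statistics.pvariance([2 * a for a in x1])

print("var(X1 + X2):", round(var_sum, 2))     # rule predicts about 4 + 9 = 13
print("var(X1 - X2):", round(var_diff, 2))    # also about 13, not 4 - 9
print("var(2 * X1): ", round(var_scaled, 2))  # about 2^2 * 4 = 16
```

The sum law holds only approximately in any finite simulation, because the sample covariance of X1 and X2 is close to zero but not exactly zero.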
Imagine I measure two subjects x1 and x2
They are drawn from random variables X1 and X2,
respectively
I assume they come from identical distributions
Their mean, or the sample mean, is (X1 + X2) / 2
What is the variance of that sample mean?
This tells me how accurate the sample mean is.
Why? Sqrt(var) = st. deviation = how far off the true mean I
typically am
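Find: $\operatorname{var}\left(\frac{X_1 + X_2}{2}\right)$
Variance of sampling distribution (for mean)
Assume X1 and X2 are independent:
$\operatorname{var}\left(\frac{X_1 + X_2}{2}\right) = (1/2)^2 \left(\operatorname{var}(X_1) + \operatorname{var}(X_2)\right)$
Assume X1 and X2 have identical distributions, with the same variance, $\operatorname{var}(X_1) = \operatorname{var}(X_2) = \operatorname{var}(X)$:
$\operatorname{var}\left(\frac{X_1 + X_2}{2}\right) = (1/2)^2 \cdot 2 \cdot \operatorname{var}(X) = \frac{2 \operatorname{var}(X)}{4}$
Define: $\operatorname{var}(X_1) = \operatorname{var}(X_2) = \sigma_X^2$
$\operatorname{var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma_X^2}{2}$
Variance of sampling distribution for mean of 2 subjects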
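Standard deviation of sampling distribution (for mean)
$\operatorname{var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma_X^2}{2}$
$\operatorname{SD}\left(\frac{X_1 + X_2}{2}\right) = \sqrt{\frac{\sigma_X^2}{2}} = \frac{\sigma_X}{\sqrt{2}}$
Std. of mean of 2 variables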
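$\operatorname{var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n^2} \operatorname{var}(X_1 + X_2 + \dots + X_n) = \frac{1}{n^2} \left(\operatorname{var}(X_1) + \operatorname{var}(X_2) + \dots + \operatorname{var}(X_n)\right) = \frac{n \sigma_X^2}{n^2} = \frac{\sigma_X^2}{n}$
Std. of mean of n variables
$\operatorname{SD}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{\sigma_X}{\sqrt{n}}$
Each subject is a random
variable
--> n subjects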
This means:
If SD, or s, is the sample standard deviation
(average deviation of an individual from the sample mean), then
SEM = s/sqrt(n)
is our estimate of how far off the sample mean is, on average, from
the true population mean.
This is the standard error of the mean:
our estimate of the standard deviation of the sampling distribution of means.
The standard deviation of the sampling
distribution is called the standard error
No matter what we are measuring, the
distribution of any measure across all
possible samples we could take
approximates a normal distribution, as
long as the number of cases in each
sample is about 30 or larger.
If we repeatedly drew samples from a population and
calculated the mean of a variable or a percentage, those
sample means or percentages would be approximately normally
distributed.
The Central Limit Theorem
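A small simulation can illustrate the theorem with a clearly non-normal population. The sketch assumes an exponential population (strongly right-skewed, mean 1) and uses skewness as a rough gauge of how far a distribution is from the symmetric normal shape; the helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Third standardized moment: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(rng, n, reps=4000):
    """Means of `reps` samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(1)
single_scores = sample_means(rng, 1)    # n = 1: just the raw population values
means_of_30 = sample_means(rng, 30)     # n = 30: the rule-of-thumb sample size

print("skewness of individual scores:", round(skewness(single_scores), 2))
print("skewness of means, n = 30:    ", round(skewness(means_of_30), 2))
print("centre of the sample means:   ", round(statistics.fmean(means_of_30), 3))
```

The individual scores are heavily skewed, while the means of samples of 30 are much closer to symmetric and stay centred on the population mean.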
Standard error can be estimated from a single sample:
$SE = \frac{s}{\sqrt{n}}$
where
s is the sample standard deviation (i.e., the
sample-based estimate of the standard deviation of
the population), and
n is the size (number of observations) of the sample.
Confidence intervals
Because we know that the sampling distribution is approximately normal,
we know that 95.45% of sample means will fall within two
standard errors of the population mean.
95% of sample means fall within 1.96
standard errors.
99% of sample means fall within
2.58 standard errors.
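Putting the standard error and the multipliers together, a confidence interval for the mean can be sketched as below. The ten scores are made-up data, and the function name is hypothetical. With a sample this small, a t-based multiplier would normally be preferred over 1.96, so treat the numbers as an illustration of the arithmetic only.

```python
import math
import statistics

def confidence_interval(sample, z=1.96):
    """Sample mean plus or minus z standard errors, with SE estimated as s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # s uses the n - 1 denominator
    return mean - z * se, mean + z * se

# Hypothetical sample of 10 test scores.
scores = [98, 105, 110, 92, 101, 99, 107, 95, 103, 100]

low95, high95 = confidence_interval(scores, z=1.96)
low99, high99 = confidence_interval(scores, z=2.58)
print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence is bought with less precision.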