Revisiting Sampling Concepts
Population
A population is all the possible members of a category.
Examples:
– The heights of every male or every female
– The temperature on every day since the beginning of time
– Every person who ever has taken, or ever will take, a particular drug
Sample
A sample is some subset of a population.
Examples:
– The heights of 10 students picked at random
– The participants in a drug trial
Researchers seek to select samples that accurately reflect the broader population from which they are drawn.
Population and Sample
[Diagram: a sample is drawn from the population; sample statistics are used to make inferences about population parameters.]
Samples are drawn to infer something about the population.
Reasons to Sample
Ideally a decision maker would like to consider every item in the population, but:
– Contacting the whole population would be time consuming (e.g. election polls)
– The cost of such a study might be too high
– In many cases the whole population would be consumed if every part of it were tested
– The sample results are adequate
Probability vs Non-Probability Sampling
Probability sampling: drawing samples in a random manner
– Using random numbers
– Writing names on identical cards or slips and then drawing randomly
– Choosing every nth item of the population (systematic sampling)
– First dividing the population into homogeneous groups and then drawing samples randomly from each group (stratified sampling)
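The probability sampling methods listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the "low"/"high" strata, and the function names are all hypothetical and do not come from the slides.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Draw k distinct items; every subset of size k is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, seed=0):
    """Pick a random start in [0, step) and then take every step-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, seed=0):
    """Draw k_per_stratum items at random from each homogeneous group."""
    rng = random.Random(seed)
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 100 units numbered 1..100.
population = list(range(1, 101))
# Hypothetical strata: the lower and upper halves of the numbering.
strata = {"low": population[:50], "high": population[50:]}

srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, 10)
strat = stratified_sample(strata, 5)

print("simple random:", srs)
print("systematic:   ", sys_s)
print("stratified:   ", strat)
```

A fixed seed is used only so that the sketch is repeatable; in a real study the seed would not be chosen by the researcher.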
Probability vs Non-Probability Sampling
Non-probability sampling: units are chosen by convenience or self-selection rather than by a random mechanism
– Man-on-the-street interviews
– Call-in surveys
– Readership surveys
– Web surveys
Types of Variables
– Qualitative (categorical)
– Quantitative (numerical): discrete or continuous
Sampling Error
“Sampling error is simply the difference between the estimates obtained from the sample and the true population value.”
Sampling Error = $\bar{X} - \mu$
where $\bar{X}$ = mean of the sample and $\mu$ = mean of the population.
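As a small illustration of this definition, the sketch below computes the sampling error for one random sample. The population (10,000 simulated heights with mean 170 cm and standard deviation 10 cm) is a made-up assumption used only to have something to sample from.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = mean(population)            # population mean

# One random sample of 10 members, drawn without replacement.
sample = rng.sample(population, 10)
x_bar = mean(sample)             # sample mean

# Sampling error = sample mean minus population mean.
sampling_error = x_bar - mu

print(f"population mean = {mu:.2f}")
print(f"sample mean     = {x_bar:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Drawing a different sample gives a different sampling error, which is why the error is studied through the distribution of the statistic over many samples.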
Validity of Sampling Process
Sampling Distributions
The distribution of a statistic calculated from all possible samples of size n drawn from a population is called a sampling distribution.
Three things we want to know about any distribution:
– Central tendency
– Dispersion
– Shape
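Sampling Distribution of Means
Suppose a population consists of three numbers: 1, 2 and 3. All the possible samples of size 2 are drawn from the population (with replacement).
Mean of the population: $\mu = (1 + 2 + 3)/3 = 2$
Variance: $\sigma^2 = [(1-2)^2 + (2-2)^2 + (3-2)^2]/3 = 2/3 \approx 0.67$
Standard deviation: $\sigma = \sqrt{2/3} \approx 0.82$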
Distribution of the Population
Sampling distribution of means n = 2
Sample #   Sample   Sample Mean
1          1,1      1
2          1,2      1.5
3          1,3      2
4          2,1      1.5
5          2,2      2
6          2,3      2.5
7          3,1      2
8          3,2      2.5
9          3,3      3
Mean of the sample means: $\mu_{\bar{x}} = 2 = \mu$
SD of the sample means: $\sigma_{\bar{x}} \approx 0.6$
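The nine samples and their means can be reproduced by brute-force enumeration. The sketch below assumes, as the table implies, that samples are ordered and drawn with replacement; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

population = [1, 2, 3]
n = 2  # sample size

def pop_mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Population standard deviation (divide by N, not N - 1)."""
    m = pop_mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

# All ordered samples of size n, drawn with replacement: 3**2 = 9 samples.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

mu = pop_mean(population)
sigma = pop_sd(population)
mean_of_means = pop_mean(sample_means)
sd_of_means = pop_sd(sample_means)

for number, (s, m) in enumerate(zip(samples, sample_means), start=1):
    print(number, s, m)
print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"mean of sample means = {mean_of_means}, SD of sample means = {sd_of_means:.2f}")
```

The mean of the sample means equals the population mean of 2, and their standard deviation (about 0.58, shown rounded as 0.6 on the slide) is smaller than the population's 0.82.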
 
The population’s distribution has far more variability than that of the sample means: $\sigma_{\bar{x}} = 0.6 < \sigma = 0.8$, while $\mu_{\bar{x}} = \mu$.
As the sample size increases, the dispersion of the sample means becomes smaller.
The mean of the sampling distribution of ALL the sample means is equal to the true population mean: $\mu_{\bar{x}} = \mu$.
The standard deviation of a sampling distribution, called the Standard Error, is calculated as $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
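For reference, the usual form of the standard error of the mean, $\sigma/\sqrt{n}$, can be derived in a few steps. The derivation below is a sketch that assumes the $n$ observations $X_1, \dots, X_n$ are drawn independently (for example, with replacement) from a population with variance $\sigma^2$.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n} \\
\sigma_{\bar{x}} &= \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

For the population 1, 2, 3 with $\sigma = \sqrt{2/3} \approx 0.82$ and $n = 2$, this gives $\sigma_{\bar{x}} = \sqrt{1/3} \approx 0.58$, which agrees with the value of about 0.6 found for the nine sample means.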
Central Limit Theorem
– The variability of a sample mean decreases as the sample size increases.
– If the population distribution is normal, so is the sampling distribution of the mean.
– For ANY population with finite variance (regardless of its shape), the distribution of sample means will approach a normal distribution as n increases.
– It can be demonstrated with the help of simulation.
Central Limit Theorem
How large is a “large sample”?
– It depends upon the form of the distribution from which the samples were taken.
– If the population distribution deviates greatly from normality, larger samples will be needed to approximate normality.
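A simulation along these lines can be sketched as follows. The choice of an exponential population (strongly right-skewed, mean 1, standard deviation 1), the sample sizes, and the use of skewness as a rough measure of non-normality are all assumptions made for illustration; they are not taken from the slides.

```python
import random
from math import sqrt

def simulate_sample_means(n, trials=5000, seed=1):
    """Means of `trials` samples of size n from an exponential(1) population."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

def describe(values):
    """Return (mean, standard deviation, skewness) of a list of numbers."""
    k = len(values)
    m = sum(values) / k
    sd = sqrt(sum((v - m) ** 2 for v in values) / k)
    skew = sum(((v - m) / sd) ** 3 for v in values) / k
    return m, sd, skew

results = {}
for n in (2, 30, 100):
    results[n] = describe(simulate_sample_means(n))
    m, sd, skew = results[n]
    print(f"n={n:3d}  mean={m:.3f}  sd={sd:.3f}  "
          f"sigma/sqrt(n)={1 / sqrt(n):.3f}  skewness={skew:.2f}")
```

In runs of this sketch the mean of the sample means stays close to 1, their standard deviation tracks $\sigma/\sqrt{n}$, and the skewness shrinks toward 0 (the value for a normal distribution) as n grows, with the skewed population needing a fairly large n before the sample means look symmetric.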
 
Implications of CLT
A light bulb manufacturer claims that the life span of its light bulbs has a mean of 54 months and a standard deviation of 6 months. A consumer advocacy group tests 50 of them. Assuming the manufacturer’s claims are true, what is the probability that it finds a mean lifetime of less than 52 months?
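Implications of CLT (cont.)
From the data we know that $\mu = 54$ months, $\sigma = 6$ months and $n = 50$.
By the Central Limit Theorem, $\mu_{\bar{x}} = \mu = 54$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 6/\sqrt{50} \approx 0.85$.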
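[Figure: normal curve of the sample mean centred at 54 (z = 0); the shaded area to the left of 52 (z = -2.35) is 0.0094.]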
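To find $P(\bar{x} < 52)$, we need to convert to z-scores: $z = (\bar{x} - \mu_{\bar{x}})/\sigma_{\bar{x}} = (52 - 54)/0.85 \approx -2.35$.
From the area table, the area between $z = 0$ and $z = 2.35$ is 0.4906, so $P(\bar{x} < 52) = 0.5 - 0.4906 = 0.0094$.
Hence, if the manufacturer’s claims are true, the probability of this happening is 0.0094. We are 99.06% certain that this will not happen.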
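The same calculation can be checked without an area table. The sketch below uses the identity $\Phi(z) = \tfrac{1}{2}\,[1 + \operatorname{erf}(z/\sqrt{2})]$ for the standard normal CDF; the helper name normal_cdf is illustrative.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu = 54.0      # claimed mean lifetime (months)
sigma = 6.0    # claimed standard deviation (months)
n = 50         # number of bulbs tested
x_bar = 52.0   # sample mean of interest

standard_error = sigma / sqrt(n)       # sigma / sqrt(n)
z = (x_bar - mu) / standard_error      # z-score of the sample mean
p = normal_cdf(z)                      # P(sample mean < 52)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.3f}")
print(f"P(x_bar < 52) = {p:.4f}")
```

Without intermediate rounding the z-score is nearer -2.36 than -2.35, so the computed probability comes out slightly below the table-based 0.0094; the conclusion is unchanged.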
What can go wrong
Statistics can be manipulated by intentionally taking biased samples.
Examples:
– Asking leading questions in interviews and questionnaires
– A survey which showed that 2 out of 3 dentists recommend a particular brand of toothpaste
– Sometimes there is non-response from a particular portion of the population, affecting the sampling design
How to do it right
– Make sure that the sample truly represents the population
– Use random methods where possible
– Avoid personal bias
– Avoid measurement bias
– Do not make any decisions about the population based on the samples until you have applied statistical inferential techniques to the sample
