Sampling: Lecture Notes

This document provides an overview of key concepts in sampling for veterinary epidemiological research. It defines important sampling terms and distinguishes between probability and non-probability sampling. The document outlines the steps to developing a sampling plan, including defining objectives, identifying the target population, determining sample size, and selecting a sampling method. It describes advantages of sampling over a complete census and characteristics of a good sample.

Uploaded by

Ronnie Domingo

Lecture Notes

SAMPLING

Session Objectives:

Upon completion of the lecture sessions, each student should be able to:
1. Explain the importance of sampling in veterinary epidemiological research;
2. Distinguish between probability and nonprobability sampling;
3. Understand the factors to consider when determining sample size;
4. Understand the steps in developing a sampling plan

Disease Surveys
Two types of cross-sectional study are commonly performed.

1. Censuses- In this kind of study, the investigator includes every unit in the target
population. This is feasible if the population is small. Some claim this is the most accurate
and effective way of conducting a survey. In reality, however, most populations are too
large and too expensive to study in full. In addition, the number of units can be too
large to manage and measure accurately.
2. Sample surveys or Surveys- A survey examines only a small part (sample) of the target
population.

General views about disease investigations


1. We cannot study the whole population so we sample it.
2. A complete census is often unnecessary, wasteful, and a burden on the public.
3. Taking a sample leads to sampling error, which is measurable
4. Good design and quality assurance ensure accuracy or validity
5. Appropriate sample size will ensure precision
6. Probability samples are the only ones that support valid statistical inference about the population

Advantages of Sampling

1. SPEED. A smaller team can be trained and mobilized to collect sample data for a shorter
period of time. Sampling is faster!
2. COST. Since the study will only need several paid individuals to study a small segment
of the population for a limited period of time, sampling is definitely cheaper than a
census that covers the whole population.
3. QUALITY. It allows a more thorough investigation of the elements that would be
impossible to apply to the whole population.

Resources required for surveys


Skilled manpower
Transportation
Communication equipment
Office and laboratory supplies
Computers, measurement instruments, and other pieces of equipment

Definitions

1. Sample- A sample is a part of the population, selected by the investigator to gather
information (measures) on certain characteristics of the original population.

Population                             Sample
Infinite/finite size                   Finite size
Characterized by unknown parameters    Characterized by measurable statistics
                                       (e.g., sample mean, standard deviation)

2. Sampling- is the process of selecting a small number of units from a larger defined target
group of units such that the information generated from the small group will allow inferences to
be made about the larger group.

3. Target population: This is any complete, or the theoretically specified aggregation of study
elements. It is usually the ideal population or universe to which research results are to be
generalized. For example, all buffaloes in the Philippines.

4. Study population- The study population is the population to which the results of the study will
be inferred. For example, all buffaloes in the Philippines except those in remote mountains and
islands. The study population depends upon the research question:

Research question                           Study population
How many rabies immunizations do dogs       Dog population of Davao City
receive each year in Davao City?
How many dogs are brought to the small      Small animal clinics in Quezon
animal clinics in Quezon City?              City (not the dogs)
How many veterinarians in Bulacan are       Veterinarians of Bulacan province
engaged in food animal practice?

5. Sampling unit (Basic sampling unit, BSU)- the units which are chosen in selecting the sample.
Animals
Herds
Villages

6. Sampling frame- A list of sampling units from which units to be sampled can be selected. In
most situations, it is difficult to get an accurate list. Sample frame error occurs when certain
elements of the population are accidentally omitted or not included on the list.

7. Sampling scheme- Method used to select sampling units from the sampling frame.

8. Sample size- the number of elements in the obtained sample.

9. Inference is the process of drawing conclusions about the disease status of the population
from the disease status observed in the sample.

10. Sampling error- the difference between the value of the parameter being investigated and the
estimates of this value based on the different samples. For example, the difference between the
sample mean and the population mean.

11. Confidence level- a statement of how often you could expect to find similar results if the
survey were to be repeated, or the degree of certainty of obtaining the same results. It often
informs about how often the findings will fall outside the margin of error.

12. Confidence interval is a range in which we are fairly certain that the population value lies.

13. Parameter- the summary description of a given variable in a population. Example- population
mean, population variance, etc.

14. Statistic-the summary description of a given variable in a sample. Example- sample mean,
sample variance.

Characteristics of a good sample
1. REPRESENTATIVE. Taken at random so that every member of the population of data
has an equal chance of selection. Unbiased by the sampling procedure or equipment. The
sample possesses the characteristics of the target population.
2. ADEQUATE. Large enough to give sufficient precision;
3. OBTAINABLE. The sample can be collected or measured according to the sampling
design.
4. AFFORDABLE. The individual or organization doing the survey can collect the data at
the least possible cost.

Improving representativeness of the Sample


1. The sample needs to be representative of the population in terms of time: seasonality, day
of the week, time of the day.
2. The sample needs to be representative of the population in terms of place: urban, rural.
3. The sample needs to be representative of the population in terms of animals: age, sex,
breed.

Steps in survey studies


(Note- The statistician, the epidemiologist, and those who will use the data from the planned
survey must consult each other in order to arrive at the final survey design)

1. Define the objective
2. Describe the plan for analysis
3. Identify the target population
4. Create the sampling frame
5. Select the applicable sampling method
6. Calculate the sample size
7. Design and field-test the survey materials
8. Recruit and equip survey personnel
9. Conduct the survey
10. Monitor the field work
11. Encode and analyze data
12. Package the output

General Categories of Sampling Methods

A. Probability sampling- Every unit in the population has a known probability of being
selected. The rules and procedures for selecting the sample and estimating the parameters
are clearly defined.
B. Non-probability sampling- Probability of being selected is unknown

Comparison of probability and non-probability sampling

                                     Probability    Non-probability
                                     sampling       sampling
Prone to selection bias              No             Yes
Can generalize results to            Yes            No
survey population
Can estimate precision of            Yes            No
survey estimates (i.e., use
statistical techniques)
Requires sample frame                Yes            No
Requires observance of fixed         Yes            No
procedures that are sometimes
costly or unfeasible
Method replicable (important         Yes            No
for measuring trends)
Source: A Johnson, WHO

Types of Nonprobability Sampling Methods


Non-probability samples are used primarily because they are easy to collect.
1. Convenience sampling- relies upon convenience and access. Often, the sampling units are
selected because they happen to be in the right place at the right time: the animals along
the road; the farms nearest to the researcher's office.
2. Judgment sampling- relies upon the investigator's belief that the selected sampling units
have the characteristics of interest.
3. Quota sampling- emphasizes representation of specific characteristics. It may be viewed
as two-stage restricted judgmental sampling. The first stage consists of developing
control categories, or quotas, of population elements (ex. Quotas= 50% males and 50%
females). In the second stage, sample elements are selected based on convenience or
judgment.
4. Snowball sampling- relies upon respondent referrals of others with like characteristics.
Example: A researcher is studying piggery management but can only find five piggery
owners. He asks these piggery owners if they know any more. They give him five more
names. He asks the new owners the same question. They give him several further referrals,
who in turn provide additional contacts. In this way, he manages to contact sufficient
piggery owners.

Types of Probability Sampling Methods
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
5. Multistage sampling

Source: Research Methods 1 by R Boughner

1. Simple random sampling


Description- Simple random sampling is a method of probability sampling in
which every unit has an equal nonzero chance of being selected

Procedure
Number all units
Randomly draw units

How to generate random numbers
Random numbers can be obtained using your calculator, a computer program for random
number generation, a spreadsheet, printed tables of random numbers, or by the more
traditional methods of drawing slips of paper from a hat, tossing coins or rolling dice.
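
The two-step procedure (number all units, then randomly draw units) can be sketched with a computer program as follows. This is only an illustration: the frame of 500 animal IDs, the sample size of 20, and the function name simple_random_sample are assumptions made for the example, not part of the lecture.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(list(units), n)

# Step 1: number all units (hypothetical frame of 500 animal IDs).
frame = list(range(1, 501))

# Step 2: randomly draw the units (hypothetical sample size of 20).
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```

Fixing the seed is a design choice that lets another investigator reproduce the same draw from the same frame.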

Advantages
Simple
Sampling error easily measured
Disadvantages
Need complete list of units
Does not always achieve best representativity

Problems with simple random sampling

Problem 1: Can require the selection of a large number of random numbers.
Solution: Use systematic sampling (i.e., sample units at regular intervals down
the sample frame).
Problem 2: Sample frames for an entire target population rarely exist and are often
impractical to construct.
Solution: Develop a sampling frame of larger units (clusters). Randomly select
clusters and construct a sample frame of individuals in the selected clusters.
Randomly sample individuals within those clusters.
Problem 3: Populations can be spread over a wide area, making logistics difficult.
Solution: Use cluster sampling, as it concentrates fieldwork in specific clusters.
Problem 4: The population consists of distinct sub-groups that we are interested in.
Solution: Make precise estimates for each sub-group (stratum) by using stratified
sampling (i.e., take a sample of adequate size from each stratum). If we want an
estimate for the entire population, we can combine the estimates for the strata if
we know the proportion of the population in each stratum.

2. Systematic sampling
Description- Systematic random sampling is a method of probability sampling
in which the defined target population is ordered and the sample is selected
according to position using a skip or sampling interval.

Procedure
1: Obtain a list of units that provides an acceptable frame of the target population (N)
2: Determine the number of units in the list and the desired sample size (n)
3: Compute the skip interval (sampling interval calculated as k = N/n)
4: Draw a random number between 1 and k for starting
5: Beginning at the start point, select the units by choosing each unit that corresponds to
the skip interval
Advantages
Applicable to situations when no sampling frame is available.
Ensures representativity across list
Applicable to situations when the sampling units are too numerous to number for
purposes of simple random sampling.
Easy to implement
Disadvantage
Dangerous if list has cycles

Examples of systematic sampling


Example 1
There are 100,000 goats in the population and a sample of 1,000 is desired. In this case the
sampling interval, k, is 100 (from 100,000/1,000). A random number between 1 and 100 is
selected. If, for example, this number is 45, the sample consists of elements 45, 145, 245, 345,
445, 545, and so on.

Example 2
Target Population size= N= 100
Desired Sample size= n= 20
Skip interval= N/n= 100/20= 5
Choose random number from 1 to 5 for starting= let's assume the number 3
Start with number 3 and take every 5th unit
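
Both examples can be reproduced with a short program. The sketch below assumes the frame is an ordered list numbered from 1 and that N is a multiple of n; the function name systematic_sample and its start argument are illustrative choices, not part of the lecture.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Select every k-th unit from an ordered frame, where k = N/n."""
    N = len(frame)
    k = N // n  # skip interval (integer part of N/n)
    if start is None:
        # Random starting position between 1 and k, inclusive.
        start = random.Random(seed).randint(1, k)
    # Positions are counted from 1, as in the worked examples.
    positions = range(start, N + 1, k)
    return [frame[pos - 1] for pos in positions][:n]

# Example 2: N = 100, n = 20, skip interval 5, assumed start 3.
units = list(range(1, 101))
print(systematic_sample(units, 20, start=3))

# Example 1: 100,000 goats, n = 1,000, skip interval 100, assumed start 45.
goats = range(1, 100_001)
print(systematic_sample(goats, 1000, start=45)[:6])
```

Leaving start unset draws the starting position at random, which is what the procedure requires in a real survey; the fixed starts are used here only to match the examples.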

3. Stratified sampling
Description- Stratified random sampling is a method of probability sampling in which the
population is divided into different subgroups (strata) and samples are selected randomly
from each stratum.
A major objective of stratified sampling is to increase precision without increasing
cost.
The strata should be mutually exclusive and collectively exhaustive in that every
population element should be assigned to one and only one stratum and no population
elements should be omitted.
Elements are selected from each stratum by a random procedure, usually simple
random sample (SRS).
The elements within a stratum should be as homogeneous as possible, but the
elements in different strata should be as heterogeneous as possible.

Procedure
1. Classify population into homogeneous subgroups (strata)
2. Draw random samples from each stratum
3. Combine results of all strata into a single sample of the target population
Advantages
More precise if variable associated with strata
All subgroups represented, allowing separate conclusions about each of them
Disadvantages
Sampling error difficult to measure
Loss of precision if small numbers sampled in individual strata
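
The stratified procedure, together with the earlier remark that stratum estimates can be combined when the proportion of the population in each stratum is known, might be sketched as follows. The two strata (backyard and commercial farms), their sizes, the per-stratum sample sizes, and the prevalence figures are all hypothetical values chosen for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample separately within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def combine_estimates(stratum_estimates, stratum_sizes):
    """Weight each stratum estimate by its share of the population."""
    total = sum(stratum_sizes.values())
    return sum(stratum_estimates[name] * stratum_sizes[name] / total
               for name in stratum_estimates)

# Hypothetical population: 800 backyard farms and 200 commercial farms.
strata = {
    "backyard": list(range(0, 800)),
    "commercial": list(range(800, 1000)),
}
chosen = stratified_sample(strata, {"backyard": 40, "commercial": 40}, seed=1)

# Hypothetical prevalence estimates obtained within each stratum.
estimates = {"backyard": 0.10, "commercial": 0.30}
sizes = {name: len(units) for name, units in strata.items()}
overall = combine_estimates(estimates, sizes)
print(round(overall, 2))
```

Taking 40 farms from each stratum gives a separate estimate for each subgroup; the weighting step is what keeps the smaller commercial stratum from dominating the overall figure.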

4. Cluster sampling
Description- Cluster sampling is a method of probability sampling in which the
population is divided into a large number of groups, called clusters. Then a random
sample of clusters is selected, based on a probability sampling technique such as SRS.
Every element found in each cluster selected may or may not be included in the study.

Clusters are mutually exclusive and collectively exhaustive subpopulations


For each selected cluster, you have two options regarding its elements:
1. Include all the elements in the sample (one-stage or single-stage cluster sampling
design)
2. Draw only a subset of the sampling units (two-stage or multi-stage sampling
design).
Elements within a cluster should be as heterogeneous as possible, but the clusters themselves
should be as similar to one another as possible. Ideally, each cluster should be a small-scale
representation of the population.
The main reason for cluster sampling is to sample economically while retaining the
characteristics of a probability sample

Advantages
Simple: No list of units required
Less travel/resources required
Disadvantages
Imprecise if units within clusters are homogeneous (large design effect)
Sampling error difficult to measure

Difference between Cluster Sampling and Stratified Sampling


Although both types of sample involve dividing the population into groups, they follow
opposite sampling operations.
In a stratified sample, we sample individuals within every stratum. The sampling errors
involve variability within strata. Strata are supposed to be as homogeneous as possible
internally and as different as possible from each other.
In single-stage cluster sampling, we have no source of sampling error within the clusters
because every case is being used. The variability is between the clusters.

5. Multistage sampling
Description- Multi-stage sampling is a method of probability sampling wherein sampling
a population is undertaken in different stages, with the sample unit being different at each
stage.

Procedure:
1. FIRST-STAGE. The population is first divided into a set of primary or first-stage
sampling units. For example, the researcher divided the Philippines into 15
regions. From this sampling frame, he randomly selected six (6) regions.
2. SECOND STAGE. Each of the selected units from the first-stage sampling is
further subdivided into secondary or second-stage sampling units. For example,
the researcher divided each of the selected six (6) regions into existing provinces.
From this sampling frame of provinces, he randomly selected two (2) provinces
per region.
3. ADDITIONAL STAGES. The procedure is repeated until the desired stage is
reached. The third stage may involve listing the commercial swine farms in each
province and a random sample of say 3 farms per province is selected. Once the
farm units have been selected, it may prove possible to construct a sample frame
of the animals within the units and sample these in turn, say 30 pigs per farm (this
procedure constitutes the fourth stage).
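
The four stages in this example can be sketched as a nested random selection. The numbers drawn at each stage (6 regions, 2 provinces, 3 farms, 30 pigs) come from the example above; the size of the hypothetical population (4 provinces per region, 5 farms per province, 50 pigs per farm) and all names are assumptions made only so the sketch can run.

```python
import random

def multistage_sample(population, draws, seed=None):
    """Select draws[i] units at stage i, descending only into selected units."""
    rng = random.Random(seed)

    def select(node, stage):
        if isinstance(node, dict):
            # Intermediate stage: sample the unit names, then go one level down.
            chosen = rng.sample(sorted(node), draws[stage])
            return {name: select(node[name], stage + 1) for name in chosen}
        # Final stage: sample the animals themselves.
        return rng.sample(node, draws[stage])

    return select(population, 0)

def count_units(node):
    """Count the final-stage units in a nested selection."""
    if isinstance(node, dict):
        return sum(count_units(child) for child in node.values())
    return len(node)

# Hypothetical frame: 15 regions x 4 provinces x 5 farms x 50 pigs.
population = {
    f"region{r}": {
        f"r{r}-province{p}": {
            f"r{r}-p{p}-farm{f}": [f"r{r}-p{p}-f{f}-pig{i}" for i in range(50)]
            for f in range(5)
        }
        for p in range(4)
    }
    for r in range(15)
}

sample = multistage_sample(population, draws=[6, 2, 3, 30], seed=7)
print(len(sample), count_units(sample))
```

In practice the lower-level lists would be built only for the units selected at the stage above, which is the reason a complete frame is needed only at the first stage.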

Advantages
No complete listing of population required
Most feasible approach for large populations
The complete sample frame is needed only at the first-stage sampling.
The reduction in places to visit for data collection makes this sampling design cheaper.
Disadvantages
Several sampling lists
Sampling error difficult to measure

Problems with Survey Estimates


The estimate from a survey is never exactly identical to the actual value in the population, even if
all the procedures are done correctly.

For example, in a hypothetical population in which precisely 50.0% of goats less than 6 months
old have anemia, a very well-done survey of 300 young goats shows that 135 (45.0%) have
anemia.

Two explanations:

1. Bias - Something is wrong with the way the sampling was done or the measurements
taken.
2. Sampling error - Just by chance, even in the perfect survey, a sample selected randomly
from a population will almost never be exactly the same as the entire population.
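
Whether a result of 45.0% is compatible with a true value of 50.0% through sampling error alone can be checked with a rough calculation. The sketch below uses the normal approximation for a proportion with Z = 1.96 (95% confidence) and assumes a simple random sample; it is an illustration, not the method prescribed by the lecture.

```python
import math

def proportion_ci(cases, n, z=1.96):
    """Approximate confidence interval for a proportion (normal approximation)."""
    p = cases / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error

# The goat anemia survey from the example: 135 of 300 young goats positive.
low, high = proportion_ci(135, 300)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 39% to 51% and includes the true value of 50%, so chance alone is a sufficient explanation for this survey result.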

Bias

Bias is the difference between survey result and population value due to:
Incorrect measurements, resulting in measurement bias
Selection of a non-representative sample, resulting in selection bias

Bias can be present in surveys:


Even if sampling and analysis are done correctly
Even if the survey is done with a very large sample size

Examples of measurement bias


1. Instrument bias. Instrument bias occurs when calibration errors lead to inaccurate
measurements being recorded, e.g., an unbalanced weight scale.
2. Insensitive measure bias. Insensitive measure bias occurs when the measurement tool(s)
used are not sensitive enough to detect what might be important differences in the
variable of interest.
3. Expectation bias. Expectation bias occurs in the absence of masking or blinding, when
observers may err in measuring data toward the expected outcome. This bias usually
favors the treatment group.
4. Recall or memory bias. Recall or memory bias can be a problem if outcomes being
measured require that subjects recall past events. Often a person recalls positive events
more than negative ones. Alternatively, certain subjects may be questioned more
vigorously than others, thereby improving their recollections.
5. Attention bias. Attention bias occurs because people who are part of a study are usually
aware of their involvement, and as a result of the attention received may give more
favorable responses or perform better than people who are unaware of the study's intent.

Examples of selection bias

1. Non-representative sample (undercoverage). Undercoverage occurs when some members of the
population are inadequately represented in the sample. For example, a survey sought to find
out the type of food given to dogs. The survey involved only those respondents with landline
telephones. It is possible that most of the respondents were affluent owners who give
commercially prepared diets to their pets.
2. Nonresponse bias. Sometimes, individuals chosen for the sample are unwilling or unable
to participate in the survey. Nonresponse bias is the bias that results when respondents
differ in meaningful ways from nonrespondents. This is a common problem with mail
surveys. Response rate is often low, making mail surveys vulnerable to nonresponse bias.
3. Voluntary response bias. Voluntary response bias occurs when sample members are self-
selected volunteers. An example would be call-in radio shows that solicit audience
participation in surveys on controversial topics (cruelty against animals, eating of dogs,
etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

When your survey records a different result, consider the following questions:
Did you perform the measurements correctly?
Did you sample from the right animals?

Usually, bias cannot be quantitatively measured or calculated.

Sampling error

Sampling error is the difference between survey result and population value due to the random
selection of animals or farms to include in the sample. Sampling error is the error that occurs just
because of chance (some call it "bad luck"). No sample is a perfect mirror image of the
population.

Unlike bias, sampling error can be predicted, calculated, and accounted for. There are several
measures of sampling error:
Confidence intervals
Standard error
Coefficient of variation
P values
Others

Confidence Interval

A confidence interval gives an estimated range of values which is likely to include an unknown
population parameter, the estimated range being calculated from a given set of sample data.
The width of the confidence interval gives us some idea about how uncertain we are about the
unknown parameter (see precision). A very wide interval may indicate that more data should be
collected before anything very definite can be said about the parameter.

Confidence Limits

Confidence limits are the lower and upper boundaries / values of a confidence interval, that is,
the values which define the range of a confidence interval.

Confidence Level

The confidence level is the probability value (1 - α) associated with a confidence interval. It is
often expressed as a percentage. For example, if α = 0.05 = 5%, then the confidence level is
equal to (1 - 0.05) = 0.95, i.e. a 95% confidence level.

Informally, the confidence level is the likelihood, expressed as a percentage, that the interval
produced by a survey captures the true population value, so that the result is repeatable and
not just a product of chance. The idea is based on the concept of the "normal distribution
curve," which shows that variation in many kinds of data (such as the heights of all Landrace
breeding boars, or the amount of rainfall in June) tends to be clustered around an average value,
with relatively few individual measurements at the extremes.

In surveys, the most common measurement of sampling error is the 95% confidence interval.

What does "95% confidence interval" mean?


Excerpts from Source: [Link]

If you repeat the same survey many times and measure the same indicator with the same
methodology and same sample size, 95% of the results of these surveys will have confidence
intervals which overlap the true value for this indicator in the population.
The drawing below is another way of visualizing confidence intervals. It imagines that a
single survey is a dart which produces a single estimate of some health outcome, for example,
the prevalence of having a safe water supply. If the sampling error is large because the sample
size of the survey was small, the dart might have a large circle of uncertainty. We may be 95%
sure that the true population value is somewhere in the circle, but if the circle is large, this survey
result may not be very useful. If the sampling error is small because the sample size was large,
the circle of uncertainty may be much smaller, as shown on the right. Now if we are 95% sure that
the true population value is within this small circle, the survey result may be very useful.
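
The "repeat the same survey many times" interpretation can be checked by simulation. The sketch below assumes a true prevalence of 50%, 300 animals per survey, 2,000 repeated surveys, and a normal-approximation interval; all of these are illustrative choices rather than values from the source.

```python
import math
import random

def coverage(true_p, n, surveys, z=1.96, seed=0):
    """Fraction of simulated surveys whose interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(surveys):
        # One simulated survey: each of n animals is positive with probability true_p.
        cases = sum(rng.random() < true_p for _ in range(n))
        p = cases / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        if p - half_width <= true_p <= p + half_width:
            hits += 1
    return hits / surveys

share = coverage(true_p=0.5, n=300, surveys=2000)
print(f"{share:.1%} of simulated intervals contain the true value")
```

The printed share should land close to 95%, though not exactly on it, because the simulation itself is subject to chance and the interval formula is an approximation.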

Factors that Affect Confidence Intervals


There are three factors that determine the size of the confidence interval for a given confidence
level. These are: sample size, percentage and population size.

Sample Size - The larger your sample, the more sure you can be that their answers truly
reflect the population. This indicates that for a given confidence level, the larger your
sample size, the smaller your confidence interval. However, the relationship is not linear
(i.e., doubling the sample size does not halve the confidence interval).
Percentage - Your accuracy also depends on the percentage of your sample that picks a
particular answer. If 99% of your sample said "Yes" and 1% said "No", the chances of
error are small for any given sample size. However, if the percentages are 51% and
49% the chances of error are much greater. It is easier to be sure of extreme answers than
of middle-of-the-road ones. When determining the sample size needed for a given level
of accuracy you must use the worst case percentage (50%). You should also use this
percentage if you want to determine a general level of accuracy for a sample you already
have. To determine the confidence interval for a specific answer your sample has given,
you can use the percentage picking that answer and get a smaller interval.
Population Size - How many animals are there in the group your samples represent? This
may be the number of broiler chickens in a province you are studying, the number of pigs
vaccinated against hog cholera, etc. Often you may not know the exact population size. This
is not a problem. The mathematics of probability shows that the size of the population is
irrelevant, unless the size of the sample exceeds a few percent of the total population you
are examining. This means that a sample of 500 people is equally useful in examining the
opinions of a state of 15,000,000 as it would be for a city of 100,000. For this reason, the
sample calculator ignores the population size when it is "large" or unknown. Population
size is only likely to be a factor when you work with a relatively small and known group
of people.
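
The effect of sample size and percentage on the interval can be illustrated numerically. The sketch below computes the half-width of a 95% interval for a proportion using the normal approximation for a large population; the sample sizes and percentages tried are arbitrary examples.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for a proportion p from n units."""
    return z * math.sqrt(p * (1 - p) / n)

# Sample size: doubling n does not halve the interval, but quadrupling n does.
for n in (100, 200, 400):
    print(n, round(margin_of_error(0.5, n), 3))

# Percentage: extreme answers give narrower intervals than answers near 50%.
for p in (0.99, 0.51, 0.50):
    print(p, round(margin_of_error(p, 100), 3))
```

The half-width shrinks with the square root of the sample size, which is why each further gain in precision costs disproportionately more animals.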

Accuracy and precision


Precision defined
1. In statistics, precision is defined as the inverse of the variance of a measurement or
estimate. -- Last. A Dictionary of Epidemiology. 1988
2. Precision in epidemiologic measurements corresponds to the reduction of random (or
sampling) error. - Rothman. Modern Epidemiology. 1986.
3. If the results are precise, they do not vary if the measures are repeated
4. Precision does not imply accuracy.
5. Precision is estimated by the confidence interval around the measure.

Narrow CI = precise
Wide CI = imprecise

Accuracy defined
The degree to which a measurement, or an estimate based on measurements, represents
the true value of the attribute that is being measured. -- Last. A Dictionary of
Epidemiology. 1988
Accuracy vs. precision
A measurement (or in our case, the estimate from a survey) is precise if it obtains similar
results with repeated measurement (or repeated surveys).
A measurement is accurate if it is close to the truth with repeated measurement (or
repeated surveys).
A faulty measurement may be expressed precisely but may not be accurate.
Measurements should be both accurate and precise, but the two terms are not
synonymous. -- Last's Dictionary of Epidemiology

Precision of the estimate of Prevalence

Expression        Selected value    Applied to an assumed
                                    prevalence of 30%
Absolute error    2%                28-32
Relative error    2%                29.4-30.6
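
The two rows of the table differ only in how the 2% is applied: an absolute error is added to and subtracted from the prevalence directly, while a relative error is first multiplied by the prevalence. A minimal sketch of that arithmetic follows; the function name precision_range is an assumption for the example.

```python
def precision_range(prevalence, error, relative=False):
    """Return the (lower, upper) range implied by an absolute or relative error."""
    half_width = prevalence * error if relative else error
    return prevalence - half_width, prevalence + half_width

absolute = precision_range(0.30, 0.02)                 # 30% plus or minus 2 points
relative = precision_range(0.30, 0.02, relative=True)  # 30% plus or minus 2% of 30%
print([round(100 * value, 1) for value in absolute])
print([round(100 * value, 1) for value in relative])
```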

[Figure: four panels illustrating accurate and precise; accurate but not precise; not accurate but precise; not accurate and not precise.]

Sample Size Estimation

Main factors that determine the sample size


1. Variance- the amount of variation in the population. When the variance is high, more
animals are needed because each of the selected animals is likely to be quite different.
2. Desired precision- This is usually measured by the width of the confidence interval. A
very wide confidence interval indicates we are not certain where the true value lies. On
the other hand, a narrow confidence interval indicates that we are certain the true value
lies within a narrow range.
3. Desired confidence level- By convention, we usually use a 95% confidence level.
4. Proportion (or percentage) of the sample that has (or is expected to develop) the condition
of interest. If this is uncertain, it is recommended that the calculation be done for a value
of 50% (the most conservative estimate).
5. Other factors- required by specific survey types.

Three common sample size calculations:


Estimation of prevalence or incidence of disease
Detection of disease
Detection of a difference in prevalence or incidence of disease between groups

More complicated studies (e.g. those involving multiple regression models, survival analysis, or
longitudinal data with repeated measurements) may require specialized software and additional
biostatistical input to calculate sample size.

To estimate prevalence of disease with
a sample from a large population
(theoretically infinite)

Formula:

$$n = \frac{Z^2 \, p(1 - p)}{e^2}$$

This formula assumes random sampling and that the sample size is small relative to the
population size (practically, this is true when the sample size is less than about 10% of
the population size).

where,
Z is the critical value obtained from a standard normal distribution. For each level of
confidence there is a corresponding value of Z. See table below:

Confidence level    Z value
90 %                1.645
95 %                1.96
99 %                2.575

e is the margin of error (e.g., 0.1 = 10%, and 0.05 = 5%); same as desired accuracy or
absolute precision

p is the estimated value for the proportion of a sample that have the condition of interest
(e.g., .50 for 50%). Theoretically this is based on the assumption that the test that
estimates this proportion is perfectly sensitive and specific, but the calculation can
also assume the proportion estimated is the apparent (test-based) prevalence

Level of confidence: 0.95

Expected        Desired accuracy* (absolute)
prevalence      0.1     0.05    0.01    0.001
0.1             35      138     3457    345731
0.2             61      246     6146    614633
0.3             81      323     8067    806706
0.4             92      369     9220    921950
0.5             96      384     9604    960365
0.6             92      369     9220    921950
0.7             81      323     8067    806706
0.8             61      246     6146    614633
0.9             35      138     3457    345731
Derived from ELECTRONIC FIELD SURVEY TABLES ver. 1, by Cannon and Roe
*Desired absolute precision

The expected prevalence can be based on previous investigations. If there are none, a pilot study
may be undertaken.

Example:
Calculate the sample size needed to study a disease with an expected prevalence of 20%. Assume
a level of confidence of 95% and a desired absolute precision of 5%.

$$n = \frac{1.96^2 \times 0.20 \times (1 - 0.20)}{0.05^2} = \frac{3.8416 \times 0.16}{0.0025} = 245.86 \approx 246$$
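
The same calculation can be scripted so that other prevalences and precisions are easy to try. This is a minimal sketch of the large-population formula, with Z = 1.96 for 95% confidence; the function name sample_size_prevalence is an assumption. Rounding up is used here as the cautious choice, so a result may occasionally be one unit higher than a published table that rounds to the nearest whole number.

```python
import math

def sample_size_prevalence(p, e, z=1.96):
    """Unrounded sample size n = Z^2 * p * (1 - p) / e^2 for a large population."""
    return z ** 2 * p * (1 - p) / e ** 2

# Worked example: expected prevalence 20%, absolute precision 5%, 95% confidence.
n = sample_size_prevalence(0.20, 0.05)
print(round(n, 2), math.ceil(n))

# With no prior information, p = 0.50 gives the largest sample size.
print(math.ceil(sample_size_prevalence(0.50, 0.05)))
```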

Notes:

p                      The assumed prevalence of the event in the population under study
                       (usually based on previous studies, field data or the literature). When no
                       information is available a value of 0.50 will yield the maximum sample
                       size.
                       Acceptable values: between 0 and 1.
e                      A measure of the desired precision. For example, if you assume a
                       prevalence of 0.40 and a relative error of 0.10, the result will have a
                       precision of 0.04 (that is, 0.40 x 0.10). In this case 0.04 is the absolute
                       error. In general, the relative error should be 0.20.
                       Acceptable values: between 0 and 1.
Level of confidence    The confidence that the user wants to have in the results.
                       Acceptable values: 90%, 95% or 99%.
Population size        The number of individuals in the population under study.
                       Acceptable values: any positive whole number.

To estimate prevalence of disease with a sample from a small (finite) population

When population sizes are less than 10 times the estimated sample size, it is prudent to use the
Finite Population Correction (FPC) to calculate a corrected sample size.

Formula:

$$\frac{1}{n} = \frac{1}{n_\infty} + \frac{1}{N}$$

Where n = required sample size
$n_\infty$ = sample size based on a large (infinite) population (see previous section)
N = size of the study population
Example:

Assume a small ruminant farm with 800 sheep and an expected prevalence of 20% for caseous
lymphadenitis. How many samples are needed to give an estimate of prevalence within 10% of
the true value with 95% confidence?

Calculated sample size for a large population = 61

Level of confidence 0.95

Expected        Desired accuracy
prevalence      0.1     0.05    0.01    0.001
0.2             61      246     6146    614633

1/n = 1/61 + 1/800 = 1/56.68, so n = 56.68, which rounds up to 57

Finite Population Correction (population size: 800)

Desired accuracy         0.1     0.05    0.01    0.001
Corrected sample size    57      188     708     799

Answer: a sample (n) of 57 animals is required, which is fewer than the 61 required
from a large population to give an estimate with the same precision.
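
The correction used in this example (1/n = 1/61 + 1/800) can be sketched in Python as follows. The function name `finite_population_correction` and the parameter name `n_inf` (the sample size for a large population) are assumptions made for illustration.

```python
def finite_population_correction(n_inf, N):
    """Corrected sample size from 1/n = 1/n_inf + 1/N.

    n_inf: sample size calculated for a large (infinite) population
    N:     size of the study population
    """
    return 1 / (1 / n_inf + 1 / N)

n = finite_population_correction(n_inf=61, N=800)
print(round(n))  # 57
```

Applying the same function to the other large-population sample sizes for a prevalence of 0.2 (246, 6146 and 614633) gives 188, 708 and 799 after rounding, in line with the corrected values listed for a population of 800.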

To Detect the presence of a disease
(source: Hawkins, C. [Link] Field Survey Tables)

Percentage of diseased animals in the population (d/N), OR percentage sampled and found clean (n/N)

Population size (N) 50% 40% 30% 25% 20% 10% 5% 2% 1% 0.5% 0.1%
10 4 5 6 7 7 10 10 10 10 10 10
20 4 5 7 8 10 15 19 20 20 20 20
30 5 6 8 9 11 19 26 30 30 30 30
40 5 6 8 10 12 21 31 39 40 40 40
50 5 6 8 10 12 22 35 48 50 50 50
60 5 6 8 10 12 23 37 55 60 60 60
70 5 6 8 10 13 24 40 62 69 70 70
80 5 6 8 10 13 24 42 68 78 80 80
90 5 6 9 10 13 25 43 73 87 90 90
100 5 6 9 10 13 25 44 77 95 100 100
120 5 6 9 10 13 26 46 85 110 119 120
140 5 6 9 11 13 26 48 92 123 138 140
160 5 6 9 11 13 26 49 97 135 156 160
180 5 6 9 11 13 27 50 101 146 174 180
200 5 6 9 11 13 27 51 105 155 190 200
250 5 6 9 11 14 27 52 112 174 227 250
300 5 6 9 11 14 28 53 117 189 259 300
350 5 6 9 11 14 28 54 121 201 287 350
400 5 6 9 11 14 28 55 124 210 310 400
450 5 6 9 11 14 28 55 127 218 331 450
500 5 6 9 11 14 28 56 129 225 349 499
600 5 6 9 11 14 28 56 132 235 379 596
700 5 6 9 11 14 28 56 134 243 402 690
800 5 6 9 11 14 28 57 136 249 421 781
900 5 6 9 11 14 28 57 137 254 437 868
1000 5 6 9 11 14 29 57 138 258 450 950
1200 5 6 9 11 14 29 57 140 264 471 1101
1400 5 6 9 11 14 29 58 141 269 487 1235
1600 5 6 9 11 14 29 58 142 272 499 1354
1800 5 6 9 11 14 29 58 143 275 509 1459
2000 5 6 9 11 14 29 58 143 277 517 1553
3000 5 6 9 11 14 29 58 145 284 542 1894
4000 5 6 9 11 14 29 58 146 288 556 2108
5000 5 6 9 11 14 29 59 147 290 564 2253
6000 5 6 9 11 14 29 59 147 291 569 2358
7000 5 6 9 11 14 29 59 147 292 573 2436
8000 5 6 9 11 14 29 59 147 293 576 2498
9000 5 6 9 11 14 29 59 148 294 579 2547
10000 5 6 9 11 14 29 59 148 294 581 2588
"Infinite" 5 6 9 11 14 29 59 149 299 598 2994

Notes:
1. The previous table gives the sample size (n) required to be 95% certain of including at
least one positive if the disease is present at the specified level.
Example: if the expected percentage of positives is 20% and the population size is
878 (use 900), the required sample size to be 95% certain of detecting at least one
positive is 14.
2. The table can also be used to determine the upper limit to the number (d) of diseased
animals in a population given that the specified proportion were tested and found to be
negative.
Example: if a 10% sample (200 animals) taken from a population of 2000 was found to be
entirely negative, the upper 95% confidence limit for the number of positives is 29.
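
Both uses of the table can be approximated with a short Python sketch. This is not necessarily how the published table was computed: the sketch assumes sampling without replacement and a perfect test, and uses the exact hypergeometric probability of drawing no positives, so individual cells may differ slightly from the table. All function names are hypothetical.

```python
from math import comb

def prob_all_negative(N, d, n):
    """Probability that a sample of n from N animals misses all d positives."""
    # Ways to pick n animals from the N - d negatives, over all ways to pick n.
    return comb(N - d, n) / comb(N, n)

def detection_sample_size(N, d, confidence=0.95):
    """Smallest n that includes at least one positive with the given confidence."""
    for n in range(1, N + 1):
        if prob_all_negative(N, d, n) <= 1 - confidence:
            return n
    return N

def max_positives_if_all_negative(N, n, confidence=0.95):
    """Upper limit on the number of positives when n sampled animals are all negative."""
    for d in range(1, N - n + 1):
        if prob_all_negative(N, d, n) <= 1 - confidence:
            return d
    return N - n

# Note 1: population of 900 with 20% positives (d = 180)
print(detection_sample_size(N=900, d=180))          # 14
# Note 2: 10% of a population of 2000 (n = 200) sampled, all negative
print(max_positives_if_all_negative(N=2000, n=200))  # 29
```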

To estimate the mean of a continuous variable

Reference: Epidemiology Course notes, 2001. School of Biomedical Sciences, Department of
Microbiology and Immunology.

Applications for body height, body weight, blood pressure, etc

Formula

$$n = \frac{(Z_{0.95})^2 \times SD^2}{e^2}$$

Where SD = standard deviation of the variable
n = required sample size
e = acceptable error (i.e., the precision of measurement)
$(Z_{0.95})^2 = (1.96)^2$ or 3.84

Example

Determine the number of piglets required for an experiment to measure the mean increase in
weight over a 30-day feeding trial using a new diet. Assume the following conditions: the
standard deviation of the group is 100 g, the acceptable error is 50 g, and the level of
confidence is 95%.

$$n = \frac{3.84 \times (100)^2}{(50)^2} = \frac{38{,}400}{2{,}500} = 15.36$$

Rounded up, n = 16.
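
The same calculation can be sketched in Python. The function name `sample_size_mean` is an assumption made for illustration. It uses z = 1.96 rather than the rounded value 3.84 for z squared, so the unrounded result differs slightly in the third decimal place, but the required sample size is unchanged.

```python
import math

def sample_size_mean(sd, e, z=1.96):
    """Unrounded sample size to estimate a mean within +/- e.

    sd: standard deviation of the variable
    e:  acceptable error, in the same units as sd
    z:  z value for the chosen confidence level (1.96 for 95%)
    """
    return (z * sd / e) ** 2

n = sample_size_mean(sd=100, e=50)
print(math.ceil(n))  # 16
```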
