Southeast-Asian J. of Sciences Vol. 3, No. 2 (2014) pp. 141-152
SAMPLE SIZE DETERMINATION FOR
NON-FINITE POPULATION
Inpong Luanglath
The International College (BUIC) &
Family Enterprise Research Center (FERC)
Bangkok University
e-mail: Lecturepedia@gmail.com
Abstract
This paper reviews the conventional approach to sample size calculation for finite and non-finite populations. The necessity of a test sample as a requisite to minimum sample size determination is explained. Standard error, margin of error and sampling error are differentiated. This paper presents two new methods of determining minimum sample size: (i) the n-hat method and (ii) the multistage non-finite population (MNP) method. Under both methods, the minimum sample size is n ≈ 30. The range under the MNP method is 30-40 counts. The claim by Weisberg and Bowen that the minimum sample size for a 0.05 error level is n = 400 is wrong. The standard error equation has been wrongly applied as the sampling error, and it is erroneously used as a tool for minimum sample size determination.
Key words: Multi-stage non-finite population method, n-hat, sample size, standard error.
Electronic copy available at: https://ssrn.com/abstract=2710455
1 Introduction to Standard Error
The standard error is the standard deviation of the distribution of a statistic (Everitt, 2003). The standard error is given by:
$$\mathrm{SE}_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \tag{1}$$
where σ is the estimated standard deviation and n is the sample size. This "sample size" is not the same as the minimum sample size needed to test the condition x̄ = μ in the sample population inferential argument. The value for σ may be determined through the Z-equation:
$$Z = \frac{\bar{x} - \mu}{S/\sqrt{n}} \tag{2}$$
where x̄ = sample mean; μ = assumed population mean; S = sample standard deviation; and n = sample size. Substituting σ for S and solving for σ gives:
$$\sigma = \frac{\bar{x} - \mu}{Z}\sqrt{n} \tag{3}$$
The population is assumed to be normally distributed and is generally described as $N(0, \sigma^2)$: normally distributed with mean 0 and variance $\sigma^2$. This assumption is implicit; the standard error equation makes no explicit reference to such a distribution (Press, Flannery, Teukolsky and Vetterling, 1992, p. 465). For the standard normal distribution the variance is $\sigma^2 = 1$, so the condition $N(0, \sigma^2)$ may be written as $N(0, 1)$. Therefore, the standard error in (1) may be written as:
$$\mathrm{SE} = \frac{1}{\sqrt{n}} \tag{4}$$
The condition $N(0, 1)$ produces the probability density function:
$$P(x)\,dx = \frac{1}{\sqrt{2\pi}} \exp\left(-z^2/2\right) dz \tag{5}$$
It assumes that Z = (x̄ − μ)/σ, and therefore dz = dx/σ. Thus, $\mathrm{SE} = \sigma/\sqrt{n}$ becomes $\mathrm{SE} = 1/\sqrt{n}$ in statement (4). Statement (4) has been misinterpreted and misapplied to mean "sampling error" for the purpose of calculating minimum sample size by many researchers (Weisberg, 1971, p. 41). This misapplication starts with the assumption that the standard error is 0.05 because the error level in the normal distribution curve is set at $\alpha = \tfrac{1}{2}\alpha + \tfrac{1}{2}\alpha = 0.025 + 0.025 = 0.05$. This leap of logic represents a misunderstanding of the standard error equation and a misuse of the normal distribution curve (5) and of the normal distribution function φ(z), which gives the probability that a standard normal variate assumes a value in the interval [0, z]:
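$$\varphi(z) = \frac{1}{\sqrt{2\pi}} \int_0^z \exp\left(-x^2/2\right) dx \tag{6}$$
which may be reduced to:
$$\varphi(z) = \frac{1}{2}\,\operatorname{erf}\left(\frac{z}{\sqrt{2}}\right) \tag{7}$$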
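where erf is the error function (Abramowitz, 1972; Spanier and Oldham, 1987). The error function for z is given by:
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt \tag{8}$$
The leading factor $2/\sqrt{\pi}$ is sometimes omitted (Whittaker and Watson, 1990, p. 341). The error function satisfies the identity:
$$\operatorname{erf}(z) = 1 - \operatorname{erfc}(z) \tag{9}$$
and may be expressed as:
$$\operatorname{erf}(z) = \frac{2z}{\sqrt{\pi}}\; {}_1F_1\left(\tfrac{1}{2}; \tfrac{3}{2}; -z^2\right) \tag{10}$$
where erfc is the complementary error function and ${}_1F_1\left(\tfrac{1}{2}; \tfrac{3}{2}; -z^2\right)$ is a confluent hypergeometric function of the first kind. For general parameters a and b and argument x (here a = 1/2, b = 3/2 and x = −z²):
$${}_1F_1(a; b; x) = 1 + \frac{a}{b}\,x + \frac{a(a+1)}{b(b+1)}\,\frac{x^2}{2!} + \cdots \tag{10a}$$
which may be written compactly as:
$${}_1F_1(a; b; x) = \sum_{k=0}^{\infty} \frac{(a)_k}{(b)_k}\,\frac{x^k}{k!} \tag{10b}$$
where $(a)_k$ and $(b)_k$ are Pochhammer symbols (Humbert, 1920, pp. 490-492).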
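As a numerical check on equations (9), (10) and (10b), the following Python sketch sums the hypergeometric series directly and compares the result with the standard library's `math.erf` and `math.erfc`. The function names `hyp1f1` and `erf_series`, the number of terms, and the choice to start the sum at k = 0 (so that the leading 1 of (10a) is included) are assumptions made for illustration only.

```python
import math


def hyp1f1(a, b, x, terms=60):
    """Partial sum of 1F1(a; b; x) = sum over k >= 0 of (a)_k/(b)_k * x**k / k!."""
    total = 0.0
    term = 1.0  # the k = 0 term
    for k in range(terms):
        total += term
        # Ratio of term k+1 to term k: (a + k)/(b + k) * x/(k + 1)
        term *= (a + k) / (b + k) * x / (k + 1)
    return total


def erf_series(z):
    """erf(z) evaluated through the hypergeometric form of equation (10)."""
    return 2.0 * z / math.sqrt(math.pi) * hyp1f1(0.5, 1.5, -z * z)


# Compare the series with the library erf and with 1 - erfc (equation (9)).
for z in (0.5, 1.0, 1.5):
    print(z, erf_series(z), math.erf(z), 1.0 - math.erfc(z))
```

For moderate z the three columns should agree to many decimal places; a truncated series like this one is not intended for large |z|.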
Generally, for z ≥ 0 the error function takes values between 0 and 1: erf(0) = 0 and erf(∞) = 1. For the normal distribution curve, the error is fixed at 0.05 as a standard error because it is the area left outside an interval of about two standard deviations on either side of the mean, which covers 0.95 of the area under the curve produced by the Gaussian function (equation (5)). The use of 0.05 from the Gaussian curve's precision level (random error level) as the standard error in $\mathrm{SE} = 1/\sqrt{n}$ for the purpose of determining minimum sample size is erroneous. Defining SE as the minimum sample error is also incorrect. SE is the standard error, not a sampling error. The misapplication takes the form: fix the value of SE at 0.05 and solve for n. The following calculation illustrates this point: $0.05 = 1/\sqrt{n}$; $0.05\sqrt{n} = 1$; $\sqrt{n} = 20$; therefore n = 400.
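The arithmetic of the criticised shortcut can be reproduced in a few lines of Python. This sketch only inverts SE = 1/√n for a chosen SE; the function name `n_from_se` is hypothetical, and the code illustrates the calculation that the text rejects rather than a recommended procedure.

```python
def n_from_se(se):
    """Invert SE = 1/sqrt(n) for n, as done in the criticised shortcut."""
    return (1.0 / se) ** 2


# Fixing SE at common "error levels" and solving for n.
for se in (0.10, 0.05, 0.01):
    print(se, round(n_from_se(se)))
```

With SE fixed at 0.05 the result rounds to 400, which is the figure disputed in the text.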
Under this logic, the term n is taken to mean "minimum sample size." The fallacy would lead to a required minimum sample size of 400 when the error level is 0.05. The value of 0.05 comes from the 5% random error in the normal distribution curve under the 0.95 confidence interval. The second error in this logic comes from the assumption that the data are normally distributed, without testing whether they are indeed normally distributed. Equation (4) and its principal component, $\sqrt{n}$, are not a distribution. The application of 0.05 as the standard value for sampling error in (4) is a misuse of the equation. The logic of using 0.05 as the "error level" is taken directly from the precision level of the probability density function (PDF) for a normal distribution. However, equation (4) does not determine the confidence level or the error level, i.e. the random error of 0.05 explained by the t-equation and Z-equation from the probability distribution of the normal curve. The erroneous n = 400 came from Weisberg and Bowen, who took the standard error equation to mean sampling error.
2 Alternative Sample Size Determination n-hat
Sample size determination may be categorized into two scenarios: finite population and non-finite population. Where the population is finite, the common sample size determination method is the population proportion method. It has two requisites: (i) the population total must be known, and (ii) the distribution must be normal. This paper proposes alternative sample size determination methods. The first method, called n-hat, appeared in the proceedings of the Silapakorn University 70th Anniversary International Conference 2013 (Luanglath & Rewtrakunphaiboon, 2013, pp. 127-139). Below are the steps for the n-hat calculation introduced in those proceedings.
Step-1, the population is estimated from an initial sample randomly selected from a population. The test statistic is used to estimate the population mean. The sample test statistic is given by: $t = (\bar{x} - \mu)/(S_x/\sqrt{n})$. Solving for the population mean μ gives: $\mu = \bar{x} - t\,(S_x/\sqrt{n})$.
Step-2, use the unit normal distribution Z-equation to solve for the estimated population standard deviation. The Z-equation is given by: $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$. Solving for the population standard deviation σ gives: $\sigma = [(\bar{x} - \mu)/Z]\sqrt{n}$.
In the foregoing two steps, n is the initial sample size and the standard
confidence interval of 0.95 is used.
Step-3, with a given initial sample n, compute the expected alpha (Ê = α̂) for the sample by using the expected error equation. The expected error equation is given by:
$$\hat{E} = \frac{n - n[1 - df(\alpha)]}{n} \quad \text{or simply} \quad \hat{E} = df(\alpha) \tag{11}$$
where df = n − 1, α is the specified error level, i.e. 0.05, and df(α) denotes the product of df and α.
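Steps 1 to 3 can be sketched in Python as follows. The sketch assumes an initial sample of eight ones and two zeros (n = 10, x̄ = 0.80), uses t = 1.64 and Z = 1.65 as in the text, and solves the t-equation algebraically as μ = x̄ − t·S/√n. The function name `nhat_steps_1_to_3` is hypothetical, and small rounding differences from the figures quoted in the text are to be expected.

```python
import math
import statistics


def nhat_steps_1_to_3(sample, t=1.64, z=1.65, alpha=0.05):
    """Return (x_bar, s, mu, sigma, e_hat) for the first three n-hat steps."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                # sample standard deviation S_x
    mu = x_bar - t * s / math.sqrt(n)           # Step-1: t-equation solved for mu
    sigma = (x_bar - mu) / z * math.sqrt(n)     # Step-2: Z-equation solved for sigma
    e_hat = (n - 1) * alpha                     # Step-3: E-hat = df * alpha
    return x_bar, s, mu, sigma, e_hat


# Assumed initial sample: eight events in ten observations.
sample = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
x_bar, s, mu, sigma, e_hat = nhat_steps_1_to_3(sample)
print(x_bar, round(s, 4), round(mu, 4), round(sigma, 4), round(e_hat, 2))
```

Because Step-2 reuses the μ from Step-1, the estimated σ reduces algebraically to t·S/Z in this sketch.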
Step-4, estimate the raw sample size by using the following equation:
σ 2 n −Ê
ñ = (12)
S2
The value for ñ is called a raw estimate because it is calculated on the basis of the estimated population variance, the actual sample variance, and the estimated error. Since the error may run from 0.01 to 0.99, the value of ñ does not provide an accurate estimate for the minimum sample size. It is necessary to engage in an experiment by allowing the errors to move towards the point estimate (population mean) from the maximum error and from the minimum error. The term $\tilde{n}_1^{0.01}$ is the error migration in the experiment where the error is allowed to move from 0.01 towards the estimated population mean (μ), and $\tilde{n}_1^{0.99}$ is the maximum estimated error allowed to move from the upper random error region towards the point estimate μ. This experiment produces the following specified error ratio:
$$n^{*} = \frac{\tilde{n}_1^{0.99}}{\tilde{n}_1^{0.01}} \tag{13}$$
where $\tilde{n}_1^{0.99} = \tilde{n}_1/0.99$ and $\tilde{n}_1^{0.01} = \tilde{n}_1/0.01$.
As part of the experiment, the specified error also has a specified error range calculated by:
$$n_r = \left|\tilde{n}_1^{0.99} - \tilde{n}_1^{0.01}\right| \tag{14}$$
From this range, the specified error median is determined by:
$$n_r^{M} = n_r/2 \tag{15}$$
The minimum sample size for an unknown population N is given by taking the square root of the specified error median, thus:
$$\hat{n} = \sqrt{n_r^{M}} \tag{16}$$
A numerical illustration of the calculation by the steps outlined above is in order. Assume that the following initial sample data is given as a series of independent events: {1, 1, 1, 1, 1, 1, 1, 1, 0, 0}. There are eight events in n = 10 observations; the initial sample mean is x̄ = 0.80 and the standard deviation is $S_x = 0.4216$. The estimated population mean follows: $\mu = \bar{x} - t(S_x/\sqrt{n}) = 0.80 - 1.64(0.4216/3.16)$. Finally, μ = 0.59.
Use the estimated error equation to calculate the estimated error for the initial sample: Ê = df(α) = 9(0.05) = 0.45.
Next, with the raw estimate $\tilde{n}_1 \approx 20.99$, find the specified error values at the 0.99 and 0.01 points: $\tilde{n}_1^{0.99} = \tilde{n}_1/0.99 = 20.99/0.99 = 21.20$, and for the 0.01 error: $\tilde{n}_1^{0.01} = \tilde{n}_1/0.01 = 20.99/0.01 = 2098.77$.
The range of the minimum sample size estimate may be demonstrated by a line series running from 21.20 to 2098.77.
The range of the specified error of the estimated sample size is: $n_r = |\tilde{n}_1^{0.99} - \tilde{n}_1^{0.01}| = |21.20 - 2098.77| = 2077.57$.
The specified error maximum and minimum ratio is given by: $n^{*} = \tilde{n}_1^{0.99}/\tilde{n}_1^{0.01} = 21.20/2098.77 = 0.01$.
The median for the range is given by: $n_r^{M} = n_r/2 = 2077.57/2 = 1038.78$; the minimum sample size is given by $\hat{n} = \sqrt{n_r^{M}}$. Thus $\hat{n} = \sqrt{1038.78} = 32.23 \approx 32$.
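The final stage of the n-hat calculation, equations (13) to (16), can be sketched in Python starting from a given raw estimate. The value 20.99 is taken from the numerical illustration; the function name `n_hat_from_raw` and its parameter names are hypothetical, and the absolute value is used for the range as in the illustration.

```python
import math


def n_hat_from_raw(n_raw, low=0.01, high=0.99):
    """Return (ratio, range, median, n_hat) following equations (13)-(16)."""
    n_high = n_raw / high               # raw estimate at the 0.99 point
    n_low = n_raw / low                 # raw estimate at the 0.01 point
    ratio = n_high / n_low              # equation (13)
    n_range = abs(n_high - n_low)       # equation (14)
    median = n_range / 2.0              # equation (15)
    n_hat = math.sqrt(median)           # equation (16)
    return ratio, n_range, median, n_hat


ratio, n_range, median, n_hat = n_hat_from_raw(20.99)
print(round(ratio, 4), round(n_range, 2), round(median, 2), round(n_hat, 2))
```

Starting from the rounded 20.99, the intermediate values differ slightly in the second decimal from those quoted in the text, while n-hat still comes out near 32.23.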
If this minimum sample is correct, it must meet the confidence interval test for the population estimate in the Z-equation $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, where n is replaced by n̂. Recall that $Z_{0.95} = 1.65$. If the calculation n̂ = 32.23 is correct, then substituting n̂ for n in the Z-equation must satisfy the following hypothesis statements: $H_0: Z_{\hat{n}} < Z_{0.95}$; $H_A: Z_{\hat{n}} > Z_{0.95}$.
The test calculation follows: $Z = (\bar{x} - \mu)/(\sigma/\sqrt{\hat{n}}) = (0.80 - 0.59)/(\sigma/\sqrt{32.23}) = 0.21/0.07 = 2.91$. The explanatory power of Z(2.91) = 0.9981 or 99.81%. This means that the null hypothesis may be rejected because the decision rule is "accept the null hypothesis if $Z_{obs} < Z_{0.95}$; otherwise reject." In this case, the calculation shows that $Z_{obs} > Z_{0.95}$. The minimum sample size of n̂ = 32.23 is 99.81% accurate, or the probability of error is 0.19% or 0.0019 (Agresti & Franklin, 2012, Sect. 7.2, p. 321).
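The probability attached to an observed Z value can be checked with the standard normal cumulative distribution function, which the Python standard library exposes through `math.erf`. The helper name `std_normal_cdf` is hypothetical; the sketch only evaluates the probabilities for the Z values mentioned in the text and does not re-derive Z itself.

```python
import math


def std_normal_cdf(z):
    """Cumulative probability of a standard normal variate up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_obs = std_normal_cdf(2.91)    # observed Z from the test calculation
p_crit = std_normal_cdf(1.65)   # critical value Z_0.95 used in the text
print(round(p_obs, 4), round(1.0 - p_obs, 4), round(p_crit, 4))
```

The cumulative probability at Z = 2.91 comes out close to the 0.998 level quoted above, and the value at 1.65 is close to 0.95.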
3 Multistage Nonfinite Population Method (MNP) for Minimum Sample Size Determination
In 2013, the author introduced the n-hat method. In this paper, the author proposes another alternative for minimum sample size determination called n-omega, or the Multi-stage Nonfinite Population (MNP) method. This new method is based on the specified alpha level. Using the random error level α as the basis to calculate the sample size, the n-omega method allows the researcher to determine the minimum sample size at various confidence levels. A table of minimum sample sizes is provided for various confidence levels. n-omega is an improvement over Weisberg-Bowen's heuristic approach based on $\mathrm{SE} = 1/\sqrt{n}$, which is incorrect and not efficient. Under Weisberg-Bowen's approach, the minimum sample size for a 5% error tolerance is 400 counts, whereas under MNP (n-omega) it is 33.72 or approximately 34. This number is consistent with other writers advocating a minimum sample size of 30. Roscoe, for instance, suggested that a minimum sample size should be 30 (Roscoe, 1975, p. 163). Roscoe's rationale is that a sample of 30 ensures the benefits of the Central Limit Theorem. This minimum sample size of 30 was also echoed by Agresti and Franklin (Agresti & Franklin, 2012, p. 312). However, prior writers did not provide a precise method to arrive at the magic number 30. This paper proffers the following steps in the MNP method.
Firstly, the estimated sample size called $n_1$ is obtained through the root of the conventional Specified Precision Estimation (SPE) method thus:
$$n_1 = \frac{Z\sigma}{E} \tag{17}$$
where $Z_{0.95} = 1.65$; $\sigma_{N(0,1)} = 1$ and E = 0.05 for the 0.95 confidence interval. For other confidence intervals, the value of each parameter may be changed accordingly. For the 0.95 confidence interval, the calculation follows: $n_1 = Z\sigma/E = 1.65(1)/0.05 = 1.65/0.05 = 33$.
Secondly, obtain the second estimate of the minimum sample size ($n_2$) by the conventional SPE method according to the following formula:
$$n_2 = \frac{Z^2\sigma^2}{E^2} \tag{18}$$
Following the above assumption for 0.95, the calculation follows: $n_2 = Z^2\sigma^2/E^2 = (1.65)^2(1)^2/(0.05)^2 = 2.7225(1)/0.0025 = 1089$.
Thirdly, after knowing the raw range between 33 and 1089 of possible sample sizes, the square root of the median of the range between $n_1$ and $n_2$ is calculated as $n_3$:
$$n_3 = \sqrt{\frac{n_2 - n_1}{2}} \tag{19}$$
The calculation for the 0.95 confidence interval example follows: $n_3 = \sqrt{(1089 - 33)/2} = \sqrt{1056/2} = \sqrt{528} = 22.98$.
Fourthly, the raw estimate of 22.98 is put into a percentage range between 1% and 99% in order to map the value of $n_3$ into a distribution space ω, thus:
$$\omega_{\max} = n_3/0.01, \qquad \omega_{\min} = n_3/0.99, \qquad \omega = \omega_{\max} - \omega_{\min}$$
The calculation for the distribution range follows: $\omega_{\max} = 22.98/0.01 = 2297.83$ and $\omega_{\min} = 22.98/0.99 = 23.21$. The actual range is ω = 2297.83 − 23.21 = 2274.61.
Lastly, the minimum sample size is the square root of the median of the range, thus:
$$n_\omega = \sqrt{\frac{\omega}{2}} \tag{20}$$
The calculation in the example continues: $n_\omega = \sqrt{\omega/2} = \sqrt{2274.61/2} = \sqrt{1137.31} = 33.72$. The minimum sample size for the 0.95 confidence interval with the 0.05 error level is 33.72 or approximately 34 counts. A table of minimum sample sizes for various error levels and confidence intervals is given in Table 2.
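The five MNP steps, equations (17) to (20), chain together into a short calculation. The Python sketch below follows them with the parameter values used in the text (Z = 1.65, σ = 1, E = 0.05); the function name `n_omega` is hypothetical, and taking the absolute difference between the two SPE estimates is an assumption made so that the square root stays defined for other parameter choices.

```python
import math


def n_omega(z=1.65, sigma=1.0, e=0.05):
    """Minimum sample size by the MNP (n-omega) steps, equations (17)-(20)."""
    n1 = z * sigma / e                       # equation (17)
    n2 = (z * sigma / e) ** 2                # equation (18)
    n3 = math.sqrt(abs(n2 - n1) / 2.0)       # equation (19)
    omega_max = n3 / 0.01                    # upper end of the omega range
    omega_min = n3 / 0.99                    # lower end of the omega range
    omega = omega_max - omega_min
    return math.sqrt(omega / 2.0)            # equation (20)


print(round(n_omega(), 2))
```

Other confidence levels can be explored by passing a different Z and E, for example `n_omega(z=1.28, e=0.10)`; the resulting values are not checked against the paper's table here.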
Note that $n_\omega$ (n-omega) adopts a different approach to minimum sample size calculation. Where n-hat requires the undertaking of a test sample, n-omega does not require an empirical test sample. The efficiency of n-omega lies in its reliance on the specified error level and the distribution of the sample range in ω-space, which makes the link between the minimum sample and the normal distribution. Similar to Yamane's approach, $n_\omega$ uses the alpha level as the basis. However, unlike Yamane's method, $n_\omega$ does not depend on a known population size. In that respect, $n_\omega$ is more useful than Yamane's population proportion approach. The mean of the minimum sample sizes for confidence intervals from 0.80 to 1.00 is 38.31. If the confidence interval range is run from 50% to 100%, the mean minimum sample is 29.82 or approximately 30. The mathematical rationale for a minimum sample of 30 has been found.
The sample sizes at various confidence intervals from 0.50 to 1.0 are tested for distribution type under the Anderson-Darling test. The result of the Anderson-Darling test confirms that the sample set is not normally distributed: $A^2 = -7.90$ while the null hypothesis value is $A^{*2} = -8.28$. Under this circumstance, in order to select the range of minimum sample size, the probability of data occurrence is used as a guide. In so doing, a range of confidence intervals between 0.80 and 1.0 is recommended. The table below shows the percentage probability of each occurrence according to confidence interval level.
The last four items (items 15, 16, 17 and 18) are used as the range of interest, with corresponding sample sizes of 42.04, 45.63, 47.81 and 61.91 respectively. The probabilities of their occurrences are 0.785, 0.846, 0.877 and 0.981. These probabilities are subjected to the adjacency test to verify randomness.
The purpose of the adjacency test is to verify the random nature of a data set. Assume that there is a data set $x_i$ ($i = 1, 2, \ldots, n$). The test statistic depends on the number of observations n. For n > 25, the test statistic is given by:
$$L_{n>25} = 1 - \frac{\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2}{2\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{21}$$
For n > 25, equation (21) approximately follows a normal distribution with mean zero and variance given by:
$$S_x^2 = \frac{n - 2}{(n - 1)(n + 1)} \tag{22}$$
For n < 25, the test statistic is given by:
$$L_{n<25} = \frac{\sum_{i=1}^{n-1}(x_{i+1} - x_i)^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{23}$$
The null hypothesis may be rejected if the test statistic lies outside the lower and upper bounds of the critical value. The hypothesis statement follows: $H_0: L_{obs} < L_{0.05}$; $H_A: L_{obs} > L_{0.05}$. The critical value of L is given as a range with a lower and an upper value. If the test statistic falls within this range, the null hypothesis cannot be rejected. Recall that the null hypothesis states that the pattern of the data is random. If the test statistic for the observation falls outside the range, i.e. lower than the lower bound or higher than the upper bound, the series is not random.
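The adjacency statistic of equations (21) to (23) can be sketched in Python as below. The function names `adjacency_statistic` and `variance_large_n` are hypothetical; the case n = 25 is not covered by the text and is treated here with the small-sample form as an assumption. Critical values are not computed, since they come from published tables.

```python
def adjacency_statistic(xs):
    """Adjacency test statistic: equation (21) for n > 25, equation (23) otherwise."""
    n = len(xs)
    mean = sum(xs) / n
    # Sum of squared successive differences (numerator of both forms).
    num = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1))
    # Sum of squared deviations from the mean; zero for constant data, which is not handled.
    den = sum((x - mean) ** 2 for x in xs)
    ratio = num / den
    if n > 25:
        return 1.0 - ratio / 2.0
    return ratio  # assumption: n <= 25 uses the small-sample form


def variance_large_n(n):
    """Variance of the large-sample statistic, equation (22)."""
    return (n - 2) / ((n - 1) * (n + 1))


small = adjacency_statistic([1, 2, 3, 4])
large = adjacency_statistic(list(range(30)))
print(small, large, variance_large_n(30))
```

A steadily increasing series such as `[1, 2, 3, 4]` gives a small ratio because successive differences are small relative to the overall spread, which is the pattern the test is designed to flag.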
The result of the adjacency test shows that L(obs) = 17.81 compared to the range of the null hypothesis for random numbers: 0.78 < L(4) < 3.22. The probabilities of the four minimum sample sizes are therefore not random occurrences. The sample sizes for these four probabilities are: 42.04, 45.63, 47.81 and 61.91. The chi-square test for homogeneity follows:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2} \tag{24}$$
The sample size is 4; the mean is 49.35; the sample variance is 75.80. The value for $\sigma^2$ may be estimated through the Z-equation.
The estimated variance is $\sigma^2 = (8.65)^2 = 74.88$. The chi-square test calculation follows: $\chi^2 = (n - 1)S^2/\sigma^2 = (4 - 1)75.80/74.88 = 227.40/74.88 = 3.04$. The hypothesis and decision rule follow: $H_0: \chi^2_4 < 9.50$; $H_A: \chi^2_4 > 9.50$. "Reject the null hypothesis if $\chi^2_4 > 9.50$." According to the calculation, the null hypothesis cannot be rejected. It means that the data set 42.04, 45.63, 47.81 and 61.91 may be fitted to the normal distribution curve. These minimum sample sizes are homogeneous and, therefore, the set of four minimum sample sizes 42.04, 45.63, 47.81 and 61.91 may be used as the basis for estimating the minimum sample size.
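The descriptive statistics and the chi-square value can be reproduced with Python's `statistics` module. The sketch uses the sample sizes listed for items 15 to 18 (42.04, 45.63, 47.81, 61.91) and takes the estimated variance 74.88 and the critical value 9.50 from the text as given; the function name `chi_square_variance` is hypothetical.

```python
import statistics


def chi_square_variance(data, sigma_sq):
    """Chi-square statistic (n - 1) * S^2 / sigma^2 of equation (24)."""
    n = len(data)
    s_sq = statistics.variance(data)   # sample variance with n - 1 in the denominator
    return (n - 1) * s_sq / sigma_sq


sizes = [42.04, 45.63, 47.81, 61.91]
mean = statistics.mean(sizes)
s_sq = statistics.variance(sizes)
chi_sq = chi_square_variance(sizes, sigma_sq=74.88)
print(round(mean, 2), round(s_sq, 2), round(chi_sq, 2), chi_sq < 9.50)
```

The computed statistic stays below the critical value quoted in the text, so the null hypothesis is not rejected under the stated decision rule.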
Recall that the descriptive statistics of these 4 items were: x̄ = 49.35; S = 8.71 and n = 4. The inferential statistics are: μ = 42.21 and σ = 8.65. The range of the estimated mean is 2σ = 2(8.65) = 17.31. Therefore, the upper limit is μ + 2σ = 42.21 + 17.31 = 59.52 and the lower limit is μ − 2σ = 42.21 − 17.31 = 24.90. The difference between the upper and lower limits is Max − Min = 59.52 − 24.90 = 34.61. The estimated minimum sample size is given by:
$$\langle n \rangle = \sqrt{\frac{n^{0.01} - n^{0.99}}{2}} \quad \text{or simply} \quad \langle n \rangle = \sqrt{\frac{\hat{\omega}}{2}} \tag{25}$$
where $n^{0.01} = 34.61/0.01$ and $n^{0.99} = 34.61/0.99$. The calculation follows:
$$\langle n \rangle = \sqrt{\frac{3461.42 - 34.96}{2}} = 41.39.$$
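The last calculation combines the μ ± 2σ limits with the same range-and-root construction used for n-omega. The Python sketch below follows that reading, with μ = 42.21 and σ = 8.65 taken from the text; the function name `estimated_minimum_sample` is hypothetical, and because the inputs are rounded the result differs slightly in the second decimal from the quoted 41.39.

```python
import math


def estimated_minimum_sample(mu, sigma):
    """Return (upper, lower, spread, n_est) from the mu +/- 2*sigma limits."""
    upper = mu + 2.0 * sigma
    lower = mu - 2.0 * sigma
    spread = upper - lower                        # Max - Min
    omega_hat = spread / 0.01 - spread / 0.99     # range in omega-space
    return upper, lower, spread, math.sqrt(omega_hat / 2.0)


upper, lower, spread, n_est = estimated_minimum_sample(42.21, 8.65)
print(round(upper, 2), round(lower, 2), round(spread, 2), round(n_est, 2))
```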
The estimated minimum sample is 41 counts, or approximately 40. The percentage confidence is as high as μ + 2σ = 0.81 + 0.16 = 0.97 using the data set 0.785, 0.846, 0.877 and 0.981 as the basis. Recall that the mean minimum sample size for confidence intervals from 0.50 to 1.0 was 30 counts. Therefore, a range between 30 and 40 counts would be a reasonable sample size with a confidence level of 0.95 to 0.97.
4 Conclusion
This paper has clarified the proper definition and use of the standard error
equation. It is erroneous to use the standard error equation to determine
minimum sample size. The misuse of the standard error equation as a source for
minimum sample size determination is traced back to a 1971 book by Weisberg
& Bowen. This paper points out that error. The paper provides two new sample
size calculation methods. The first method called n-hat method was introduced
in 2013. The second method called Multistage Nonfinite Population: MNP or
n-omega (nω ) method appears for the first time in this conference paper. Both
methods provide efficient means for minimum sample size determination. MNP
is a new contribution to the field of research methodology in social science.
References
[1] Abramowitz, M. and Stegun, I. A. (Eds.) (1972). "Error Function and Fresnel Integrals." Ch. 7 in Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. New York: Dover, pp. 297-309.
[2] Agresti, Alan and Franklin, Christine (2012). Statistics: The Art and Science of Learning from Data, 3rd ed. Pearson Prentice Hall. Sect. 7.2; p. 321.
[3] Everitt, B. S. (2003). The Cambridge Dictionary of Statistics. CUP. ISBN 0-521-81099-X. See entry for "standard error."
[4] Humbert, P. (1920). Sur les fonctions hypercylindriques. C. R. Acad. Sci. Paris 171, 490-492.
[5] Luanglath, P. I. and Rewtrakulpaiboon (2013). "Determination of Minimum Sample Size for Film-Induced Tourism Research." Silapakorn 70th Anniversary International Conference 2013, Towards the Next Decade of Hospitality and Creative Economics: Looking Forward to 2020. December 1st - 3rd, 2013, Bangkok, Thailand, Conference Proceeding, pp. 127-139.
[6] Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1992). Numerical Recipes in FORTRAN: The Art of Scientific Computing, 2nd ed. Cambridge, England: Cambridge University Press, p. 465.
[7] Spanier, J. and Oldham, K. B. (1987). "The Error Function erf(x) and Its Complement erfc(x)." Ch. 40 in An Atlas of Functions. Washington, DC: Hemisphere, pp. 385-393.
[8] Watson, G. N. (1928). Theorems Stated by Ramanujan (IV): Theorems on Approximate Integration and Summation of Series. J. London Math. Soc. 3, 282-289.
[9] Weisberg, Jon A. & Bowen, Bruce D. (1971). Introduction to Data Analysis, p. 41.
[10] Whittaker, E. T. and Robinson, G. (1967). "The Error Function." §92 in The Calculus of Observations: A Treatise on Numerical Mathematics, 4th ed. New York: Dover, pp. 179-182.
[11] Whittaker, E. T. & Watson, G. N. (1990). A Course in Modern Analysis, 4th ed. Cambridge University Press, p. 341.