
EM-104-Module

The document outlines the role of statistics in research, emphasizing its importance in data collection, analysis, and interpretation. It details various statistical methods, types of data, sampling techniques, and approaches to determining sample size, while also discussing measures for summarizing data. The document serves as a comprehensive guide to understanding the fundamentals of statistics and its application in research.

Uploaded by marlon adarme
Copyright © All Rights Reserved

STATISTICS

I. INTRODUCTION
The Role of Statistics in Research. Research is a critical study into the nature of, reasons
for, and consequences of a set of conditions. Research is “re-search,” meaning a voyage of
discovery. “Re” means again and again, and “search” means a voyage of knowledge. It leads to
the enrichment of knowledge.
In research, statistics serves as a tool for data collection, analysis, and interpretation of
results. Statistics is a tool used in all sciences.
The objectives of research are the discovery of new facts and the revision of accepted
theories or laws based on newly discovered evidence.

Statistics – the science that deals with the collection, presentation, analysis, and interpretation
of quantitative data, as well as the theories used as the bases for analyzing such data.
Statistics is the science of making sense of data.

Areas of Statistics: Statistical Theory and Statistical Methods

Statistical Methods: their objectives and scope are:

1. To describe data sets
2. To use sample data to make inferences about a population
DIVISIONS OF STATISTICAL METHODS
1. Descriptive Statistics is the branch of statistics devoted to the summarization and
description of data (sample or population).
2. Inferential Statistics is the branch of statistics concerned with using sample data to make
inferences about a population.

STEPS IN MAKING SENSE OF DATA:


Step one: Gathering data – using surveys and experimentation
Step two: Summarizing data – tabular, graphical, and numerical methods
Step three: Analyzing data – tests of hypotheses on means, variances, and proportions; regression
and correlation analyses; analysis of variance; and nonparametric analyses
Step four: Communicating results

Variable – is a characteristic of interest measurable on each and every individual or object in the
group.

Data refers to the gathered measurements or observations on variable(s) under investigation.

TYPES OF DATA
1. Qualitative data (or categorical or attribute data) can be separated into different
categories that are distinguished by some nonnumeric characteristic.
2. Quantitative data consist of numbers representing counts or measurements. Quantitative
data may be either discrete or continuous.
• Discrete data result from either a finite or a countable number of possible values.
• Continuous data result from infinitely many possible values that can be associated with
points on a continuous scale in such a way that there are no gaps or interruptions.
TYPES OF DATA ACCORDING TO LEVELS OF MEASUREMENT
1. Nominal – when assigned numerical values, the values do not contain quantifiable
information, so that mathematical operations cannot be performed on these values.
Ex: name of school, province, brand of cellphone, sex, course in college, and subject in
school. All these variables can be assigned numbers. For example, for sex, 0 for female
and 1 for male can be used.

2. Ordinal – the values of the variable can be arranged in order of magnitude. Ex: year level
(1st, 2nd, …), military rank (sergeant, lieutenant, colonel, …). The values of these variables
can be assigned numbers 1, 2, …, but the assigned numbers are quantifiable only to the
extent that they can be arranged in order of magnitude.

3. Interval – the values can be regarded as points on a number line. The variable is
quantifiable. The difference between two values of an interval variable provides a
numerical measure of the amount by which the values differ. It incorporates the concept
of equality of intervals, but has an arbitrary zero point. Ex: temperature (°C) and time of
arrival of the airplane (2:30 PM).

4. Ratio – represents the actual amount of a variable. It has an absolute zero or origin.
Ex: amount of money in the bank, number of trips, and length of time in minutes late.
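You can check the difference between the interval and ratio levels numerically. The short Python sketch below uses made-up readings. A ratio of two Celsius temperatures changes when the same temperatures are expressed in Fahrenheit, because the zero point is arbitrary. A ratio of two peso amounts stays the same when expressed in centavos, because the zero is absolute.

```python
# Why ratios are meaningful on a ratio scale but not on an interval scale.
# The temperatures and peso amounts are made-up examples.

def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (both are interval scales)."""
    return c * 9 / 5 + 32

# Interval scale: the ratio of two temperatures depends on the unit chosen,
# because the zero point (0 °C or 0 °F) is arbitrary.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: money has an absolute zero, so the ratio survives a change of
# unit (pesos to centavos).
ratio_pesos = 200 / 100
ratio_centavos = (200 * 100) / (100 * 100)

print(ratio_celsius, ratio_fahrenheit)   # 2.0 vs 1.36
print(ratio_pesos, ratio_centavos)       # 2.0 vs 2.0
```

So 20 °C is not "twice as hot" as 10 °C, but 200 pesos is twice as much as 100 pesos.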

II. SAMPLING TECHNIQUES


What is the major concern of Inferential Statistics?
The major concern of Inferential Statistics is to be able to generalize results to the entire
population of interest based on the information obtained from a sample.
To ensure that the information obtained from the sample is reliable and can be used to
generalize about the entire population, we must make sure that the sample obtained is
representative of the population of interest.

Two concepts need to be well defined and established: the population and the sample. In
statistics, population refers to the set of all observations made on all objects under study for a
given characteristic of interest or variable. The number of observations in a given population is
referred to as the population size and is designated as N.

Population – (physical population) set of all individuals/objects or entities under consideration.


Population – (population data) refers to the set of all possible values of a variable, observed on
every individual or object under study. This is also called a statistical population.

A population may be so large that it may be impossible or impractical for the researcher
to study all its elements. In such a case, the study of a sample from the given population would
be more appropriate.

Sampling is the process of choosing a sample. As a rule, a sample should be representative,
that is, obtained in such a manner that the characteristics and variations of the population being
sampled are reflected.
Sampling frame is the list from which the potential respondents or objects are drawn.
Sampling scheme refers to the method of selecting sampling units from the sampling frame.
Sampling units contain the elements of the population and are used for selecting the elements
into the sample, while observational units are the units from which the
observations are obtained.
A sample is a subset of the population. This may refer to a sample of individuals or
objects, or a sample of data or observations gathered from the sample objects or respondents. It
is any subgroup of units drawn from the population by some appropriate method so that the
characteristics of the population can be estimated. The number of units or observations in a given
sample is termed the sample size and is designated by n. The sample comprises all selected
respondents, objects, or elements of the population.
Population data is the set of all values of a variable observed on every individual or
object in the group under investigation, while the data gathered from a sample are called sample
data and form a subset of the population data. The terms population and sample are also used to
refer to the data.

Parameter – is a numerical characteristic of a population.


Statistic – is a quantity calculated from the observations in a sample.

Basically, there are two broad classifications of sampling, namely: probability and non-probability
sampling.

A. PROBABILITY SAMPLING PROCEDURES


In probability sampling, every element belonging to the population has a known and non-
zero probability of being included in the sample, while in non-probability sampling, some of the
elements of the population are deliberately ignored, giving some elements no chance at all to be
included in the sample.

Methods of Probability Sampling


1. Simple Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Multi-stage Sampling

1. Simple Random Sampling. Basic to all sampling designs, this procedure is suitable when the
population being studied is homogeneous with respect to the characteristic under investigation.
A sample is a simple random sample if all members of the population have an equal chance of
being included in the sample. This is usually done by drawing lots, by using a table of random
numbers, or by using a hand-held calculator.

2. Systematic Sampling. The elements of a population of size N are numbered from 1 to N in
some order. To draw a systematic sample of size n, first compute the sampling interval k = N/n.
Then select a random start r from the first k elements (1 ≤ r ≤ k), and starting from r, select every
kth subsequent element. The sample then consists of elements r, r + k, r + 2k, r + 3k, and so on.
Example:
A researcher wants to choose about 75 subjects from a total target population of 600
people.
N = 600, n = 75, k = 600/75 = 8
r is a value to be obtained randomly by drawing lots, with 1 ≤ r ≤ 8.
Suppose r = 3 was obtained.
The sample will consist of the following subjects from the list:
r = 3: the 3rd person
r + k = 3 + 8 = 11: the 11th person
r + 2k = 3 + 16 = 19: the 19th person
Every 8th person will be selected after the 3rd person is selected.
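The systematic selection above can be written as a small Python function. The name `systematic_sample` is hypothetical. Using integer division `N // n` for k is an assumption that covers cases where N/n is not a whole number. With the example's values and r = 3, the function reproduces the 3rd, 11th, 19th, … persons.

```python
import random

def systematic_sample(N, n, r=None, rng=random):
    """Return 1-based positions r, r+k, r+2k, ... for a systematic sample.

    k = N // n is the sampling interval (integer division is an assumption
    for when N/n is not a whole number).
    """
    k = N // n
    if r is None:
        r = rng.randint(1, k)      # random start, 1 <= r <= k
    if not 1 <= r <= k:
        raise ValueError("random start r must satisfy 1 <= r <= k")
    return list(range(r, N + 1, k))[:n]

picks = systematic_sample(600, 75, r=3)
print(picks[:4], picks[-1], len(picks))   # [3, 11, 19, 27] 595 75
```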

3. Stratified Sampling. If the population of size N is heterogeneous and can be subdivided into
non-overlapping L homogeneous subpopulations called strata, of sizes N1, N2, …, NL
respectively, such that
N1 + N2 + … + NL = N.
A stratified sample of size n consists of samples of sizes n1, n2, …, nL, drawn independently
from one stratum to another, where
n1 + n2 + … + nL = n.
Allocation of samples. Depending on the size of the sample taken from each stratum, stratified
sampling can be categorized as having either equal allocation or proportional allocation.
Equal stratified sampling involves drawing samples of the same size from each stratum. The total
sample size n is divided equally among the L strata.
ni = n/L
Proportionate stratified sampling involves drawing a sample from each stratum in proportion to
the stratum’s share in the total population.
ni = n(Ni/N)
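Each allocation rule is a one-line formula. The sketch below applies both rules to a hypothetical population of 1,000 split into three strata of sizes 500, 300, and 200, with n = 120. The function names are illustrative only.

```python
def equal_allocation(n, strata_sizes):
    """n_i = n / L for each of the L strata."""
    L = len(strata_sizes)
    return [n / L for _ in strata_sizes]

def proportional_allocation(n, strata_sizes):
    """n_i = n * (N_i / N), where N is the total population size."""
    N = sum(strata_sizes)
    return [n * Ni / N for Ni in strata_sizes]

# Hypothetical population of 1,000 split into three strata.
sizes = [500, 300, 200]
print(equal_allocation(120, sizes))         # [40.0, 40.0, 40.0]
print(proportional_allocation(120, sizes))  # [60.0, 36.0, 24.0]
```

When an n_i is not a whole number, you must round it. Adjust the rounding so that the n_i still add up to n.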

4. Cluster Sampling. Cluster sampling is a one-stage sampling procedure in which the population
is composed of groups, or clusters, of population elements. A cluster sample consists of all
elements of a number of these clusters, chosen by simple random sampling or by systematic
sampling.
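Here is a minimal sketch of one-stage cluster sampling. It assumes a hypothetical population already grouped into five clusters. The clusters themselves are chosen by simple random sampling, and then every element of each chosen cluster enters the sample.

```python
import random

# Hypothetical population grouped into clusters (e.g., sections of a school);
# each cluster is a list of element IDs.
clusters = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7],
    "C": [8, 9, 10, 11, 12],
    "D": [13, 14],
    "E": [15, 16, 17, 18],
}

rng = random.Random(7)
chosen = rng.sample(sorted(clusters), k=2)   # simple random sample of clusters

# One-stage cluster sampling: every element of each chosen cluster is included.
sample = [unit for name in chosen for unit in clusters[name]]
print(chosen, sample)
```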

5. Multi-Stage Sampling. This sampling procedure is done in stages. For example, in medical
studies, the provinces selected in the first stage of the analysis may be partitioned into
municipalities, and then the selected municipalities can be partitioned into barangays. Then, from
the selected barangays, the researcher could select a random sample of families wherein data or
information on medical expenditures can be elicited.

B. NON-PROBABILITY SAMPLING METHODS


1. Convenience sampling – chooses as sampling units those which are readily or easily
accessible.
2. Accidental sampling – chooses sampling units by chance or accident.
3. Quota sampling – chooses as sampling units those that satisfy pre-specified characteristics
or criteria. Samples are defined based on the known proportions within the population, and
nonrandom sampling is completed in each group. As in stratified random sampling, the sample
size (quota) for each group is identified first; then a convenience sample is taken from each level
of the population stratification variable.
4. Purposive sampling – involves a purpose in mind based on certain criteria; would usually
have one or more specific predefined groups the researcher is seeking.
5. Judgment sampling – chooses sampling units on the basis of an expert's opinion.
6. Snowball sampling – selects sampling units based on referrals from initial informants; useful
when it is not possible to identify all members of the population at the outset, and highly useful
for studies of behavior that is deviant or illegal in nature.
7. Voluntary sample – a voluntary sample is made up of people who self-select into the survey.
Often, these respondents have a strong interest in the main topic of the survey.

C. DETERMINING SAMPLE SIZE


Approaches to sample size determination
1. Sampling fraction approach – the sample is taken as a fixed fraction of the population; this
sometimes leads to sample sizes that are too large even for small sampling rates (when the
population is large) and too small even for large sampling rates (when the population is small).
2. Subjective approach – the sample size is based on the researcher's judgment and depends
mainly on the budget for the survey operation. This is easily done; however, it does not consider
the level of variation in the population.
3. Precision point of view – the sample size is determined by setting the desired precision of the
estimate. This approach is objective and scientific; however, the information it requires is often
difficult to obtain.

Requirements for determining n using the precision point of view


1. Level of confidence – measures the degree of confidence of the estimate
2. Maximum tolerable error – the margin of error one is willing to tolerate
3. Variance of the population – measures the variation of the target population
4. Perceived value of P – needed when the objective is to estimate a population proportion
Remarks:
In the formulas for sample size computation, if the value of n is not an integer, we usually take
the next larger integer, subject to the unit cost of sampling. For each additional sampling unit,
compare the increase in cost with the corresponding increase in precision; the former should be
commensurate with the latter.

Sources of Prior Estimates for the Population Parameters:


1. We can use results of previous similar surveys
2. We can make some assumptions about the population
3. Pilot survey
a. Take an initial small sample of size n1
b. Using the sample in (a), compute estimates for the necessary parameters
c. Using the estimates from (b), determine the required sample size n
d. Sample (n-n1) additional units to attain the required number of sampling units
4. Double sampling
a. Take an initial large sample of size n1
b. Using the sample in (a), compute estimates for the necessary parameters
c. Using the estimates from (b), determine the required sample size n
d. Take a sub-sample of size n from the initial sample.

Formulas that can be used to determine the sample size:


1. Determining the required sample size for estimating the mean (N > 100,000)
$$ n = \left( \frac{Z \sigma}{e} \right)^2 $$
where:
σ – standard deviation of the population (or its estimate S)
e – maximum error deemed acceptable
Z – standard normal value corresponding to the specified level of confidence

2. Determining the required sample size for estimating the mean (N < 100,000)

where:
N – population size
S – standard deviation of the population

3. Determining the required sample size for estimating the proportion (N > 100,000)
$$ n = \frac{Z^2 P (1 - P)}{e^2} $$
where:
P – initial estimate of the population proportion

Remark: If an initial estimate of P is not possible, then P should be taken as 0.50. Such an
estimate is conservative: P(1 − P) is largest at P = 0.50, so it gives the largest required sample size.

4. Determining the required sample size for estimating the proportion (N < 100,000)
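The four sample-size cases above can be written as two short Python functions. This is a minimal sketch with the following assumptions:
- Z = 1.96 (95% confidence).
- For the N < 100,000 cases, the common finite-population correction n = n0 / (1 + n0/N) is used. This is algebraically the same as N Z² S² / (N e² + Z² S²). The module's own formulas 2 and 4 may be written differently; some texts use N − 1.
- The function names are hypothetical.

```python
import math

def sample_size_mean(sigma, e, z=1.96, N=None):
    """n for estimating a mean: n0 = (z * sigma / e) ** 2."""
    n0 = (z * sigma / e) ** 2
    return _finish(n0, N)

def sample_size_proportion(e, P=0.5, z=1.96, N=None):
    """n for estimating a proportion: n0 = z**2 * P * (1 - P) / e**2."""
    n0 = z ** 2 * P * (1 - P) / e ** 2
    return _finish(n0, N)

def _finish(n0, N):
    # Assumed finite-population correction n = n0 / (1 + n0 / N) when N is
    # given; then round up to the next integer, as the remarks above recommend.
    n = n0 if N is None else n0 / (1 + n0 / N)
    return math.ceil(n)

print(sample_size_mean(sigma=10, e=2))            # 97
print(sample_size_proportion(e=0.05))             # 385
print(sample_size_proportion(e=0.05, N=1000))     # 278
```

The sample-size examples here (σ = 10, e = 2, N = 1000) are hypothetical. The proportion calls use the conservative P = 0.50.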

Sample size for surveys with more than one variable

1. If there are continuous and categorical variables in the survey, determine the most
important ones. Compute the required sample size for each of these variables.
2. If they are equally important, use the largest sample size. However, consider the
practicality and the cost of using the different sample sizes.
a) If the largest sample size is too costly, adjust or relax the precision to make the sample
size smaller.
b) If the sample sizes are very different from each other, drop some items; these may require
another sampling approach. Combine the variables with similar sample sizes and separate
those variables that may need special methods.

III. SUMMARIZING DATA


The objective of data description is to summarize the characteristics of a data set, ultimately to
make it more comprehensible and meaningful. The following three characteristics are
extremely important:
1. Representative value, such as an average
2. Measure of scattering or variation
3. Nature or shape of the distribution of the data

A. WAYS OF SUMMARIZING DATA:


1. Tabular method: frequency distributions, relative frequency distributions, and percentage
distributions
2. Graphical method: bar graph, histogram, line graph, pie chart, boxplot, and stem-and-leaf
display
3. Numerical method: measures of central tendency and measures of dispersion or variability
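The tabular method can be illustrated with a small frequency, relative frequency, and percentage distribution. The responses below are hypothetical values of a qualitative variable, and `collections.Counter` does the tallying.

```python
from collections import Counter

# Hypothetical responses for a qualitative variable (e.g., brand of cellphone).
responses = ["A", "B", "A", "C", "A", "B", "C", "A", "B", "A"]

freq = Counter(responses)    # frequency of each category
n = len(responses)

print(f"{'Class':<6}{'f':>4}{'rel f':>8}{'%':>8}")
for category, f in sorted(freq.items()):
    rel = f / n              # relative frequency
    print(f"{category:<6}{f:>4}{rel:>8.2f}{rel * 100:>7.1f}%")
```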

B. MEASURES OF CENTRAL TENDENCY


1. The Mean
The mean, denoted by the Greek letter µ (mu), is defined as the sum of all observations
divided by the number of observations. Symbolically, the population mean is given by
$$ \mu = \frac{\sum_{i=1}^{N} X_i}{N} $$
where Xi is the value of the ith observation, N is the number of observations, and i is
the index of summation whose value ranges from 1 to N.

2. The Median
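The median is the middle value in a set of data whose observations are arranged in order of
magnitude (from lowest to highest or from highest to lowest). It is a single value that divides the
array of observations into two equal parts, such that half of the observations are above it and half
are below it. When the number of observations is even, the median is the mean of the two middle
values.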
3. The Mode
The mode is the value which occurs most frequently in a given data set. The mode of a
given set of data is determined by inspection.
4. The Midrange
The midrange is the average of the maximum (highest) and minimum (lowest)
observations in the data set.
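The four measures of central tendency can be computed with Python's standard `statistics` module. The data set below is hypothetical. It has an even number of observations, so the median is the mean of the two middle values.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]   # hypothetical observations

mean = statistics.mean(data)                 # sum / number of observations
median = statistics.median(data)             # even n: average of the 2 middle values
mode = statistics.mode(data)                 # most frequent value
midrange = (max(data) + min(data)) / 2       # average of highest and lowest

print(mean, median, mode, midrange)
```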

C. MEASURES OF DISPERSION OR VARIABILITY


Aside from the measures of central tendency, which describe the typical value of the
observations in the data set, it is also important to know the amount of variability or variation of
the observations. Variability or dispersion refers to the degree to which the data are spread out.
Observations generally differ from one another; if the variable were constant, the mean, median,
and mode would all be equal and there would be no variation to measure.

1. The Range
The range, denoted by R, is defined as the difference between the highest value (HV) and the
lowest value (LV) in the data set. In symbols,
$$ R = HV - LV $$
2. The Variance
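The variance of a set of data, denoted by σ², is the mean of the squared deviations of the
observations from the mean. For a population of size N with mean μ, and for a sample of size n
with mean X̄,
$$ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \qquad s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} $$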
3. The Standard Deviation


The standard deviation, denoted by σ, is defined as the positive square root of the variance. Thus,
σ = √σ²   population standard deviation
s = √s²   sample standard deviation
4. The Coefficient of Variation
The coefficient of variation, denoted by CV, is defined as the ratio of the standard deviation
to the mean, expressed in percent; that is, the CV is the standard deviation expressed as a
percentage of the mean.
$$ CV = \frac{\sigma}{\mu} \times 100\%, \quad \mu \neq 0 $$
5. Standard Error
The standard error σX̄ is the standard deviation of the sample means of samples of size n. For an
infinite population or sampling with replacement, σX̄ = σ/√n, which is estimated by s/√n.
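The measures of dispersion above can be computed on a hypothetical data set. This sketch treats the data as a sample, with these conventions:
- `statistics.variance` and `statistics.stdev` use the n − 1 divisor.
- `statistics.pvariance` shows the population version, with divisor N.
- The standard error is estimated as s/√n.

```python
import math
import statistics

data = [2, 3, 3, 5, 7, 10]   # hypothetical observations, treated as a sample

R = max(data) - min(data)              # range: HV - LV
s2 = statistics.variance(data)         # sample variance, divisor n - 1
s = statistics.stdev(data)             # sample standard deviation, sqrt(s2)
pop_var = statistics.pvariance(data)   # population variance, divisor N
cv = s / statistics.mean(data) * 100   # coefficient of variation, in percent
se = s / math.sqrt(len(data))          # estimated standard error of the mean

print(R, s2, round(s, 3), round(pop_var, 3), round(cv, 1), round(se, 3))
```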
PROBABILITY AND PROBABILITY DISTRIBUTIONS

FUNDAMENTAL PRINCIPLES OF COUNTING


The fundamental principle of counting states that if one choice can be made in n1 ways
and another choice can be made in n2 ways, then the two choices together can be made in
n1 × n2 ways. The rule can be extended to more than two choices.

1. Combination is a collection or group formed by taking all or part of a given set of objects
WITHOUT regard to the order in which the objects are selected.
$$ {}_nC_r = \binom{n}{r} = \frac{n!}{(n-r)!\,r!} $$
where: nCr – number of combinations of n objects taken r at a time

Ex. In a farm, 50 plots are available for soil analysis, but the budget covers only 4 plots. How many
possible sets of 4 plots are there for analysis?
$$ {}_{50}C_4 = \binom{50}{4} = 230300 $$
2. Permutation is an ordered collection of all or part of a given set of objects.
$$ {}_nP_r = \frac{n!}{(n-r)!} $$
where: nPr – number of permutations of n distinct items taken r at a time

Ex. Ten available tractors for inspection are to be lined up in a shop. In how many ways can the
tractors be arranged?
$$ {}_{10}P_{10} = 10! = 3628800 $$
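You can check the counting results in this section with Python's `math` module (`math.comb` and `math.perm`, available from Python 3.8). The 3 × 4 outfit count is a hypothetical illustration of the fundamental principle of counting.

```python
import math

# Fundamental principle of counting: n1 ways for one choice and n2 for another
# gives n1 * n2 ways together (hypothetical 3 shirts x 4 pants).
outfits = 3 * 4

plots = math.comb(50, 4)       # 50C4: sets of 4 plots, order ignored
lineups = math.perm(10, 10)    # 10P10 = 10!: ordered arrangements of 10 tractors

print(outfits, plots, lineups)   # 12 230300 3628800
```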

B. DEFINITION OF TERMS
Random Experiment – any process that can be repeated under basically the same conditions,
and that yields well-defined outcomes.
Sample Space – the set of all possible outcomes of a random experiment. It is usually denoted by
the capital letter S.
Sample Point – an element of the sample space S
n(S) = number of sample points in the sample space S
Event – a subset of the sample space. It is usually represented by a capital letter of the English
alphabet.
n(A) = number of sample points in event A
Elementary Event – an event that contains only one sample point
Compound Event – an event containing more than one sample point
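Listing a sample space makes these definitions concrete. The sketch below assumes the random experiment of tossing a coin twice, with A and B as example events chosen for illustration.

```python
from itertools import product

# Random experiment: toss a coin twice. Each outcome is a sample point.
S = set(product("HT", repeat=2))       # sample space
A = {s for s in S if "H" in s}          # event: at least one head
B = {("H", "H")}                        # event: two heads

print(len(S), len(A))                   # n(S) = 4, n(A) = 3
print(len(B) == 1, len(A) > 1)          # B is elementary, A is compound
```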

C. PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE


A random variable is a rule or function that assigns exactly one real number to every element in
the sample space. We use a capital letter, say X, to denote a random variable, and its
corresponding small letter, x, for one of its values. Each possible value x of the random variable
X then represents an event that is a subset of the sample space.

A discrete random variable is one that may assume a finite or countably infinite number of
numerical values.

Continuous Random Variable is a random variable which is not discrete, that is, it is one that
may assume an uncountably infinite number of possible values. It may assume any value in a
given interval.

Probability Distribution of a Random Variable. A listing of all possible values that a random
variable can take on together with their corresponding probabilities is called a probability
distribution. The values of the random variable correspond to events that are mutually exclusive.
This is because each outcome or sample point corresponds to exactly one value of the random
variable. The probability distribution then of a random variable, say X, provides a probability for
each possible value x. These probabilities must sum to 1.

The probabilities of X are denoted as


f(x) = P(X = x)
where: x – represents any one of the possible values that the random variable X may assume.
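A probability distribution can be built directly from the definition of a random variable as a function on the sample space. The sketch assumes a fair coin tossed three times, so the 8 sample points are equally likely. X counts the heads, and the resulting probabilities sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing a fair coin three times (8 equally likely points).
S = list(product("HT", repeat=3))

# Random variable X: assigns to each sample point its number of heads.
def X(point):
    return point.count("H")

counts = Counter(X(s) for s in S)
f = {x: Fraction(c, len(S)) for x, c in sorted(counts.items())}   # f(x) = P(X = x)

print(f)                # probabilities for x = 0, 1, 2, 3
print(sum(f.values()))  # 1
```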

SPECIAL DISCRETE PROBABILITY DISTRIBUTIONS


1. Uniform Distribution
The discrete uniform probability distribution is the simplest of all discrete probability
distributions. Here, the random variable X assumes the values a1, a2, …, aN with equal
probabilities. Such a probability distribution is called the discrete uniform distribution. The
probability function or formula of the uniform distribution is
$$ f(x; N) = \frac{1}{N}, \quad x = a_1, a_2, \ldots, a_N $$
The notation f(x; N) indicates that the uniform distribution depends on the parameter N. The
graphical representation of the uniform distribution by means of a histogram always turns out to
be a set of rectangles with equal heights.

A special case of the uniform probability distribution is when the values of the uniform random
variable correspond to the natural numbers 1 to N, that is,
$$ f(x; N) = \frac{1}{N}, \quad x = 1, 2, \ldots, N $$
In this case, the mean and variance of a uniformly distributed random variable are given
respectively by
$$ \mu = \frac{N + 1}{2}, \qquad \sigma^2 = \frac{N^2 - 1}{12} $$
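You can verify the mean and variance formulas for the special case x = 1, 2, …, N with exact fractions. This sketch assumes N = 6, as for a fair die.

```python
from fractions import Fraction

N = 6                                   # e.g., a fair die: X takes 1, 2, ..., 6
xs = range(1, N + 1)
f = Fraction(1, N)                      # f(x; N) = 1/N for every x

mean = sum(x * f for x in xs)
var = sum((x - mean) ** 2 * f for x in xs)

assert mean == Fraction(N + 1, 2)       # 7/2
assert var == Fraction(N * N - 1, 12)   # 35/12
print(mean, var)
```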

2. Binomial Distribution
A binomial experiment is one that possesses the following properties:
a) The experiment consists of n repeated trials.
b) Each trial results in an outcome that may be classified as a success or a failure.
c) The probability of success, denoted by p, remains constant from trial to trial.
d) The repeated trials are independent.
The binomial random variable X is defined as the number of successes in n trials. Since it depends
on the number of trials and the probability of a success on a given trial, then the probability
distribution of this discrete variable is called the binomial distribution. The probability function or
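formula of the binomial distribution is
$$ f(x; n, p) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n $$
where: q = 1 − p is the probability of failure.

The mean and variance of the binomial random variable are given respectively by
$$ \mu = np, \qquad \sigma^2 = npq $$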
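The binomial probability function can be coded with `math.comb`. The sketch assumes a hypothetical experiment with n = 4 trials and p = 0.5. It checks that the mean and variance of the resulting distribution equal np and npq.

```python
import math

def binomial_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 4, 0.5                                   # hypothetical: 4 trials, p = 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * fx for x, fx in enumerate(probs))
var = sum((x - mean) ** 2 * fx for x, fx in enumerate(probs))

print(probs)          # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(mean, var)      # np = 2.0, npq = 1.0
```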

3. Poisson Distribution
A Poisson experiment is one that possesses the following properties:
a. The number of outcomes occurring in one time interval or specified region is independent
of the number that occur in any other disjoint time interval or region of space.
b. The probability that a single outcome will occur during a very short time interval or in a
small region is proportional to the length of the time interval or the size of the region and
does not depend on the number of outcomes occurring outside this time interval or region.
c. The probability that more than one outcome will occur in such a short time interval or fall
in such a small region is negligible.
The number X of outcomes occurring in a Poisson experiment is called a Poisson random
variable. The probability function or formula of the Poisson distribution is
$$ f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots $$
where: λ – the average number of outcomes occurring in the given time interval or
specified region
e ≈ 2.71828…
If n is large and p is small, the binomial probabilities are often approximated by means of the
formula
$$ f(x; n, p) \approx \frac{e^{-np} (np)^x}{x!} $$
where λ = np is the expected, or average, number of successes.

The mean and variance of a Poisson distribution are given by
$$ \mu = \sigma^2 = \lambda $$
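The sketch below implements the Poisson probability function and uses it to approximate the binomial. The values n = 1000 and p = 0.002 (so λ = np = 2) are hypothetical, chosen to make n large and p small.

```python
import math

def poisson_pmf(x, lam):
    """f(x; lam) = e**(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical case with large n and small p: n = 1000, p = 0.002, so lam = np = 2.
n, p = 1000, 0.002
for x in range(5):
    # exact binomial probability vs. Poisson approximation
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
```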

CONTINUOUS PROBABILITY DISTRIBUTIONS


Continuous Random Variable is a random variable which is not discrete, that is, it is one that
may assume an uncountably infinite number of possible values. It may assume any value in a
given interval.
1. The Normal Distribution
The normal distribution is one of the most useful continuous probability distributions in the field of
Statistics. Many naturally occurring phenomena or measurements are approximately normally
distributed. The normal distribution plays a very important role in inferential statistics.
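In practice, the normal distribution supplies the Z critical values used in the interval estimates that follow. Python's `statistics.NormalDist` (Python 3.8+) gives them through its `inv_cdf` method. `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal: mean 0, standard deviation 1

def z_critical(confidence):
    """Z_(alpha/2): the value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return Z.inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```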
ESTIMATION

Estimation refers to estimating the value of the parameter of interest (point estimation and
confidence interval estimation). The objective of estimation is to determine the approximate value
of a population parameter on the basis of a sample statistic.

For example, suppose in the above example, for n = 25 individuals, the sample mean time X̄ for the
headache to be gone is 13 minutes. X̄ is called the point estimator, while its specific value, 13
minutes, is called the point estimate.

Characteristics of a good point estimator:


1. Unbiasedness – the mean of the sampling distribution of the estimator is equal to the value of
the parameter being estimated.
2. Consistency – an estimator is consistent if its value approaches the value of the
parameter as the sample size increases.
3. Efficiency – given two unbiased estimators θ̄ and θ̃ of θ, θ̄ is more efficient than θ̃ if
Var[θ̄] < Var[θ̃].
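You can check unbiasedness exactly on a tiny population by listing every possible sample. This sketch uses a hypothetical population of 1, 2, 3, with all samples of size 2 drawn with replacement. The average of the sample variance with divisor n − 1 equals the population variance σ², while the version with divisor n falls short.

```python
from fractions import Fraction
from itertools import product
from statistics import pvariance, variance

# Tiny hypothetical population, so every possible sample can be listed exactly.
population = [Fraction(1), Fraction(2), Fraction(3)]
sigma2 = pvariance(population)                      # true parameter: 2/3

# All samples of size 2 drawn with replacement (3 * 3 = 9 equally likely samples).
samples = list(product(population, repeat=2))

mean_s2 = sum(variance(s) for s in samples) / len(samples)       # divisor n - 1
mean_biased = sum(pvariance(s) for s in samples) / len(samples)  # divisor n

print(sigma2, mean_s2, mean_biased)    # 2/3 2/3 1/3
```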
Example
The heights in centimeters of a random sample of 5 basil plants are 14.6, 12.5, 15.3, 16.1, and
14.4. Find a point estimate for the mean height of all basil plants. Since a good point estimate for
the population mean height of basil plants is the sample mean height, we compute
$$ \bar{X} = \frac{14.6 + 12.5 + 15.3 + 16.1 + 14.4}{5} = \frac{72.9}{5} = 14.58 $$
Therefore, 14.58 cm is the estimated mean height of all basil plants.

INTERVAL ESTIMATION
Interval estimation is based on sample data. Two numbers are calculated to form an interval
consisting of a lower limit and an upper limit. This interval is expected to contain the parameter
with probability (1 − α)100 percent. The resulting pair of numbers is called an interval estimate or a
confidence interval.

An alternative statement for the example on the mean effectiveness of paracetamol in treating
headache is "The length of time for the headache to be gone after paracetamol intake is between
10.5 and 15.5 minutes." Here, (10.5 minutes – 15.5 minutes) is called the interval estimate.

A confidence interval (or interval estimate) is a range (or an interval) of values that is likely to
contain the true value of the population parameter with some degree of confidence.

The degree of confidence is the probability 1-𝛼 that the confidence interval contains the
population parameter. This probability is often expressed as the equivalent percentage value. The
degree of confidence is also referred to as the level of confidence or the confidence level.

Margin of Error – is the difference between the observed sample statistic and the value of the
population parameter.

When sample data are used to estimate a population mean, the margin of error, denoted by E, is
the maximum likely (with probability 1 − α) difference between the observed statistic
θ̂ and the population parameter θ. The margin of error E is also called the maximum error
of the estimate and can be found by multiplying the standard normal critical value Zα/2 by the
standard deviation of the sample statistic. The standard deviation of the sample statistic is called
the standard error SE, so E = Zα/2 × SE.
Interpretation of confidence interval. The limits θ̂ − E and θ̂ + E either enclose the population
parameter θ or they do not, and it is not possible to determine which without knowing the true
value of θ. It is incorrect to state that the parameter θ has a 95% chance of falling within the
specific limits obtained, because θ is a constant, not a random variable: either it falls within these
limits or it does not. There is no probability involved. It is correct to say that, in the long run, these
methods will produce confidence intervals that contain θ in 95% of the cases.
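You can illustrate the long-run interpretation by simulation. The sketch works as follows:
1. It assumes a hypothetical normal population with known μ = 50 and σ = 8.
2. It draws many samples of size 25.
3. For each sample, it builds the interval X̄ ± Zα/2 σ/√n.
4. It records the fraction of intervals that contain μ.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(12345)
mu, sigma, n = 50.0, 8.0, 25      # hypothetical population (normal) and sample size
z = NormalDist().inv_cdf(0.975)   # Z_0.025, about 1.96
reps = 2000

hits = 0
for _ in range(reps):
    xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    E = z * sigma / math.sqrt(n)           # margin of error (sigma known)
    if xbar - E < mu < xbar + E:           # did this interval capture mu?
        hits += 1

coverage = hits / reps
print(coverage)
```

The printed coverage varies slightly with the random seed but stays close to 0.95. Any single interval either contains μ or it does not.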
Four commonly used values of α:

1. Confidence Interval about One Population Mean


For the construction of the confidence interval for the population mean μ, there are several cases
to be considered. For these cases, let the sample be of size n, consisting of the observations
Xi, i = 1, 2, …, n.
Example
The manager of the SJC Mall wants to estimate the mean amount spent per shopping visit by
customers. A sample of 20 customers reveals the following amounts spent (in pesos). Assume
that the amount spent per shopping visit follows the normal distribution.
a. Determine a 95% confidence interval.
b. Would it be reasonable to conclude that the population mean is P2500?
c. What about P3000?
2408 2632 2568 1189 3092
1896 3073 2573 2347 2194
2459 2341 2541 2942 2743
2111 2430 2634 2093 3085
Solution:
Given: Normality assumption, n = 20
95% CI, therefore α = 0.05 and α/2 = 0.025
Compute: X̄ = P2467.55, S = P450.687, S/√n = P100.7768
a. Normality is assumed and σ² is unknown, so use
$$ \bar{X} \pm t_{\alpha/2}(n-1) \frac{S}{\sqrt{n}} $$
From the t-distribution table, t0.025(19) = 2.093.


P2467.55 ± 2.093(P100.7768) = P2467.55 ± P210.926
The 95% confidence interval for the mean amount spent per shopping visit is:
P2256.62 – P2678.48.
We are 95% confident that the interval P2256.62 – P2678.48 contains the true
mean amount spent per shopping visit of customers.
b. Since P2500 is within the 95% CI obtained, P2500 is a reasonable value for the mean amount
spent per shopping visit of customers.
c. P3000 is not within the 95% CI obtained, so it is not a reasonable value for the true mean.
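The computation in part (a) can be reproduced in Python with the data from the example. The standard library has no t distribution, so the critical value t0.025(19) = 2.093 is taken from the t-distribution table, as in the solution.

```python
import math
import statistics

amounts = [
    2408, 2632, 2568, 1189, 3092,
    1896, 3073, 2573, 2347, 2194,
    2459, 2341, 2541, 2942, 2743,
    2111, 2430, 2634, 2093, 3085,
]

n = len(amounts)
xbar = statistics.mean(amounts)          # 2467.55
s = statistics.stdev(amounts)            # sample SD (n - 1 divisor), about 450.687
se = s / math.sqrt(n)                    # about 100.777
t_crit = 2.093                           # t_0.025 with 19 df, from the t table

E = t_crit * se
lower, upper = xbar - E, xbar + E
print(round(lower, 2), round(upper, 2))  # 2256.62 2678.48
for mu0 in (2500, 3000):                 # parts (b) and (c)
    print(mu0, lower < mu0 < upper)
```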

2. Confidence Interval About One Population Proportion


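Recall that the point estimate of the population proportion P is p̂ = n1/n, which has the properties
μp̂ = P and σ²p̂ = PQ/n, where Q = 1 − P. The (1 − α)100% confidence interval for P is
$$ \hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$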

The sample size should be large enough such that there are at least 5 successes and at least 5
failures in the sample.

Example
The National Student Org A (NSOA) is considering a proposal to merge with another National
Student Org B (NSOB). According to the NSOA bylaws, at least three fourths of the organization
membership must approve any merger. A random sample of 2000 current NSOA members
reveals 1600 plan to vote for the merger proposal. What is the estimate of the population
proportion? Develop a 95 percent confidence interval for the population proportion. Basing your
decision on this sample information, can you conclude that the necessary proportion of NSOA
members favor the merger? Why?

Solution
Given: n1 = 1600, n = 2000, 95% CI, therefore α = 0.05 and α/2 = 0.025
First, calculate the sample proportion: p̂ = n1/n = 1600/2000 = 0.80

Thus, we estimate that 80 percent of the population favor the merger proposal.
We determine the 95% CI using the formula
$$ \hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} $$
The Z value corresponding to the 95 percent level of confidence is
$$ Z_{\alpha/2} = Z_{0.025} = 1.96 $$
so that
$$ 0.80 \pm 1.96\sqrt{\frac{0.80(0.20)}{2000}} = 0.80 \pm 1.96(0.00894) = 0.80 \pm 0.0175 $$
The endpoints of the confidence interval are 0.782 and 0.818. The lower endpoint is greater than
0.75. Hence, we conclude that the merger proposal will likely pass, because the entire interval
estimate lies above 75 percent of the organization membership.
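The interval for the merger example can be reproduced as follows. This sketch gets Zα/2 from `statistics.NormalDist` rather than from a table. It also checks the rule of at least 5 successes and 5 failures.

```python
import math
from statistics import NormalDist

n, n1 = 2000, 1600
assert n1 >= 5 and n - n1 >= 5              # at least 5 successes and 5 failures

p_hat = n1 / n                              # 0.80
z = NormalDist().inv_cdf(0.975)             # Z_0.025, about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error of p_hat
E = z * se

lower, upper = p_hat - E, p_hat + E
print(round(lower, 3), round(upper, 3))     # 0.782 0.818
print("whole interval above 0.75:", lower > 0.75)
```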

3. Confidence Interval About One Population Variance


In the construction of confidence intervals for the population mean and population proportion, the
normal and Student t distributions were used. In developing estimates of variances or standard
deviations, the chi-square distribution will be used. The 100(1 − α)% confidence interval on the
population variance σ² is given by
$$ \frac{(n-1)s^2}{\chi^2_{R}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{L}} $$
where $\chi^2_L = \chi^2_{1-\alpha/2}$ denotes the left-tailed critical value, $\chi^2_R = \chi^2_{\alpha/2}$ denotes the
right-tailed critical value, and the degrees of freedom equal n − 1.

4. Confidence Interval About One Population Standard Deviation


If a confidence interval estimate of the population standard deviation σ is desired, it is obtained
by taking the square roots of the lower and upper confidence limits for σ² and changing σ² to σ:
$$ \sqrt{\frac{(n-1)s^2}{\chi^2_{R}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{L}}} $$
Example
A random sample of 25 fish caught at Taal Lake has a mean length of 35.5 cm with a
standard deviation of 5 cm. Construct a 95% confidence interval for the variability (SD) of the
length of fish in Taal Lake.
Solution
Given: n = 25, 100(1 − α)% = 95%, α = 5% or 0.05
X̄ = 35.5 cm, S = 5 cm, k = n − 1 = 24
Required: Construct a 95% confidence interval for the variability (SD) of the length of fish in Taal
Lake.
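The remaining computation for this example can be sketched using the formulas above. The chi-square critical values for 24 degrees of freedom at α = 0.05 (39.364 right-tailed and 12.401 left-tailed) are assumed from a standard chi-square table. Check them against the table used in class.

```python
import math

n, s = 25, 5.0                 # sample size and sample SD (cm)
df = n - 1                     # 24 degrees of freedom
chi2_right = 39.364            # chi-square_0.025 with 24 df (right-tailed), from a table
chi2_left = 12.401             # chi-square_0.975 with 24 df (left-tailed), from a table

var_lower = df * s**2 / chi2_right
var_upper = df * s**2 / chi2_left
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(round(var_lower, 2), round(var_upper, 2))   # CI for the variance
print(round(sd_lower, 2), round(sd_upper, 2))     # CI for the SD
```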
