Statistic Analysis

The document discusses statistical analysis and provides definitions and examples of key statistical concepts like descriptive statistics, inferential statistics, variables, data, and different types of scales of measurement. It also outlines common statistical techniques for describing univariate data like frequency tables, charts, measures of central tendency, and discusses uses of statistics in various fields.

STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION

PRELIMS

STATISTICS
● the branch of mathematics that transforms data into useful information for decision makers.
➜ DESCRIPTIVE STATISTICS
★ Collecting, summarizing, and describing data
➜ INFERENTIAL STATISTICS
★ Drawing conclusions and/or making decisions concerning a population based only on sample data

WHAT IS STATISTICS? (Doane & Seward, 2019)
● is the science of collecting, organizing, analyzing, interpreting, and presenting data.
● A STATISTIC is a single measure, reported as a number, used to summarize a sample data set; for example, the average height of students in a university.
● Examples of statistics:
➜ average height, used to set the length of the gowns
➜ maximum height, used to design the height of the classroom doorways, etc.

USES OF STATISTICS (Doane & Seward, 2019)
● AUDITING
➜ The firm has learned that some invoices are being paid incorrectly, but it doesn’t know how widespread the problem is. A sample of invoices can be used to estimate the proportion of incorrectly paid invoices.
● MARKETING
➜ Many companies use Customer Relationship Management (CRM) to analyze customer data from multiple sources. With statistical and analytics tools such as correlation and data mining, they identify specific needs of different customer groups, and this helps them market their products and services more effectively.
● HEALTH CARE
➜ Evaluate 100 incoming patients using a 42-item physical and mental assessment questionnaire.
● QUALITY IMPROVEMENT
➜ Initiate a triple inspection program, setting penalties for workers who produce poor-quality output.
● PURCHASING
➜ A food producer purchases plastic containers for packaging its product. Inspection of the most recent shipment of 500 containers found that 3 of the containers were defective. The supplier’s historical defect rate is 0.005. Has the defect rate really risen or is this simply a “bad” batch?
● MEDICINE
➜ Determine whether a new drug is really better than the placebo or if the difference is due to chance.
● OPERATIONS MANAGEMENT
➜ Manage inventory by forecasting consumer demand.
● PRODUCT WARRANTY
➜ Determine the average dollar cost of engine warranty claims on a new hybrid engine.

KINDS OF STATISTICS (Doane & Seward, 2019)
● DESCRIPTIVE STATISTICS
➜ refers to the collection, presentation, and summary of data (either using charts and graphs or using numerical summary).
● INFERENTIAL STATISTICS
➜ refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.

SOME DEFINITIONS
● VARIABLE
➜ is any characteristic of a person or an object that may vary across persons or across different time points.
➜ Examples: Age, Student number, Asset size, Quiz scores, Civil status, Place of birth
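
The PURCHASING question under USES OF STATISTICS (3 defective containers in a shipment of 500, against a historical defect rate of 0.005) can be explored with a short calculation. The sketch below assumes, purely for illustration, that defects occur independently, so the number of defective containers follows a binomial distribution. That model and the helper name `binomial_tail` are assumptions, not part of the notes.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), an assumed model for defect counts."""
    p_fewer_than_k = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - p_fewer_than_k


n_containers = 500       # size of the most recent shipment (from the notes)
defects_found = 3        # defective containers observed (from the notes)
historical_rate = 0.005  # supplier's historical defect rate (from the notes)

# Number of defects we would expect if the historical rate still holds
expected_defects = n_containers * historical_rate

# Chance of seeing 3 or more defects if the rate has NOT changed
p_three_or_more = binomial_tail(n_containers, defects_found, historical_rate)

print(f"Expected defects at the historical rate: {expected_defects:.1f}")
print(f"P(3 or more defects | rate = 0.005): {p_three_or_more:.3f}")
```

If that probability is not small, three defects in 500 is consistent with the historical rate and may simply be a “bad” batch. Formalizing that judgment is the job of inferential statistics.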

1 diamla, foronda, gan


● DATA
➜ are the values associated with a variable. For the variable civil status, possible data are single, married, separated, widow/widower.

Variable | Possible data
Height | 5’6”, 1.75 meters, 165 cm
Academic Strand | ABM, STEM, GAS

CLASSIFICATION OF VARIABLES
1. According to nature
a. Categorical or Qualitative Variables
★ Categorical or qualitative variables have values which can be placed into categories.
★ Examples of categorical variables are gender, civil status, frequency
b. Numerical or Quantitative Variables
★ Numerical or quantitative variables have values that represent quantities.
★ Examples are height, weight and temperature
2. According to scale
a. Categorical or Qualitative Variables
★ A NOMINAL SCALE is used to classify persons (or objects) into distinct categories in which NO ranking is implied.

Categorical Variables | Categories
Car Ownership | Yes / No
Profession | Engineer, Architect, Teacher, others
Work shift | Day shift / night shift

★ An ORDINAL SCALE is used to classify persons (or objects) into distinct categories in which ranking is implied.

Categorical Variables | Categories
Customer satisfaction | Satisfied, Neutral, Not Satisfied
Student’s Year level | Freshman, Sophomore, Junior, Senior
S&P’s bond ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, …

b. Numerical or Quantitative Variables
★ An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
★ A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and the measurements have a true zero point.

Numerical Variable | Scale or Level of Measurement
Temperature (in °C or °F) | Interval
Standardized Exam Score (SAT or NSAT) | Interval
Height | Ratio
Weight | Ratio
Age | Ratio
Salary | Ratio
Industry Index | Ratio

STATISTICAL TECHNIQUES FOR DESCRIBING UNIVARIATE DATA

Scale or Level of Measurement | Nominal | Ordinal | Interval | Ratio
Tables | Frequency and Percentage (Summary Table) | Frequency and Percentage (Summary Table) | Frequency and Percentage (FDT) | Frequency and Percentage (FDT)
Charts | Pie chart, bar graph | Pie chart, bar graph | Histogram, Line Graph | Histogram, Line Graph



Central Tendency | Mode | Mode and Median | Mean, Median and Mode | Mean, Median and Mode
Variations | (none listed) | Range and interquartile range | Range, standard deviation, variance and coeff. of variation | Range, standard deviation, variance and coeff. of variation

MEASURES OF CENTRAL TENDENCY
● This is a statistical measure which describes where the center of a frequency distribution lies.
● The three measures commonly used are the mean, the mode and the median.
● Some variations of the mean are the arithmetic mean, geometric mean, weighted mean and the trimmed mean.

MODE
● It is simply the observation or value that occurs most frequently in the data set.
● What is the mode given the following: A, A, B, C, B, C, D, A, A? What do you call the distribution with respect to the mode?
➜ Answer: Mode = A. The distribution is unimodal.
● What is the mode of the distribution: 10, 10, 20, 20, 30, 30, 20, 30? What do you call the distribution with respect to the mode?
➜ Answer: Modes = 20 and 30 (each occurs three times). The distribution is bimodal.
● What is the mode of the distribution: 500, 500, 700, 600, 700, 600? What do you call the distribution with respect to the mode?
➜ Answer: Mode = { } or none. The distribution has no mode.
● The mode is suitable for nominal, ordinal, interval and ratio level variables.
● It is not affected by extreme values.
● There may be no mode.
● There may be several modes.

MEDIAN
● It is simply the middle score or middle value when scores are ranked in order of magnitude.
● It is unique.
● It is relatively unaffected by extreme scores at either end of the distribution.
● Not all values in the distribution contribute to the value of the median.
● It can be used with ordinal, interval and ratio data.
● It is not suitable for nominal data because nominal data have no numerical order.

MEAN
● The mean of a set of numerical data is unique.
● It is the only measure of central tendency where the sum of the deviations of each value from the mean will always be zero.
● It includes precise information from every score; hence, it is affected by a change in any score.
● It is affected by extreme values.
● The means of separate distributions can be combined to get the mean of the total distribution.
● What is the mean of the distribution: 713, 300, 618, 595, 311, 401, and 292?
● The numbers of tornadoes that occurred in the United States over an 8-year period were as follows. What is the mean?
➜ 684, 764, 656, 702, 856, 1133, 1132, 1303

WEIGHTED MEAN
$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$, where $w_i$ is the weight attached to the value $x_i$.

GEOMETRIC MEAN
● This measure is useful for growth rates.
● It mitigates high extremes.
● It is, however, less familiar.
● It requires that the data are positive.
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$

TRIMMED MEAN
● Computed in the same manner as the arithmetic mean but it omits the highest and lowest k% of data values (e.g., 5%).
● It mitigates the effects of extreme values.
● It has the disadvantage of excluding some data values that could be relevant.

MEASURES OF POSITION
● A measure of position, or quantile, is a general descriptive measurement used to separate quantitative data into distinct groups. To compute quartiles of ungrouped data, the values must first be arranged either in ascending or descending order.
● Quartiles divide the values into four groups of equal size, each comprising 25% of
observations. If n = 50, 25% of the values are less than or equal to Q1.
● Deciles divide the values into ten groups of equal size, each comprising 10% of observations. If n = 50, 30% of the values are less than or equal to D3.
● Percentiles divide the values into 100 groups of equal size, each comprising 1% of observations. If n = 200, 65% of the values are less than or equal to P65.

MEASURES OF VARIABILITY
● Variation measures the spread, or dispersion, of values in a data set.
➜ Range
➜ Quartile Deviation
➜ Variance
➜ Standard deviation
➜ Coefficient of Variation
● It measures the difference of each value around the mean.
● It functions as a measure of risk or uncertainty in the field of finance.
● It provides a measure of volatility in considering alternatives for pricing commodities.
● It may be used as a measure of error in the field of forecasting.

RANGE
● The range of a set of data with n observations is defined as the difference between the highest and lowest values.
● The quartile deviation, QD, is the amount of spread within the middle half of the items arranged in an ordered array. It is also called the semi-interquartile range. It is used for ordinal data.

VARIANCE
● The variance is the average (approximately) of squared deviations of values from the mean.

COEFFICIENT OF VARIATION
● The coefficient of variation is the standard deviation divided by the mean, multiplied by 100.
● It is always expressed as a percentage, %.
● It shows variation relative to the mean.
● The CV can be used to compare two or more sets of data measured in different units (e.g. weight in kgs and height in meters).

SUMMARY CHARACTERISTICS
● The more the data are spread out, the greater the range, quartile deviation, variance, and standard deviation.
● The more the data are concentrated, the smaller the quartile deviation, variance, and standard deviation.
● If the values are all the same (no variation), all these measures will be zero.
● None of these measures are ever negative.

TABLES AND CHARTS FOR CATEGORICAL DATA
Summary Table
● A summary table indicates the frequency, amount, or percentage of items in a set of categories so that differences in categories can be seen.

Bar Chart
● A bar chart shows each category as a bar, the length of which represents the amount, frequency or percentage of values falling under each category.

Pie Chart
● A pie chart shows a circle broken up into slices that represent categories. The size of each slice of the pie varies according to the percentage in each category.

TABLES AND CHARTS FOR NUMERICAL DATA
Stem and Leaf Display
● A stem and leaf display organizes data into groups (called stems) so that the values within each group (the leaves) branch out to the right on each row.

Frequency Distribution Table
● A frequency distribution table is a summary table in which the data are arranged into numerically ordered class groupings.

Histogram
● A histogram is the graph of data in a frequency distribution where the class boundaries are shown on the horizontal axis while the vertical axis is either frequency, relative frequency or percentage. Bars of the appropriate heights are used to represent the number of observations within each class.

Line Graph
● A percentage polygon is formed by having the midpoint of each class represent the data in that class and then connecting the sequence of midpoints at their respective class percentages.
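
The practice questions on the mode, median and mean earlier in these notes can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module (Python 3.8 or later). The helpers `modes_or_none` and `trimmed_mean` are hypothetical names written for this illustration, and the 12.5% trim is chosen only so that exactly one value is removed from each end of the 8 tornado counts.

```python
import statistics as st


def modes_or_none(values):
    """Return the list of modes, or [] when every distinct value ties (no mode)."""
    found = st.multimode(values)
    distinct = set(values)
    if len(distinct) > 1 and len(found) == len(distinct):
        return []
    return found


def trimmed_mean(values, k_percent):
    """Mean after dropping the lowest and highest k% of the sorted values."""
    ordered = sorted(values)
    trim = round(len(ordered) * k_percent / 100)  # observations cut from each end
    kept = ordered[trim:len(ordered) - trim]
    return st.mean(kept)


scores = [713, 300, 618, 595, 311, 401, 292]
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]

mean_scores = st.mean(scores)
mean_tornadoes = st.mean(tornadoes)
median_tornadoes = st.median(tornadoes)          # average of the two middle values
geometric_tornadoes = st.geometric_mean(tornadoes)
trimmed_tornadoes = trimmed_mean(tornadoes, 12.5)

letter_modes = modes_or_none(list("AABCBCDAA"))
number_modes = modes_or_none([10, 10, 20, 20, 30, 30, 20, 30])
no_mode = modes_or_none([500, 500, 700, 600, 700, 600])

print("Mean of scores:", round(mean_scores, 2))
print("Tornadoes mean / median:", mean_tornadoes, median_tornadoes)
print("Tornadoes geometric / trimmed mean:", round(geometric_tornadoes, 2), trimmed_tornadoes)
print("Modes:", letter_modes, number_modes, no_mode)
```

Note that `statistics.multimode` returns every value when all values tie, so the helper converts that case to an empty list to match the "no mode" convention used in the notes.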



Scatter Plots
● A scatter plot is used for numerical data consisting of paired observations taken from two numerical variables.
● One variable is measured on the vertical axis while the other variable is measured on the horizontal axis.
● In case of a dependence relationship, the dependent variable is plotted along the vertical axis.

MEASURES OF CENTRAL TENDENCY, POSITION, AND VARIABILITY
● The central tendency is the extent to which all the data values group around a typical or central value.
● The variation is the amount of dispersion, or scattering, of values.
● The shape is the pattern of the distribution of values from the lowest value to the highest value.

NUMERICAL DESCRIPTION
● Three Key Characteristics of numerical data:

Characteristic | Interpretation
Central Tendency | Where are the data values concentrated? What seems to be typical or middle data values?
Dispersion or Variation | How much variation is there in the data? How spread out are the data values? Are there unusual values?
Shape | Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?

CENTRAL TENDENCY
Five Measures of Central Tendency
● Statistic: Mean
Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Excel Formula: =AVERAGE(Data)
Pro: Familiar and uses all the sample information
Con: Influenced by extreme values
● Statistic: Median
Formula: Middle value in sorted array
Excel Formula: =MEDIAN(Data)
Pro: Robust when extreme data values exist.
Con: Ignores extremes and can be affected by gaps in data values.
● Statistic: Mode
Formula: Most frequently occurring data value
Excel Formula: =MODE(Data)
Pro: Useful for attribute data or discrete data with a small range
Con: May not be unique, and is not helpful for continuous data
● Statistic: Trimmed Mean
Formula: Same as the mean except omit highest and lowest k% of data values (e.g., 5%)
Excel Formula: =TRIMMEAN(Data, Percent)
Pro: Mitigates effects of extreme values.
Con: Excludes some data values that could be relevant.
● Statistic: Geometric mean (G)
Formula: $G = \sqrt[n]{x_1 x_2 \cdots x_n}$
Excel Formula: =GEOMEAN(Data)
Pro: Useful for growth rates and mitigates high extremes.
Con: Less familiar and requires positive data.

MEASURES OF CENTRAL TENDENCY
● Arithmetic Mean - the most common measure of central tendency
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
➔ Affected by extreme values (outliers)
● Geometric Mean - Used to measure the rate of change of a variable over time
$G = \sqrt[n]{x_1 x_2 \cdots x_n}$
● Geometric Mean Rate of Return - Measures the status of an investment over time
$\bar{R}_G = \left[(1 + R_1)(1 + R_2) \cdots (1 + R_n)\right]^{1/n} - 1$
➔ Where $R_i$ is the rate of return in time period i

GROWTH RATES
● A variation on the geometric mean used to find the average growth rate for a time series.
● Given by taking the geometric mean of the ratios of each year’s revenue to the preceding year.

MEDIAN
● In an ordered array, the median is the “middle” number (50% above, 50% below).
● Not affected by extreme values.

LOCATING THE MEDIAN
● The median of an ordered set of data is located at the $\frac{n+1}{2}$ ranked value.
● If the number of values is odd, the median is the middle number.
● If the number of values is even, the median is the average of the two middle numbers.
● Note that $\frac{n+1}{2}$ is NOT the value of the median, only the position of the median in the ranked data.

MODE
● Value that occurs most often
● Not affected by extreme values
● Used for categorical data
● Used for numerical data primarily when grouped
● There may be no mode
● There may be several modes

TRIMMED MEAN
● To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
● To determine how many observations to trim, multiply k by n and round off the result.

LOCATING EXTREME OUTLIERS
Z-SCORE
● To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
● Z-score - number of standard deviations a data value is from the mean.
● A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
● The larger the absolute value of the Z-score, the farther the data value is from the mean.
$Z = \frac{X - \bar{x}}{S}$
where X represents the data value
$\bar{x}$ is the sample mean
S is the sample standard deviation

QUARTILE MEASURES
● Quartiles split the ranked data into 4 segments with an equal number of values per segment.
➔ The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
➔ Q2 is the same as the median (50% are smaller, 50% are larger)
➔ Only 25% of the values are greater than the third quartile
● GUIDELINES (applied to the ranked position computed for a quartile, e.g. $\frac{n+1}{4}$ for Q1 and $\frac{3(n+1)}{4}$ for Q3)
➔ Rule 1: If the result is a whole number, then the quartile is equal to that ranked value.
➔ Rule 2: If the result is a fractional half (2.5, 3.5, etc.), then the quartile is equal to the average of the corresponding ranked values.
➔ Rule 3: If the result is neither a whole number nor a fractional half, round the result to the nearest integer and select that ranked value.

DISPERSION
● Variation - “spread” of data points about the center of the distribution in a sample.
● MEASURES OF VARIATION
➔ Statistic: Range
Formula: $X_{max} - X_{min}$
Excel: =MAX(Data)-MIN(Data)
Pro: Easy to calculate
Con: Sensitive to extreme data values
➔ Statistic: Variance ($s^2$)
Formula: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$
Excel: =VAR(Data)
Pro: Plays a key role in mathematical statistics.
Con: Nonintuitive meaning.
➔ Statistic: Standard deviation (s)
Formula: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$
Excel: =STDEV(Data)
Pro: Most common measure. Uses same units as the raw data ($, £, ¥, etc.).
Con: Nonintuitive meaning.
➔ Statistic: Coefficient of variation (CV)



Formula: $CV = 100 \times \frac{s}{\bar{x}}$
Excel: =(STDEV.S(Data)/AVERAGE(Data))*100 or =(STDEV.P(Data)/AVERAGE(Data))*100
Pro: Measures relative variation in percent so can compare data sets.
Con: Requires nonnegative data
➔ Statistic: Mean absolute deviation (MAD)
Formula: $MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
Excel: =AVEDEV(Data)
Pro: Easy to understand.
Con: Lacks “nice” theoretical properties.

RANGE
● Simplest measure of variation
● Difference between the largest and the smallest values
● DISADVANTAGES
➔ Ignores the way in which data are distributed
➔ Sensitive to outliers

VARIANCE
● The population variance ($\sigma^2$) is defined as the sum of squared deviations around the mean $\mu$ divided by the population size N.
$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$
● For the sample variance ($s^2$), we divide by n – 1 instead of n, otherwise $s^2$ would tend to underestimate the unknown population variance $\sigma^2$.
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

STANDARD DEVIATION
● The square root of the variance.
● Explains how individual values in a data set vary from the mean.
● Units of measure are the same as x.
● Comparing Standard Deviation
➔ The smaller the standard deviation, the steeper the slope of the distribution curve.

COEFFICIENT OF VARIATION
● Useful for comparing variables measured in different units or with different means.
● A unit-free measure of dispersion
● Expressed as a percent of the mean.
● Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

MEAN ABSOLUTE DEVIATION
● MAD - reveals the average distance from an individual data point to the mean (center of the distribution).
● Uses absolute values of the deviations around the mean.

SKEWNESS AND KURTOSIS
● Generally, skewness may be indicated by looking at the sample histogram. This visual indicator is imprecise and does not take into consideration sample size n.
● Skewness may be indicated by comparing the mean and median.
● Skewness is a unit-free statistic.
● The coefficient compares two samples measured in different units or one sample with a known reference distribution (e.g., symmetric normal distribution).
● Calculate the sample’s skewness coefficient as:
$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$
● In Excel, use Data Analysis/Descriptive Statistics or the function =SKEW(array)
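
The measures of variation, the Z-score rule and the skewness coefficient described above can be sketched together in Python. The data are the tornado counts from the mean exercise earlier in these notes. The skewness line assumes the adjusted formula behind Excel's =SKEW, and all variable names are illustrative.

```python
import statistics as st

data = [684, 764, 656, 702, 856, 1133, 1132, 1303]  # tornado counts from the notes

n = len(data)
mean = st.mean(data)

data_range = max(data) - min(data)   # Xmax - Xmin
sample_var = st.variance(data)       # divides by n - 1
sample_sd = st.stdev(data)           # square root of the sample variance
pop_var = st.pvariance(data)         # divides by n (population formula)
cv_percent = 100 * sample_sd / mean  # coefficient of variation, in percent
mad = sum(abs(x - mean) for x in data) / n  # mean absolute deviation

# Sample skewness, assuming the adjusted formula used by Excel's =SKEW
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sample_sd) ** 3 for x in data)

# Z-scores and the |Z| > 3 rule for extreme outliers
z_scores = [(x - mean) / sample_sd for x in data]
extreme_outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3.0]

print("Range:", data_range)
print("Sample variance / standard deviation:", round(sample_var, 2), round(sample_sd, 2))
print("CV (%):", round(cv_percent, 2), " MAD:", round(mad, 2))
print("Skewness:", round(skewness, 3), " Extreme outliers:", extreme_outliers)
```

Comparing `pop_var` with `sample_var` shows the effect of dividing by n rather than n – 1: the population formula always gives the smaller number for the same data.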



● Coefficients outside the range suggest the sample came from a non-normal population.
● Kurtosis - relative length of the tails and the degree of concentration in the center.
● Consider three kurtosis prototype shapes.
● A histogram is an unreliable guide to kurtosis since scale and axis proportions may differ.
● Excel and MINITAB calculate kurtosis as:
$\text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
● Coefficients outside the range would suggest the sample differs from a normal population.

BIVARIATE DATA
● Sample Covariance - measures the strength of the linear relationship between two numerical variables.
$\text{cov}(X,Y) = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
● The covariance is only concerned with the strength of the relationship.
● No causal effect is implied.
● In Excel, covariance is available as a statistical function and also in Data Analysis.
● Covariance between two random variables:

cov(X,Y) > 0 | X and Y tend to move in the same direction
cov(X,Y) < 0 | X and Y tend to move in opposite directions
cov(X,Y) = 0 | X and Y have no linear relationship (this alone does not prove that they are independent)

THE CORRELATION COEFFICIENT
● Unit free
● Ranges between –1 and 1
● The closer to –1, the stronger the negative linear relationship
● The closer to 1, the stronger the positive linear relationship
● The closer to 0, the weaker any linear relationship
➔ Negative = Downward Slope

ETHICAL CONSIDERATIONS
● Should document both good and bad results
● Should be presented in a fair, objective and neutral manner
● Should not use inappropriate summary measures to distort fact

NORMAL DISTRIBUTION AND TEST OF NORMALITY

NORMAL DISTRIBUTION
● It is the most common continuous distribution.
● Also known as the Gaussian distribution or the bell curve.
● In this distribution, the probability that various values occur within certain ranges or intervals can be calculated.
● ‘Bell Shaped’
● Symmetrical
● Mean, Median and Mode are equal
● Location is characterized by the mean, μ
● Spread is characterized by the standard deviation, σ
● The random variable has an infinite theoretical range: -∞ to +∞
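
For the BIVARIATE DATA discussion, the sample covariance and the correlation coefficient can be computed by hand in a few lines. The sketch below uses made-up numbers (hours studied and quiz scores are hypothetical, not from the notes), and the function names are illustrative. The last example is included to show that a covariance of zero only rules out a linear relationship.

```python
import statistics as st


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    x_bar, y_bar = st.mean(xs), st.mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it is unit free."""
    return sample_covariance(xs, ys) / (st.stdev(xs) * st.stdev(ys))


hours = [1, 2, 3, 4, 5]        # hypothetical hours studied
scores = [52, 55, 61, 64, 70]  # hypothetical quiz scores

cov_up = sample_covariance(hours, scores)
r_up = correlation(hours, scores)               # close to +1: upward slope
r_down = correlation(hours, scores[::-1])       # close to -1: downward slope

# y depends entirely on x (y = x squared), yet the covariance is zero
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
cov_parabola = sample_covariance(xs, ys)

print("Covariance:", cov_up)
print("Correlation (upward, downward):", round(r_up, 4), round(r_down, 4))
print("Covariance for y = x^2:", cov_parabola)
```

Because the correlation coefficient divides out the units, it can be compared across data sets, which the raw covariance cannot.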



THE NORMAL DISTRIBUTION SHAPE
● By varying the parameters μ and σ, we obtain different normal distributions.
● Which distributions have the same mean (μ) but have different standard deviations?
● Which distributions differ with respect to both μ and σ?

THE STANDARDIZED NORMAL DISTRIBUTION
● Also known as the “Z” distribution
● Mean is 0
● Standard Deviation is 1
● Values above the mean have positive Z-values, values below the mean have negative Z-values.
● Note that the distribution is the same, only the scale has changed. We can express the problem in original units (X) or in standardized units (Z).

NORMAL PROBABILITIES
● Probability is measured by the area under the curve.
● The total area under the curve is 1.0, and the curve is symmetric, so half is above the mean, half is below.

NORMAL PROBABILITY TABLES

ASSESSING NORMALITY
● It is important to evaluate how well the data set is approximated by a normal distribution.
● Normally distributed data should approximate the theoretical normal distribution:
➔ The normal distribution is bell shaped (symmetrical) where the mean is equal to the median.
➔ The empirical rule applies to the normal distribution.
➔ The interquartile range of a normal distribution is 1.33 standard deviations.

THE EMPIRICAL RULE AS APPLIED TO THE NORMAL DISTRIBUTION
● This rule states that for symmetrical bell-shaped data sets, one can find that roughly two out of every three observations are contained within a distance of 1 standard deviation around the mean and roughly 95% are contained within a distance of 2 standard deviations around the mean.
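
The areas quoted for the normal distribution can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (3.8 or later). The helper `share_within` and the example mean of 100 with standard deviation 15 are assumptions made only for illustration.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # the standardized normal ("Z") distribution


def share_within(k: float) -> float:
    """Area under the Z curve between -k and +k standard deviations."""
    return z.cdf(k) - z.cdf(-k)


within_1 = share_within(1.0)      # roughly two out of every three observations
within_1_28 = share_within(1.28)  # approximately 80%
within_2 = share_within(2.0)      # approximately 95%

# Interquartile range of a normal distribution, measured in standard deviations
iqr_in_sigmas = z.inv_cdf(0.75) - z.inv_cdf(0.25)

# The same probability expressed in original units X (hypothetical mean and sd)
x_dist = NormalDist(mu=100.0, sigma=15.0)
within_1_x = x_dist.cdf(115.0) - x_dist.cdf(85.0)

print("Within 1, 1.28 and 2 standard deviations:",
      round(within_1, 4), round(within_1_28, 4), round(within_2, 4))
print("IQR in standard deviations:", round(iqr_in_sigmas, 3))
print("Within 1 sd, in X units:", round(within_1_x, 4))
```

Under this model the interquartile range comes out near 1.35 standard deviations, close to the 1.33 figure quoted in the notes. The last line illustrates that standardizing changes only the scale, not the probability.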



● Construct charts or graphs
➔ For small- or moderate-sized data sets, do the stem-and-leaf display and box-and-whisker plot look symmetric?
➔ For large data sets, does the histogram or polygon appear bell-shaped?
● Compute descriptive summary measures
➔ Do the mean, median and mode have similar values?
➔ Is the interquartile range approximately 1.33 σ?
➔ Is the range approximately 6 σ?
● Observe the distribution of the data set
➔ Do approximately 2/3 of the observations lie within mean ± 1 standard deviation?
➔ Do approximately 80% of the observations lie within mean ± 1.28 standard deviations?
➔ Do approximately 95% of the observations lie within mean ± 2 standard deviations?
● Evaluate normal probability plot
➔ Is the normal probability plot approximately linear with positive slope?

THE NORMAL PROBABILITY PLOT
● A normal probability plot for data from a normal distribution will be approximately linear.

EXPLORATORY DATA ANALYSIS: THE FIVE NUMBER SUMMARY
● Minimum
● First Quartile (Q1)
● Median (Q2)
● Third Quartile (Q3)
● Maximum

EXPLORATORY DATA ANALYSIS: THE BOX-AND-WHISKER PLOT
● A graphical display of the five number summary.
● The box and central line are centered between the endpoints if data are symmetric around the median.
● A Box-and-Whisker plot can be shown in either vertical or horizontal format.

OTHER WAYS OF ASSESSING NORMALITY OF DATA INCLUDE:
● Checking for skewness with Pearson coefficient (PC) of skewness as
$PC = \frac{3(\bar{x} - \text{median})}{s}$
● NOTE: The data is considered significantly skewed when the PC is greater than or equal to +1 or less than or equal to -1.
● Checking for outliers
● NOTE: An outlier is a data value that lies more than 1.5(IQR) units below Q1 or 1.5(IQR) units above Q3.

DATA MEASUREMENT

WHY COLLECT DATA?
● A marketing research analyst needs to assess the effectiveness of a new television advertisement.
● A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.
● An operations manager wants to monitor a manufacturing process to find out whether the quality of product being manufactured is conforming to company standards.
● An auditor wants to review the financial transactions of a company in order to determine whether the company is in
compliance with generally accepted accounting principles.

SOURCES OF DATA
● Primary Sources: The data collector is the one using the data for analysis
➔ Data from a market survey
➔ Data collected from an experiment
➔ Observed data
● Secondary Sources: The person performing data analysis is not the data collector
➔ Analyzing census data
➔ Examining data from print journals, from government agencies or data published on the internet.

TYPES OF VARIABLES
● Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
● Numerical (quantitative) variables have values that represent quantities.
● NOTE: Ambiguity is introduced when continuous data are rounded to whole numbers. Be cautious.

LEVELS OF MEASUREMENT
CATEGORICAL VARIABLES
● A nominal scale classifies data into distinct categories in which no ranking is implied.

Categorical Variables | Categories
iPhone Ownership | Yes/No
Type of Bank Accounts | Savings/Current/Investment
Cable TV Provider | Sky/Destiny

● An ordinal scale classifies data into distinct categories in which ranking is implied.

Categorical Variable | Ordered Categories
Student Class Designation | Freshman, Sophomore, Junior, Senior
Product Satisfaction | Satisfied, Neutral, Unsatisfied
Faculty Rank | Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s Bond Ratings | AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades | A, B, C, D, F

NUMERICAL VARIABLES
● Interval Scale - an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
● Ratio Scale - an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
● Likert Scales - a special case of interval data frequently used in survey research.
➔ The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).
➔ A neutral midpoint (“Neither Agree Nor Disagree”) is allowed if an odd number of scale points is used; it is omitted (an even number of points) to force the respondent to “lean” one way or the other.
➔ Likert data are coded numerically (e.g., 1 to 5) but any equally spaced values will work.
➔ Careful choice of verbal anchors results in measurable intervals (e.g., the distance from 1 to 2 is “the same” as the interval, say, from 3 to 4).
➔ Ratios are not meaningful (e.g., here 4 is not twice 2).
➔ Many statistical calculations can be performed (e.g., averages, correlations, etc.).

RATIO MEASUREMENT
● Ratio data have all properties of nominal, ordinal and interval data types and also



possess a meaningful zero (absence of quantity being measured).
● Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).
● Zero does not have to be observable in the data; it is an absolute reference point.

CHANGING DATA BY RECODING
● In order to simplify data or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).
➔ For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).
➔ The above recoded data are ordinal (ranking is preserved), but intervals are unequal and some information is lost.

SAMPLING CONCEPTS
● Population - involves all of the items one is interested in. It may be finite (e.g., all of the passengers on a plane) or effectively infinite (e.g., all of the Cokes produced in an ongoing bottling process).
● Sample - a subset of the population; it involves looking only at some of the items selected from the population.
● Census - an examination of all items in a defined population

SITUATIONS WHERE A SAMPLE MAY BE PREFERRED
Infinite Population | No census is possible if the population is of indefinite size (an assembly line can keep producing bolts, a doctor can keep seeing more patients)
Destructive Testing | The act of measurement may destroy or devalue the item (battery life, vehicle crash tests)
Timely Results | Sampling may yield more timely results (checking wheat samples for moisture content, checking peanut butter for salmonella contamination)
Accuracy | Instead of spreading resources thinly to attempt a census, the budget might be better spent to improve training of field interviewers and improve data safeguards.
Cost | Even if a census is feasible, the cost, in either time or money, may exceed our budget.
Sensitive Information | A trained interviewer might learn more about sexual harassment in an organization through confidential interviews of a small sample of employees.

SITUATIONS WHERE A CENSUS MAY BE PREFERRED
Small Population | If the population is small, there is little reason to sample, for the effort of data collection may be only a small part of the total cost.
Large Sample Size | If the required sample size approaches the population size, we might as well go ahead and take a census.
Database Exists | If the data are on disk, we can examine 100% of the cases. But auditing or validating data against physical records may raise the cost.
Legal Requirements | Banks must count all the cash in bank teller drawers at the end of each business day. The U.S. Congress forbade sampling in the 2000 decennial population census.

PARAMETER OR STATISTIC
● Parameter - a measurement or characteristic of the population.
➔ Usually unknown because we can rarely observe the entire population.
● Statistic - a numerical value computed from a sample.
➔ Can be used as estimates of parameters found in the population
● Symbols are used to represent population parameters and sample statistics.
● From a sample of n items, chosen from a population, we compute statistics that can be used as estimates of parameters found in the population.



● To avoid confusion, we use different symbols for each parameter and its corresponding statistic.
● Thus, the population mean is denoted by 𝜇 (the lowercase Greek letter mu) while the sample mean is x̄.
● The population proportion is denoted by 𝜋 (the lowercase Greek letter pi), while the sample proportion is p.

TARGET POPULATION
● The population must be carefully specified and the sample must be drawn scientifically so that the sample is representative.
● The target population is the population we are interested in (e.g., U.S. gasoline prices).
● The sampling frame is the group from which we take the sample (e.g., 115,000 stations).
● The frame should not differ from the target population.

CHARACTERISTICS OF A GOOD MEASUREMENT TOOL

MEASUREMENT IN RESEARCH
● Measurement in research consists of assigning numbers to empirical events, objects or properties, or activities in compliance with a set of rules. This implies that measurement is a three-part process:
➔ Selecting observable empirical events.
➔ Developing a scheme (or mapping rules) for assigning numbers or symbols to represent aspects of the event being measured.
➔ Applying the mapping rule/s to each observation of that event.
● The goal of measurement is to provide the highest quality, lowest-error data for testing hypotheses, estimation or prediction, or description.
● The object of measurement is either a concept, construct or variable.
➔ Concepts, constructs and variables may be defined descriptively or operationally.
➔ An operational definition defines a variable in terms of specific measurement and testing criteria.

WHAT IS MEASURED?
● Variables being studied in research may be classified as objects or as properties.
● Objects include tangible items such as people, automobiles, etc.
● Properties are the characteristics of the object such as weight, height, attitudes, intelligence, leadership ability, etc.
● In a literal sense, researchers do not measure either objects or properties. They measure indicants of the properties of objects.
● Properties like height, weight, age, years of experience, or number of employees are easy to measure.
● In contrast, it is not easy to measure properties of constructs like attitudes, satisfaction, engagement, work-life balance, or persuasiveness. Since each property cannot be measured directly, one must infer its presence or absence by observing some indicator.
● The nature of measurement scales, sources of error and characteristics of sound measurement are considered in subsequent slides.

SOURCES OF MEASUREMENT DIFFERENCES
● The ideal study should be designed and controlled for precise and unambiguous measurement of the variables. Since complete control is unattainable, error does occur.
● The Respondent
➔ Opinion differences that affect measurement come from relatively stable characteristics of the respondent, like employee status and social class. Respondents may also suffer from temporary factors like fatigue, boredom, anxiety or general variations in mood or other distractions; these limit the ability to respond accurately and fully.
● Situational Factors
➔ Any condition that places a strain on the interview or measurement session can have serious effects on the interviewer-respondent rapport. Examples of such conditions are: presence of another person during the interview, belief that anonymity is not ensured and “ambush” interviews.



● The Measurers
➔ The interviewer can distort responses by rewording, paraphrasing, or reordering questions.
A. Inflections of voice and conscious or unconscious prompting with smiles, nods, and so forth, may encourage or discourage certain replies.
B. Careless mechanical processing, like checking the wrong response or failure to record full replies, will obviously distort findings.
C. Incorrect coding, careless tabulation, and faulty statistical calculation may introduce further errors.
● The Instrument
➔ The instrument can be too confusing and ambiguous.
A. Use of complex words and syntax beyond participant comprehension
B. Leading questions, ambiguous meanings, mechanical defects (e.g. poor printing), and multiple questions suggest the range of problems.
➔ One technique used to minimize measurement differences in research instruments is pilot testing.
● There are 3 major criteria for evaluating a measurement tool:
➔ Validity
➔ Reliability
➔ Practicality
● Validity - the extent to which a test measures what we actually intend to measure.
➔ External validity of research findings is the data’s ability to be generalized across persons, settings, and times.
➔ Internal validity is the ability of a research instrument to measure what it is purported to measure.

Types of validity (Type / What is Measured? / Methods)
● Content
➔ What is measured: Degree to which the content of the items adequately represents the universe of all relevant items under study.
➔ Methods: Judgmental; panel evaluation with content validity ratio
● Criterion-Related
➔ What is measured: Degree to which the predictor is adequate in capturing the relevant aspects of the criterion.
➔ Methods: Correlation
● Concurrent
➔ What is measured: Description of the present: criterion data are available at the same time as predictor scores.
➔ Methods: Correlation
● Predictive
➔ What is measured: Prediction of the future: criterion data are measured after the passage of time.
➔ Methods: Correlation
● Construct
➔ What is measured: Answers the question, “What accounts for the variance in the measure?”; attempts to identify the underlying construct(s) being measured and determine how well the test represents it (them).
➔ Methods: Judgmental; correlation of proposed test with established one; convergent-discriminant techniques; factor analysis; multitrait-multimethod analysis

● Reliability - A measure is reliable to the degree that it supplies consistent results. Reliability is concerned with estimates of the degree to which a measurement is free of random or unstable error.
➔ It is a necessary contributor to validity but it is not a sufficient condition for validity.

Perspectives on Reliability
1. Stability – A measure is said to possess stability if one can secure consistent results with repeated measurements of the same person with the same instrument.
➔ Stability is concerned with personal and situational fluctuations from one time to another.



2. Equivalence – considers how much error may be introduced by different investigators (in observation) or different samples of items being studied (in questioning or scales).
➔ Equivalence is concerned with variations at one point in time among observers and samples of items.
➔ An example of an indicator used to assess equivalence is inter-rater reliability.
3. Internal consistency – refers to homogeneity among the items. Among the techniques used are the split-half technique and the Spearman-Brown correction formula.
➔ The split-half technique is used when the measuring tool has many similar questions, while the Spearman-Brown correction formula is used to adjust for the effect of test length and to estimate reliability of the whole test.
➔ Other measures of internal consistency are KR20, Cronbach’s 𝛼 and McDonald’s 𝜔.

Reliability estimates (Type / Coefficient / What is Measured / Methods)
● Test-Retest
➔ Coefficient: Stability
➔ What is measured: Reliability of a test or instrument inferred from examinee scores; same test is administered twice to same subjects over an interval of less than six months.
➔ Methods: Correlation
● Parallel Forms
➔ Coefficient: Equivalence
➔ What is measured: Degree to which alternative forms of the same measure produce same or similar results; administered simultaneously or with a delay. Interrater estimates of the similarity of judges’ observations or scores.
➔ Methods: Correlation
● Split-Half, KR20, Cronbach’s Alpha
➔ Coefficient: Internal Consistency
➔ What is measured: Degree to which instrument items are homogeneous and reflect the same underlying construct(s).
➔ Methods: Specialized correlational formulas

● Practicality - The scientific requirements of a project call for the measurement process to be reliable and valid, while the operational requirements call for it to be practical. Practicality has been defined as economy, convenience and interpretability.

SELECTION OF A MEASUREMENT SCALE
● Selecting and constructing a measurement scale requires the consideration of several factors that influence the reliability, validity and practicality of the scale.
● These are:
➔ research objectives
➔ response types
➔ data properties
➔ number of dimensions
➔ balanced or unbalanced
➔ forced or unforced choices
➔ number of scale points and rater errors.

SURVEY RESEARCH

BASIC STEPS OF SURVEY RESEARCH
● Step 1: State the goals of the research
● Step 2: Develop the budget (time, money, staff).
● Step 3: Create a research design (target population, frame, sample size).
● Step 4: Choose a survey type and method of administration.
● Step 5: Design a data collection instrument (questionnaire).
● Step 6: Pretest the survey instrument and revise as needed.
● Step 7: Administer the survey (follow up if needed).
● Step 8: Code the data and analyze it.

SURVEY TYPES
● Mail - You need a well-targeted and current mailing list (people move a lot).
➔ Low response rates are typical and nonresponse bias is expected (nonrespondents differ from those who respond).
➔ Zip code lists (often costly) are an attractive option to define strata of similar income, education, and attitudes.
➔ To encourage participation, a cover letter should clearly explain the uses to which the data will be put. Plan for follow-up mailings.
● Telephone - Random dialing yields very low response and is poorly targeted.
➔ Purchased phone lists help reach the target population, though a low response rate still is typical (disconnected phones, caller screening, answering machines, work hours, no-call lists).
➔ Other sources of nonresponse bias include the growing number of non-English speakers and distrust caused by scams and spam.
● Interviews - Interviewing is expensive and time consuming, yet a trade-off between sample size and high-quality results may still be worth it.
➔ Interviews must be carefully handled, so interviewers must be well-trained – an added cost.
➔ But you can obtain information on complex or sensitive topics (e.g., gender discrimination in companies, birth control practices, diet and exercise habits).
● Web - Web surveys are growing in popularity, but are subject to nonresponse bias because those who participate may differ from those who feel too busy, don’t own computers or distrust your motives (scams and spam are again to blame).
➔ This type of survey works best when targeted to a well-defined interest group on a question of self-interest (e.g., views of CPAs on new proposed accounting rules, frequent flyer views on airline security).
● Direct Observation - This can be done in a controlled setting (e.g., psychology lab) but requires informed consent, which can change behavior.
➔ Unobtrusive observation is possible in some non-lab settings (e.g., what percentage of airline passengers carry on more than two bags, what percentage of SUVs carry no passengers, what percentage of drivers wear seat belts).

SURVEY GUIDELINES
● Planning - What is the purpose of the survey? Consider staff expertise, needed skills, degree of precision, budget.
● Design - Invest time and money in designing the survey. Use books and references to avoid unnecessary errors.
● Quality - Take care in preparing a quality survey so that people will take you seriously.
● Pilot Test - Pretest on friends or co-workers to make sure the survey is clear.
● Buy-in - Improve response rates by stating the purpose of the survey, offering a token of appreciation or paving the way with endorsements.
● Expertise - Work with a consultant early on.

QUESTIONNAIRE DESIGN
● Use a lot of white space in layout.
● Begin with short, clear instructions.
● State the survey purpose.
● Assure anonymity.
● Instruct on how to submit the completed survey.
● Break survey into naturally occurring sections.
● Let respondents bypass sections that are not applicable (e.g., “if you answered no to Question 7, skip directly to Question 15”).
● Pretest and revise as needed.
● Keep as short as possible.

CODING AND DATA SCREENING
● Responses are usually coded numerically (e.g., 1 = male, 2 = female).
● Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
● Discard questionnaires that are flawed or missing many responses.
● Watch for multiple responses, outrageous or inconsistent replies or range answers.
● Follow up if necessary and always document your data-coding decisions.

ADVICE ON COPYING DATA
● Using commas (,), dollar signs ($), or percent (%) as part of the values may result in your data being treated as text values.
● A numerical variable may only contain the digits 0-9, a decimal point, and a minus sign.
● To avoid round-off errors, format the data column as plain numbers with the desired number of decimal places before you copy the data to a statistical package.
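The coding and copying advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `to_number` and the set `MISSING_CODES` are hypothetical, and the choice to keep "45%" as 45.0 (rather than 0.45) is a coding decision that should be documented, as the notes recommend.

```python
# Hypothetical helper names; the missing-value codes follow the notes (blank, "." or "*").
MISSING_CODES = {"", ".", "*"}


def to_number(raw):
    """Convert a copied cell such as "$1,250.50" or "45%" to a float.

    Returns None when the cell holds one of the missing-value codes.
    Assumption: a percent sign is simply dropped, so "45%" becomes 45.0.
    """
    text = raw.strip()
    if text in MISSING_CODES:
        return None
    # Remove the characters that make a package treat the value as text.
    cleaned = text.replace(",", "").replace("$", "").replace("%", "")
    return float(cleaned)


# Example: a column copied from a spreadsheet with mixed formatting.
column = ["$1,250.50", "45%", ".", "-3.2", " 980 "]
values = [to_number(cell) for cell in column]
print(values)
```

Cells that still fail to convert raise a `ValueError`, which is a useful signal that a reply needs follow-up rather than silent coding.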



FUNDAMENTALS OF HYPOTHESIS TESTING AND TESTS FOR A POPULATION MEAN (ONE SAMPLE)

STATISTICAL HYPOTHESIS TESTING
● Hypothesis - Claim, assumption, or conjecture about a population parameter. It may or may not be true.
● Population Mean
➔ EX. The mean monthly allowance of students in this university is 𝜇 = Php 7,500
● Population Proportion
➔ EX. The proportion of adults who own a car in this city is 𝑝 = 55%
● Hypothesis testing is a decision-making process for evaluating claims about a population.

TYPES OF STATISTICAL HYPOTHESES
THE NULL HYPOTHESIS (H0)
● States that there is no difference between a parameter and a specific value or that there is no difference between two parameters.
● Always contains “=”, “≤” or “≥” sign.
● Always about a population parameter, not about a sample statistic.
● May or may not be rejected.

THE ALTERNATIVE HYPOTHESIS (H1)
● States the opposite of the null hypothesis.
● Never contains “=”, “≤” or “≥” sign.
● May or may not be supported by the data.

HYPOTHESIS TESTING
● We assume that the null hypothesis is true.
● If the null hypothesis is rejected, we conclude that the data support the alternative hypothesis.
● If the null hypothesis is not rejected, we have proven nothing, as the sample size may have been too small.

HYPOTHESIS TESTING PROCESS - MEAN
● State the hypotheses about the population mean.
● Gather data from a sample of the population.
● Use the data obtained from a sample to make a decision about whether the null hypothesis should be rejected. This is called a statistical test.
● The numerical value computed from a statistical test is called the test value or the test statistic.

THE TEST STATISTIC AND CRITICAL VALUES
● If the sample mean is close to the hypothesized value of the population mean, the null hypothesis is not rejected.
● If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
● How far is “far enough” to reject the null hypothesis is determined by the level of significance, 𝛼.
➔ It defines the unlikely values of the sample statistic if the null hypothesis is true.
➔ It defines the rejection or critical region of the sampling distribution.
➔ Typical values used are 0.01, 0.05 and 0.10.
➔ It is selected by the researcher before sampling.
➔ It also provides the critical value of the test.

POSSIBLE OUTCOMES OF A HYPOTHESIS TEST
● We reject the null hypothesis when it is true. This would be an incorrect decision and would result in a Type I error.
● We reject the null hypothesis when it is false. This would be a correct decision.
● We do not reject the null hypothesis when it is true. This would be a correct decision.
● We do not reject the null hypothesis when it is false. This would be an incorrect decision and would result in a Type II error.

ERRORS IN DECISION MAKING
● A Type I error occurs when we reject the null hypothesis when it is true.
➔ The level of significance, 𝛼, is the maximum probability of committing a Type I error. That is, P(Type I error) = 𝛼.
● A Type II error occurs when we do not reject the null hypothesis when it is false.
➔ The probability of committing a Type II error is denoted by 𝛽. That is, P(Type II error) = 𝛽.
● Type I and Type II errors cannot happen at the same time.

LEVEL OF SIGNIFICANCE AND CRITICAL REGION
● Critical or rejection region - range of test values that indicates that there is a significant difference and that the null hypothesis should be rejected.
● Noncritical or non-rejection region - range of test values that indicates that the difference was probably due to chance and
that the null hypothesis should not be rejected.
● Critical value - separates the rejection region from the non-rejection region. This value is determined based on the level of significance, 𝛼.

HYPOTHESIS AND TYPE OF TEST
● The alternative hypothesis determines the type of test (two-tailed, left-tailed or right-tailed) as well as the rejection region.
➔ Two-tailed test: 𝐻0: 𝜇 = 𝜇0 versus 𝐻1: 𝜇 ≠ 𝜇0
➔ Right-tailed test: 𝐻0: 𝜇 ≤ 𝜇0 versus 𝐻1: 𝜇 > 𝜇0
➔ Left-tailed test: 𝐻0: 𝜇 ≥ 𝜇0 versus 𝐻1: 𝜇 < 𝜇0
where 𝜇0 is the hypothesized value of the population mean.

LEVEL OF SIGNIFICANCE, TYPE OF TEST, AND REJECTION REGION

HYPOTHESIS TESTS FOR THE MEAN

HYPOTHESIS TESTS FOR 𝜇: 𝜎 IS KNOWN
● If the population standard deviation is known, use the z-test.
● The z-test is a statistical test which can be used when:
➔ The population standard deviation is known.
➔ The sample size is n ≥ 30, or the population is normally distributed if n < 30.
● FORMULA FOR Z-TEST
z = (x̄ − 𝜇) / (𝜎 / √𝑛)
Where: x̄ is the sample mean
𝜇 is the hypothesized value of the population mean
𝜎 is the population standard deviation
𝑛 is the sample size

HYPOTHESIS TEST: TRADITIONAL METHOD OR CRITICAL VALUE APPROACH
1. State the hypotheses and identify the claim. The hypotheses can be structured in one of three ways.
2. Compute the test value or test statistic.
3. Find the critical value (refer to Area under normal curve table). Identify the critical or rejection region.
4. Make the decision.
5. Summarize the results/make the conclusion.

HYPOTHESIS TEST: TRADITIONAL METHOD
How to find critical values for specific values of 𝛼:
1. Draw the figure and indicate the appropriate area.
A. If the test is left-tailed, the rejection region, with an area equal to 𝛼, will be at the left tail of a normal curve.
B. If the test is right-tailed, the rejection region, with an area equal to 𝛼, will be at the right tail of a normal curve.
C. If the test is two-tailed, divide 𝛼 by 2; one half will be at the right tail while the other half will be at the left tail of the curve.
2. Obtain the z-value from the table “Area under the normal curve”.
A. For a left-tailed test, use the z-value that corresponds to the area equivalent to 𝛼. Affix a negative sign to the z-value.
B. For a right-tailed test, use the z-value that corresponds to the area equivalent to 𝛼.
C. For a two-tailed test, use the z-value that corresponds to 𝛼/2. Affix a negative sign to the z-value at the left tail.

HYPOTHESIS TEST: P-VALUE METHOD
1. State the hypotheses and identify the claim. The hypotheses can be structured in one of three ways.
2. Compute the test value or test statistic.
3. Use the Area under the normal curve table to determine the probability or p-value that corresponds to the computed value obtained in step 2.
4. Make the decision. If p-value < 𝛼, reject 𝐻0.
5. Summarize the results/make the conclusion.
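As a minimal sketch of the two procedures above (critical value and p-value) for the case where 𝜎 is known, the following Python function computes the z statistic, the critical value and the p-value. It uses `statistics.NormalDist` from the standard library in place of the printed "Area under the normal curve" table. The function name `z_test` and the sample figures (a sample mean of Php 7,800, 𝜎 = 1,200, n = 36) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test for a mean with known sigma.

    tail is "left", "right" or "two" and follows the alternative hypothesis.
    Returns (z, critical value, p-value, reject H0?).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)          # negative z-value
        p_value = std_normal.cdf(z)
        reject = z < critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z > critical
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)  # use +/- critical
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, critical, p_value, reject


# Hypothetical data: H0: mu = 7500 versus H1: mu != 7500.
z, critical, p_value, reject = z_test(7800, 7500, 1200, 36, alpha=0.05)
print(f"z = {z:.2f}, critical = +/-{critical:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing z with the critical value and comparing the p-value with 𝛼 always lead to the same decision; here z = 1.5 falls inside the non-rejection region, so the null hypothesis is not rejected at 𝛼 = 0.05.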



HYPOTHESIS TEST: P-VALUE APPROACH
● The P-value (or probability value) is the probability of getting a sample statistic (such as the sample mean) or a more extreme sample statistic in the direction of the alternative hypothesis when the null hypothesis is true.
● It is the actual area under the standard normal distribution curve associated with the computed value of the test statistic.

HYPOTHESIS TESTS FOR 𝜇: 𝜎 IS UNKNOWN
● If the population standard deviation is unknown, the sample standard deviation, s, is used instead of the population standard deviation 𝜎.
● As a result, the Student’s t-distribution is used instead of the standard normal distribution to test hypotheses about a population mean.
● The t-test is a statistical test which can be used when:
➔ The population standard deviation is unknown.
➔ n < 30 and the population is normally distributed
● FORMULA FOR THE T-TEST
t = (x̄ − 𝜇) / (𝑠 / √𝑛), with df = 𝑛 − 1
Where: x̄ is the sample mean
𝜇 is the hypothesized value of the population mean
𝑠 is the sample standard deviation
𝑛 is the sample size
df is the degrees of freedom
● ASSUMPTIONS
➔ The sample statistic comes from a random sample from a normal distribution.
➔ If the sample size is less than 30, use a box and whisker plot or a normal probability plot to assess whether the assumption of normality is valid.
➔ If the sample size is at least 30, the central limit theorem applies and the sampling distribution of the mean will be approximately normal.

PROPERTIES OF THE STUDENT’S T-DISTRIBUTION
Properties similar to the standard normal (z) distribution:
● It is bell shaped.
● It is symmetric about the mean.
● The mean, median and mode are equal to 0 and are located at the center of the distribution.
● The curve approaches but never touches the x-axis.

Properties different from the z distribution:
● The variance is greater than 1.
● It is a family of curves based on degrees of freedom.
● As the sample size increases, it approaches the normal distribution.

Z-TEST FOR A POPULATION PROPORTION (ONE SAMPLE)
EXAMPLES OF PROPORTIONS
● 59% of consumers purchase gifts for their fathers.
● 50.3% of businessmen said they own stocks and mutual funds.
● 55% of Filipinos buy generic products.
● 40% of Filipino families go out for dinner once a week.

HYPOTHESIS TESTING INVOLVING A POPULATION PROPORTION
● This test is considered a binomial experiment where there are only two possible outcomes.
● The two possible outcomes are:
➔ “success” (a sampled item possesses a certain characteristic)
➔ “failure” (a sampled item does not possess that characteristic)
● The fraction or proportion of the population in the “success” category is denoted by 𝑝. We refer to 𝑝 as the population proportion.
● The fraction or proportion of the sample in the “success” category is denoted by 𝑝̂. We refer to 𝑝̂ as the sample proportion.
● A normal distribution can be used to approximate the binomial distribution when 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5.
● Hence, the standard normal distribution can be used to test hypotheses for population proportions.
● The sample proportion in the success category, denoted by 𝑝̂, is defined by
𝑝̂ = 𝑋 / 𝑛, where 𝑋 is the number of successes in the sample and 𝑛 is the sample size.
● When 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5, the sample proportion 𝑝̂ can be approximated by a normal distribution with mean 𝜇𝑝̂ and standard deviation 𝜎𝑝̂ obtained by:
𝜇𝑝̂ = 𝑝 and 𝜎𝑝̂ = √(𝑝(1 − 𝑝) / 𝑛)



● The test statistic:
z = (𝑝̂ − 𝑝) / √(𝑝(1 − 𝑝) / 𝑛), where 𝑝 is the hypothesized value of the population proportion.

ASSUMPTIONS FOR TESTING A PROPORTION
● The following must be met for testing a population proportion:
➔ The sample is a random sample.
➔ 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5.
➔ The sampled values are independent of each other.
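The proportion test can be sketched in the same way. The function below is an illustrative assumption, not a prescribed procedure: the name `proportion_z_test` is hypothetical, it handles only the two-tailed case, and the sample (98 car owners among 200 adults) is invented to test the earlier example claim that 55% of adults own a car. It checks the 𝑛𝑝 ≥ 5 and 𝑛(1 − 𝑝) ≥ 5 conditions before using the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-tailed one-sample z-test for a population proportion.

    p0 is the hypothesized population proportion under H0.
    Returns (p_hat, z, p-value, reject H0?).
    """
    # The normal approximation to the binomial needs both conditions.
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("np >= 5 and n(1 - p) >= 5 are not both satisfied")
    p_hat = successes / n                   # sample proportion
    std_error = sqrt(p0 * (1 - p0) / n)     # standard deviation of p_hat under H0
    z = (p_hat - p0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value, p_value < alpha


# Hypothetical data: H0: p = 0.55 versus H1: p != 0.55.
p_hat, z, p_value, reject = proportion_z_test(successes=98, n=200, p0=0.55)
print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these figures the sample proportion is 0.49 and z is about −1.71, so the p-value exceeds 0.05 and the null hypothesis is not rejected at that level of significance.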
