Combinepdf

The document provides an overview of statistics, defining key terms such as population, sample, variable, and measurement. It discusses different types of variables, levels of measurement, and areas of statistics including descriptive and inferential statistics. Additionally, it covers measures of central tendency, correlation, and regression analysis, emphasizing the importance of statistical literacy and data analysis processes.
STATISTICS

WHAT IS STATISTICS?

• Plural Sense
• Data themselves or numbers derived from the collected and analyzed data

• Singular Sense
• Scientific methods for collecting, organizing, summarizing, presenting, and analyzing data, and
for drawing conclusions on the basis of such analysis
KEY TERMS

• Population (universe)
• The collection of things under consideration. (All CTU Graduate Students for 1st Trim 2023)

• Sample
• A portion of the population selected for analysis. (New Students for 1st Trim 2023)

• Variable
• A characteristic observed or measured on every unit of universe/sample. (Height/Weight)

• Measurement
• The process of determining the value or label, either qualitative or quantitative, of a
particular variable for a particular unit of analysis

• Observation
• The realized value of the variable. (5’1, 70kg)

• Data set
• The collection of all observations. Example: height and weight of 2 students (5’0, 40 kg; 5’1, 70 kg)

• Parameter
• A summary measure computed to describe a characteristic of the population. Uses Greek
letters: μ (mu), σ (sigma), ρ (rho), and θ (theta)
• μ = 5’1 – average height of all Grad School Students for 1st Trim 2023.

• Statistic
• A summary measure computed to describe a characteristic of a sample

• SAMPLE: use statistics to summarize features
• POPULATION: use parameters to summarize features
VARIABLES

• QUALITATIVE
• Describes quality or attribute of person or object
• Assign label or name
• Gender – male, female
• Civil Status – single, married, widow

• QUANTITATIVE
• Describes amount or number of something
• Any attribute measured in numbers
• Weight – 80 kg, 120 lbs, etc.
• Height – 5’6, 50 cm, etc.
QUANTITATIVE VARIABLE (QV)

• Continuous QV
• Can assume any value between two given values
• Example: Weight – can take any value from 70 kg to 71 kg
• Not an example: between 4 and 5 persons

• Discrete QV
• There are no possible values between adjacent units on the scale
• Example: number of students
• Not an example: 5.5 students
LEVEL OF MEASUREMENT

• MEASUREMENT
• The process of determining the value or label, either qualitative or quantitative, of a particular
variable for a particular unit of analysis

• Nominal Level
• Numbers or symbols are used simply to classify an object, person, or characteristic into
categories
• Categories must be non-overlapping and exhaustive
• Weakest level

• Examples of Nominal Level


• Gender
• Political Affiliation
• Identification Numbers (Student or Government IDs)

• Ordinal Level
• Properties of nominal + the numbers assigned to categories can be ordered in some
low-to-high manner
• Sizes of Apparel (T-shirts, shoes, etc.), Performance rating (O, VG, G, F, P)

• Interval Level
• Properties of ordinal + distances between any two numbers on the scale are of known sizes
• Zero point is arbitrary
• Example: Temperature in °C or °F, Intelligence Quotient (IQ)

• Ratio Level
• Properties of interval + a true zero point
• Strongest Level
• Height in meters, weight in kilograms
AREAS OF STATISTICS

• Descriptive Statistics
• Concerned with techniques that are used to describe or characterize the obtained
data

• Inferential Statistics
• Involves techniques that use the obtained sample data to infer to populations
EXAMPLE

You measure the number of times ten albino rats press a lever to obtain
food following an injection of amphetamine (an appetite suppressant). The
average number of lever presses made by each rat during the session was
187. You state that the average number of lever presses emitted by your ten
rats was 187.

Descriptive Statistics
EXAMPLE

You want to administer a reading achievement test to all the fourth-grade children in the
school district. However, due to the limited number of test materials and personnel, you
select one fourth-grade class from each elementary school in the district and administer
the tests to only these children. You test 135 out of the total of 674 fourth-graders in the
district. After analyzing the reading achievement scores, you conclude that the fourth-graders
in the district are reading at the sixth-grade level.

Inferential Statistics
DESCRIPTIVE STATISTICS
✔ Collect
✔ Organize
✔ Summarize
✔ Display
✔ Analyze
Without drawing conclusions

INFERENTIAL STATISTICS
✔ Predict and forecast values of population parameters
✔ Estimation of parameters
✔ Test hypotheses about values of population parameters
✔ Make decisions
Drawing conclusions about the population from the data gathered in the sample
THREE IMPORTANT REASONS WHY STATISTICAL LITERACY IS IMPORTANT

• TO BE INFORMED
• TO UNDERSTAND ISSUES AND BE ABLE TO MAKE SOUND DECISIONS BASED ON DATA
• TO BE ABLE TO EVALUATE DECISIONS THAT AFFECT YOUR LIFE
DATA ANALYSIS PROCESS

1. UNDERSTANDING THE NATURE OF THE PROBLEM
2. DECIDING WHAT TO MEASURE AND HOW TO MEASURE IT
3. DATA COLLECTION
4. DATA SUMMARIZATION AND PRELIMINARY ANALYSIS
5. FORMAL DATA ANALYSIS
6. INTERPRETATION OF RESULTS
The student senate at a university with 15,000 students
is interested in the proportion of students who favor a
change in the grading system to allow for plus and minus
grades (e.g., B+, B, B-, rather than just B). Two hundred
students are interviewed to determine their attitude
toward this proposed change.

What is the population of interest?

What group of students constitutes the sample in this problem?
DATA COLLECTION METHODS

• OBJECTIVE METHOD
• SUBJECTIVE METHOD
• USE OF EXISTING RECORDS
WHY SAMPLE?

• Because the census of a population may be:


• Impossible
• Impractical
• Too costly
SUMMARY MEASURES

• Central Tendency
  • Mean: Arithmetic Mean, Geometric Mean, Harmonic Mean
  • Median
  • Mode
• Quartile
• Variation
  • Range
  • Interquartile Range
  • Variance
  • Standard Deviation
  • Coefficient of Variation
MEASURES OF CENTRAL TENDENCY

• A single value that is used to identify the “center” of the data


• It is thought of as a typical value of the distribution
• Precise yet simple
• Most representative value of the data
MEAN

• Most common measure of the center


• Also known as arithmetic mean or average
PROPERTIES OF THE MEAN

• May not be an actual observation in the data set


• Can be applied in at least interval level
• Easy to compute
• Every observation contributes to the value of the mean
• Subgroup means can be combined to come up with a group mean (weighted mean)
• Easily affected by extreme values.

[Figure: number line from 10 to 16 with the mean marked]
EXAMPLE

• 2000 out of 5000 examinees passed the October 2023 Licensure Examination for Teachers
• You are tasked to determine the performance of the examinees of the said examination.
• Sample: 6

EXAMINEE   A     B     C     D     E     F
SCORE      96%   92%   77%   83%   70%   75%

$$\text{MEAN} = \frac{96 + 92 + 77 + 83 + 70 + 75}{6} = \frac{493}{6} \approx 82\%$$
UNGROUPED DATA

A student kept track of the number of hours he studied each day
for a 2-week period. The following daily scores were recorded
(scores are in hours): 2.5, 3.2, 3.8, 1.3, 1.4, 0, 0, 2.6, 5.2,
4.8, 0, 4.6, 2.8, 3.3. Calculate the mean
number of hours studied per day.
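The slides pose this problem without a worked answer. A minimal Python sketch of the calculation, using only the fourteen values listed above (the variable names are illustrative):

```python
# Daily study hours recorded over the 2-week period (from the problem above).
hours = [2.5, 3.2, 3.8, 1.3, 1.4, 0, 0, 2.6, 5.2, 4.8, 0, 4.6, 2.8, 3.3]

n = len(hours)             # number of observations (14 days)
total = sum(hours)         # sum of all observations
mean_hours = total / n     # arithmetic mean = sum / n

print(f"n = {n}, sum = {total:.1f}, mean = {mean_hours:.2f} hours per day")
```

Note that the three days with 0 hours are real observations and count toward n.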
GROUPED DATA
For 108 randomly selected college students,
the following IQ frequency distribution was
obtained. Compute the mean using (a) the
long method, (b) the short method, and (c) the coding
method.
CLASS LIMITS FREQUENCY
90-98 6
99-107 22
108-116 43
117-125 28
126-134 9
LONG METHOD

CLASS MARK (X)   FREQUENCY (f)   fX
94               6               564
103              22              2266
112              43              4816
121              28              3388
130              9               1170
                 N = 108         Σ(fX) = 12204

MEAN = Σ(fX) / N = 12204 / 108 = 113
SHORT METHOD

CLASS MARK (X)   FREQUENCY (f)   d = X - A   fd
94               6               -18         -108
103              22              -9          -198
112 (A)          43              0           0
121              28              9           252
130              9               18          162
                 N = 108                     Σ(fd) = 108

MEAN = A + Σ(fd) / N = 112 + 108 / 108 = 113
CODING METHOD

CLASS MARK (X)   FREQUENCY (f)   u = (X - A) / c   fu
94               6               -2                -12
103              22              -1                -22
112 (A)          43              0                 0
121              28              1                 28
130              9               2                 18
                 N = 108                           Σ(fu) = 12

With class width c = 9: MEAN = A + c · Σ(fu) / N = 112 + 9(12) / 108 = 113
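The three methods should agree. The following Python sketch checks this for the IQ table, assuming A = 112 as the assumed mean and c = 9 as the class width (both read off the tables; the variable names are illustrative):

```python
# Class marks (midpoints) and frequencies from the IQ frequency distribution.
marks = [94, 103, 112, 121, 130]
freqs = [6, 22, 43, 28, 9]

N = sum(freqs)   # total frequency, 108
A = 112          # assumed mean: class mark of the middle class
c = 9            # class width (e.g. 99 - 90)

# (a) Long method: mean = sum(f * X) / N
long_mean = sum(f * x for f, x in zip(freqs, marks)) / N

# (b) Short method: mean = A + sum(f * d) / N, where d = X - A
short_mean = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / N

# (c) Coding method: mean = A + c * sum(f * u) / N, where u = (X - A) / c
coded_mean = A + c * sum(f * ((x - A) // c) for f, x in zip(freqs, marks)) / N

print(long_mean, short_mean, coded_mean)
```

The short and coding methods only shift and rescale the class marks, which is why all three give the same mean.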
MEDIAN


PROPERTIES OF MEDIAN

• May not be an actual observation in the data set


• Can be applied in at least ordinal level
• A positional measure; not affected by extreme values

[Figure: number line from 10 to 16 with the median marked]
UNGROUPED DATA
An industrial psychologist observed eight drill-press
operators for three working days. She recorded the
number of times each operator pressed the “faster”
button instead of the “stop” button to determine
whether the design of the control panel was
contributing to the high rate of accidents in the
plant. Given the scores 4, 7, 0, 2, 7, 3, 6, 7, compute
the median.
SOLUTION

Arrange the data in ascending order:

0, 2, 3, 4, 6, 7, 7, 7

With n = 8 (an even number of observations), the median is the average of the 4th and 5th values:

MEDIAN = (4 + 6) / 2 = 5
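The sort-then-pick-the-middle procedure can be sketched in Python as follows. This is a minimal illustration using the eight scores from the problem; the standard library's `statistics.median` is called only as a cross-check.

```python
from statistics import median

scores = [4, 7, 0, 2, 7, 3, 6, 7]   # "faster" button presses per operator

ordered = sorted(scores)            # step 1: arrange in ascending order
n = len(ordered)
mid = n // 2

if n % 2 == 0:
    # even n: average the two middle values
    med = (ordered[mid - 1] + ordered[mid]) / 2
else:
    # odd n: take the single middle value
    med = ordered[mid]

print(ordered, med)
assert med == median(scores)        # cross-check against the standard library
```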
MODE

• Occurs most frequently


• Nominal average
• May or may not exist

1, 2, 3, 3, 3, 5, 6, 6, 7, 8, 9, 9, 9
PROPERTIES OF MODE

• Can be used for qualitative as well as quantitative data


• May not be unique
• Not affected by extreme values
• Can be computed for ungrouped and grouped data
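A short Python sketch of these properties, using the data set shown above and `statistics.multimode` (Python 3.8+). The civil-status list is a made-up illustration of nominal data, not data from the slides.

```python
from statistics import multimode

# Data set from the slide above: 3 and 9 each occur three times.
data = [1, 2, 3, 3, 3, 5, 6, 6, 7, 8, 9, 9, 9]
modes = multimode(data)             # all most-frequent values, in order first seen
print(modes)                        # the mode is not unique here

# The mode also works for nominal (qualitative) data.
# Hypothetical civil-status responses, for illustration only.
status = ["single", "married", "single", "widow", "single"]
print(multimode(status))
```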
MEAN, MEDIAN, MODE

• Use the MEAN when:
  • Sampling stability is desired
  • Other measures are to be computed

• Use the MEDIAN when:
  • The exact midpoint of the distribution is desired
  • There are extreme observations

• Use the MODE when:
  • The typical value is desired
  • The data set is measured on a nominal scale
Correlation & Regression
ATTY. JURIS RENIER MENDOZA
CORRELATION
● A statistical measure that indicates the extent to which two or more
variables fluctuate together.
○ Positive Correlation - variables increase or decrease in parallel
○ Negative Correlation - one variable increases as the other decreases
● A statistical technique that can show whether and how strongly pairs
of variables are related
○ POSITIVE/DIRECT - values INCREASE together
○ NEGATIVE - one value decreases as the other increases (inverse or contrary)
CORRELATION (scatter plot examples)
● Strong positive correlation between x and y. The points lie close to a straight line
with y increasing as x increases.
● Weak positive correlation between x and y. The trend shown is that y increases as x
increases, but the points are not close to a straight line.
● No correlation between x and y; the points are distributed randomly on the graph.
● Weak negative correlation between x and y. The trend shown is that y decreases as x
increases, but the points do not lie close to a straight line.
● Strong negative correlation. The points lie close to a straight line,
with y decreasing as x increases.
CORRELATION VALUES
● PERFECT POSITIVE CORRELATION: r = +1
● NO CORRELATION: r = 0
● PERFECT NEGATIVE CORRELATION: r = -1

TYPES OF CORRELATIONS
1. PEARSON CORRELATION

2. KENDALL RANK CORRELATION

3. SPEARMAN CORRELATION
PEARSON CORRELATION
It was developed by Karl Pearson.

It is denoted by r (the Pearson correlation coefficient).

Pearson’s correlation coefficient is a statistical measure of the
strength of a linear relationship between paired data.

It is a parametric statistical measure.

To use Pearson correlation, your data must meet the following
requirements:
1. Two or more continuous variables (interval or ratio level)
2. Cases must have non-missing values on both variables
3. Linear relationship between the variables
4. Independent cases or observations*
5. Bivariate Normality*
6. Random sample of data from the population
7. No outliers
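The slides do not show how r is computed. A minimal sketch of the usual product-moment formula follows, r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The function name `pearson_r` and the study-hours data are hypothetical and chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mean_x) ** 2 for a in x)                       # spread of x
    syy = sum((b - mean_y) ** 2 for b in y)                       # spread of y
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data: hours studied vs. exam score (not from the slides).
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]

r = pearson_r(hours, score)
print(f"r = {r:.2f}")   # close to +1: strong positive linear relationship
```

A value near +1 or -1 means the points lie close to a straight line; it does not by itself show that one variable causes the other.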
KENDALL RANK CORRELATION
It is named after Maurice Kendall.

It is a statistic used to measure the ordinal association between two
measured quantities.

It is a measure of rank correlation.

The Kendall rank coefficient is often used as a test statistic in a
statistical hypothesis test to establish whether two variables may be
regarded as statistically dependent. This test is non-parametric, as it
does not rely on any assumptions on the distributions of X or Y or the
distribution of (X,Y).
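As an illustration of how the coefficient is built, here is a sketch of the simplest variant, often called tau-a: (concordant pairs - discordant pairs) / total pairs. It assumes there are no tied values, and the two rankings are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a; assumes no ties in x or y."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # the pair is ordered the same way in x and y
        elif sign < 0:
            discordant += 1      # the pair is ordered in opposite ways
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical rankings of five contestants by two judges (illustrative only).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

tau = kendall_tau_a(judge_a, judge_b)
print(tau)   # 8 concordant and 2 discordant pairs out of 10
```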
SPEARMAN’S RANK CORRELATION
It is named after Charles Spearman.

It is a nonparametric measure of rank correlation.

The Spearman correlation between two variables is equal to the
Pearson correlation between the rank values of those two variables.
While Pearson's correlation assesses linear relationships, Spearman's
correlation assesses monotonic relationships (whether linear or not).

The Spearman correlation coefficient is often described as being
"nonparametric". This can have two meanings. First, a perfect
Spearman correlation results when X and Y are related by any
monotonic function. Contrast this with the Pearson correlation, which
only gives a perfect value when X and Y are related by a linear
function.

The other sense in which the Spearman correlation is nonparametric is
that its exact sampling distribution can be obtained without requiring
knowledge of the joint probability distribution of X and Y.
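The statement that Spearman's coefficient is the Pearson correlation of the ranks can be checked directly. In the sketch below the helper names (`ranks`, `pearson_r`, `spearman_rho`) and the data (y = x cubed, a monotonic but non-linear relationship) are illustrative assumptions.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                        # extend over a run of tied values
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))  # Pearson correlation of the ranks

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]                   # monotonic, but not linear

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Spearman's coefficient is perfect here because y always increases with x, while Pearson's r falls short of 1 because the points do not lie on a straight line.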
REGRESSION

Regression analysis is one of the most commonly used statistical techniques in the
social and behavioral sciences as well as in the physical sciences. It involves
identifying and evaluating the relationship between a dependent variable and
one or more independent variables, which are also called predictor or
explanatory variables.
Linear regression explores relationships that can be readily described by
straight lines or their generalization to many dimensions. A surprisingly large
number of problems can be solved by linear regression, and even more by
means of transformation of the original variables that result in linear
relationships among the transformed variables.
Regression
● Simple Linear Regression
○ There is a single continuous dependent variable and a single independent variable

● Multiple Regression
○ Relationship between several independent or predictor variables and a dependent
or criterion variable
Regression
The primary objective of regression is to develop a linear relationship
between a response variable and explanatory variables for the
purpose of prediction. This assumes that a functional linear relationship
exists; where that assumption is doubtful, alternative approaches (such as
functional regression) may be superior.
Simple Linear Regression

● Simple linear regression is a statistical method that allows us to
summarize and study relationships between two continuous
(quantitative) variables. In a cause and effect relationship, the
independent variable is the cause, and the dependent variable is
the effect.
● Least squares linear regression is a method for predicting the value
of a dependent variable y, based on the value of an independent
variable x.
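A minimal sketch of least squares for one predictor follows, using the standard closed-form estimates slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept = ȳ - slope · x̄. The function name `fit_line` and the data are hypothetical and not taken from the slides.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]

intercept, slope = fit_line(hours, score)
predicted = intercept + slope * 6        # predicted score for 6 hours of study

print(f"y = {intercept:.1f} + {slope:.1f}x, prediction at x = 6: {predicted:.1f}")
```

Predicting at x = 6 extrapolates slightly beyond the observed range, so such a prediction should be read with caution.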
MEASURES OF CENTRAL TENDENCY
MEDIAN

Grouped Data
The following data represents the survey regarding
the heights (in cm) of 51 girls of Class Maganda. Find
the median height.
HEIGHT (cm) Number of Girls
Below 140 4
140-145 7
145-150 18
150-155 11
155-160 6
160-165 5
Solution
• Get the cumulative frequency.

HEIGHT (cm) Number of Girls Cumulative Frequency


Below 140 4 4
140-145 7 11
145-150 18 29
150-155 11 40
155-160 6 46
160-165 5 51

• Note that the total number of observations is n = 51 girls.

Solution
• Locate the median class. To do so, compute the cumulative frequencies of
all the classes and n/2.
• The median class is the first class whose cumulative
frequency is greater than or equal to n/2.
• n/2 = 51/2 = 25.5
• The first cumulative frequency to reach 25.5 is 29, so the median lies in the class
interval 145-150, which is called the median class.
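The formula slide that followed did not survive extraction. The sketch below assumes the standard textbook formula for the median of grouped data, Median = l + ((n/2 - cf) / f) × h, where l is the lower limit of the median class, cf the cumulative frequency before it, f its frequency, and h the class width. These symbol names are assumptions, not taken from the slides.

```python
# Height classes from the table: (lower limit of class in cm, frequency).
# The open-ended "Below 140" class has no lower limit; none is needed here
# because the median does not fall in that class.
classes = [(None, 4), (140, 7), (145, 18), (150, 11), (155, 6), (160, 5)]
h = 5                                   # class width in cm

n = sum(f for _, f in classes)          # 51 girls
half = n / 2                            # 25.5

cf_before = 0                           # cumulative frequency before the current class
for lower, f in classes:
    if cf_before + f >= half:           # first class whose cumulative frequency reaches n/2
        median = lower + (half - cf_before) / f * h
        break
    cf_before += f

print(f"median class starts at {lower} cm, median = {median:.2f} cm")
```

Under this assumed formula the median height works out to roughly 149 cm, inside the 145-150 median class as expected.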
Mode
Grouped Data
In a class of 30 students, the scores obtained by
students in mathematics (out of 50) are tabulated
below. Calculate the mode of the given data.

Scores Obtained Number of Students


10-20 5
20-30 12
30-40 8
40-50 5
Solution
• Formula for the mode in a grouped data set:

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

Where:

l = lower limit of the modal class

h = size of the class interval

f1 = frequency of the modal class

f0 = frequency of the class preceding the modal class

f2 = frequency of the class succeeding the modal class

Solution
Scores Obtained   Number of Students
10-20             5
20-30             12
30-40             8
40-50             5

The modal class is 20-30, the class with the highest frequency, so:
• l = 20
• h = 10
• f1 = 12
• f0 = 5
• f2 = 8
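The final substitution is not shown in the slides. A short sketch follows, assuming the usual grouped-data mode formula Mode = l + ((f1 - f0) / (2·f1 - f0 - f2)) × h with the symbols defined above:

```python
# Values read from the score table; the modal class is 20-30.
l = 20    # lower limit of the modal class
h = 10    # size of the class interval
f1 = 12   # frequency of the modal class
f0 = 5    # frequency of the class preceding the modal class
f2 = 8    # frequency of the class succeeding the modal class

mode = l + (f1 - f0) / (2 * f1 - f0 - f2) * h   # 20 + (7 / 11) * 10

print(f"mode = {mode:.2f}")
```

The result falls inside the modal class and sits nearer its upper end, because the class after it (frequency 8) is fuller than the class before it (frequency 5).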
MEASURES OF LOCATION
• Measure of Location
• It summarizes a data set by giving a value within the range of the
data values that describes its location relative to the entire data set
arranged according to magnitude (array).

• Some common measures:


• Minimum, Maximum
• Percentiles, Deciles, Quartiles
Maximum and Minimum
• Minimum
• The smallest value in the data set, denoted as MIN.

• Maximum
• The largest value in the data set, denoted as MAX.
Percentiles
• Numerical measures that give the position of a data
value relative to the entire data set.

• Divide an array (raw data arranged in increasing or
decreasing order of magnitude) into 100 equal parts.

• The jth percentile, denoted as Pj, is the data value in the
data set that separates the bottom j% of the data from the
top (100-j)%.
• UP Diliman Health Clinic is interested in the
weights of some athletes in the school, so it
measured the weights of 6 athletes from
different sports.

WEIGHTS   SORTED WEIGHTS
82        78
88        82
78        86
86        88
97        92
92        97

• Find the 50th and the 80th percentiles of the data set.
• Determine the data point in the position (n+1)(P/100).
• For the 50th percentile: (6+1)(50/100) = 3.5
• Thus, the percentile is located at the 3.5th position.
• The 3rd observation is 86 and the 4th observation is 88.
• The 50th percentile lies halfway between the 3rd and 4th
values and is therefore 87.
• For the 80th percentile: (6+1)(80/100) = 5.6, so it lies 0.6 of the way from the
5th observation (92) to the 6th (97): 92 + 0.6(97 - 92) = 95.
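The position rule (n+1)(P/100) with interpolation between neighbouring values can be sketched as below. The function name `percentile` and the clamping at the two ends of the data are illustrative choices, and other percentile conventions (for example, those used by spreadsheet or statistics packages) can give slightly different answers.

```python
def percentile(sorted_data, p):
    """p-th percentile using the (n + 1)(p / 100) position rule."""
    position = (len(sorted_data) + 1) * p / 100   # 1-based, may be fractional
    whole = int(position)                          # observation just below the position
    fraction = position - whole
    if whole < 1:                                  # position falls before the first value
        return sorted_data[0]
    if whole >= len(sorted_data):                  # position falls at or after the last value
        return sorted_data[-1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + fraction * (above - below)      # interpolate between neighbours

weights = sorted([82, 88, 78, 86, 97, 92])         # athlete weights from the example

p50 = percentile(weights, 50)   # position 3.5: halfway between 86 and 88
p80 = percentile(weights, 80)   # position 5.6: 0.6 of the way from 92 to 97

print(f"P50 = {p50:.1f}, P80 = {p80:.1f}")
```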
Deciles
• Divide an array into ten equal parts, each part having
ten percent of the distribution of the data values,
denoted by Dj.

• The 1st decile is the 10th percentile; the 2nd decile is
the 20th percentile, and so on.
Quartiles
• Divide an array into four equal parts, each part
having 25% of the distribution of the data values,
denoted by Qj.

• The 1st quartile is the 25th percentile; the 2nd
quartile is the 50th percentile, also the median; and
the 3rd quartile is the 75th percentile.
MEASURES OF VARIATION
• A measure of variation is a single value that
is used to describe the spread of the
distribution
• A measure of central tendency alone does not uniquely
describe a distribution
Range (R)
• The difference between the maximum and
minimum value in a data set.

R = MAX – MIN

• Pulse rates of 15 male residents of a certain village.

54 58 58 60 62 65 66 71
74 75 77 78 80 82 85
Some Properties of the Range
• The larger the value of the range, the more
dispersed the observations are.

• It is quick and easy to understand.

• A rough measure of dispersion.


Inter-Quartile Range (IQR)
• The difference between the third quartile and first
quartile:

IQR = Q3 – Q1

• E.g., pulse rates of 15 male residents of a certain village:
54 58 58 60 62 65 66 71
74 75 77 78 80 82 85
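The slides list the pulse rates without the computed values. A sketch for both the range and the IQR follows, assuming quartiles are located with the same (n+1)(P/100) position rule used for percentiles earlier. Other quartile conventions exist and can give different results.

```python
pulse = [54, 58, 58, 60, 62, 65, 66, 71, 74, 75, 77, 78, 80, 82, 85]  # already sorted
n = len(pulse)

# Range: R = MAX - MIN
data_range = max(pulse) - min(pulse)

def value_at(position):
    """Value at a 1-based, possibly fractional, position in the sorted data."""
    whole = int(position)
    fraction = position - whole
    if whole >= n:
        return pulse[-1]
    return pulse[whole - 1] + fraction * (pulse[whole] - pulse[whole - 1])

q1 = value_at((n + 1) * 25 / 100)   # position 4  -> 4th value
q3 = value_at((n + 1) * 75 / 100)   # position 12 -> 12th value
iqr = q3 - q1

print(f"R = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The IQR covers only the middle half of the data, which is why it is less affected by extreme values than the range.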
Some Properties of the IQR

• Reduces the influence of extreme values.

• Not as easy to calculate as the Range.


Variance

• An important measure of variation.

• It shows variation about the mean.


Standard Deviation
• It is the most important measure of variation

• It is the square root of variance

• It has the same units as the original data.


Computation of Standard Deviation

10 12 14 15 17 18 18 24

n=8
Mean = 16
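The worked computation for this data set was not preserved. The sketch below uses Python's `statistics` module and shows both conventions, because the slides do not say whether the sample formula (divide by n - 1) or the population formula (divide by n) was intended.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [10, 12, 14, 15, 17, 18, 18, 24]

m = mean(data)                                   # 16, as stated on the slide
squared_devs = [(x - m) ** 2 for x in data]      # squared deviations from the mean
print(sum(squared_devs))                         # sum of squared deviations = 130

# Sample versions divide by n - 1; population versions divide by n.
print(f"sample:     variance = {variance(data):.2f}, SD = {stdev(data):.2f}")
print(f"population: variance = {pvariance(data):.2f}, SD = {pstdev(data):.2f}")
```

Either way the standard deviation is a little over 4, in the same units as the original data.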
Remarks on Standard Deviation
• If there is a large amount of variation, then on
average, the data values will be far from the mean.
Hence, the SD will be large.

• If there is only a small amount of variation, then on
average, the data values will be close to the mean.
Hence, the SD will be small.
Properties of Standard Deviation
• It is the most widely used measure of dispersion.
(Chebychev’s Inequality)
• It is based on all the items and is rigidly defined.
• It is used to test the reliability of measures
calculated from the samples.
• The standard deviation is sensitive to the presence
of extreme values.
• It is not easy to calculate by hand (unlike the range)
Coefficient of Variation

Comparing CVs
• Plant A:
• No. of workers = 5,000
• Average Monthly Wages = P 2,500.00
• Standard Deviation = P 9
• CV = 0.36%

• Plant B:
• No. of workers = 6,000
• Average Monthly Wages = P 2,500.00
• Standard Deviation = P 10
• CV = 0.4%
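The formula slide for the coefficient of variation is missing from the extraction. The figures above are consistent with the usual definition CV = (standard deviation / mean) × 100%, which the sketch below assumes. The function name is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Relative dispersion: standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(sd=9, mean=2500)    # Plant A
cv_b = coefficient_of_variation(sd=10, mean=2500)   # Plant B

print(f"Plant A: CV = {cv_a:.2f}%")
print(f"Plant B: CV = {cv_b:.2f}%")
print("More variable wages:", "Plant B" if cv_b > cv_a else "Plant A")
```

Since both plants have the same mean wage, the comparison here matches the comparison of standard deviations. The CV becomes more useful when the means or the units differ.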
CV and SD

COEFFICIENT OF VARIATION | STANDARD DEVIATION
It is a relative measure of dispersion | It is an absolute measure of dispersion
It measures the ratio of the standard deviation to the mean | It measures how far a data point lies from the mean
It is usually used to compare variation of different data sets | It is used to measure the dispersion in a single data set
MEASURES OF SHAPE
Measures of central tendency and measures of
dispersion can describe a distribution, but they are not
sufficient to describe its shape. For this
purpose, we use two other statistical measures that
compare the shape to the normal curve: Skewness
and Kurtosis.
SKEWNESS
• A statistical number that tells us if a distribution is
symmetric or not.

• A distribution is symmetric if the right side of the
distribution is similar to the left side of the distribution.

• If a distribution is symmetric, then the skewness value
is 0 (mean = median = mode).

• If skewness is greater than zero, the distribution is called right-skewed
(the right tail is longer than the left tail).
• If skewness is less than zero, the distribution is called left-skewed (the
left tail is longer than the right tail).
Formula of Skewness

Where:
S = standard deviation
X̄ = mean
Kurtosis
• A statistical number that tells us if a distribution is taller
or shorter than a normal distribution.

• If a distribution is similar to the normal distribution, the
kurtosis value is 0 (this is excess kurtosis, i.e. kurtosis minus 3).

• If kurtosis is greater than 0, then it has a higher peak
compared to the normal distribution.

• If kurtosis is less than 0, then it is flatter than a normal
distribution.

[Figure: three curves, labeled "Flattest peak and highly dispersed", "Medium peaked",
and "Sharply peaked with fat tails, and less variable"]
Formula of Kurtosis
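The skewness and kurtosis formula slides were images and did not survive extraction, so the exact formulas the deck used are unknown. As an assumption, the sketch below uses the common moment-based definitions, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2² - 3, where mk is the average of (x - mean)^k. Both data sets are made up for illustration.

```python
def central_moment(data, k):
    """Average of (x - mean) ** k over the data (population form)."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_tail = [1, 2, 2, 3, 10]      # hypothetical data with one large value

print(skewness(symmetric))         # 0: symmetric
print(skewness(right_tail) > 0)    # positive: right-skewed
print(excess_kurtosis(symmetric))  # negative: flatter than a normal curve
```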
KEY DIFFERENCES

SKEWNESS | KURTOSIS
The characteristic of a frequency distribution that ascertains its symmetry about the mean is called skewness | Kurtosis means the relative pointedness of the standard bell curve, defined by the frequency distribution
Skewness is a measure of the degree of lopsidedness in the frequency distribution | Kurtosis is a measure of the degree of tailedness in the frequency distribution
Skewness is an indicator of lack of symmetry, i.e. both left and right sides of the curve are unequal, with respect to the central point | Kurtosis is a measure of data that is either peaked or flat, with respect to the probability distribution
Skewness shows how much and in which direction the values deviate from the mean | Kurtosis explains how tall and sharp the central peak is
Sampling Procedures
Atty. Juris Renier C. Mendoza
What is a sampling procedure?
A sampling procedure is a way of selecting a subset (sample) of individuals or
goods from a larger group (population) in order to study or learn about the
overall population's characteristics. It's like attempting to understand the
taste of a full soup from only a few spoonfuls while ensuring that those
spoonfuls accurately represent the overall flavor.

The most important aspect when selecting a sampling process is to
guarantee that the sample is representative, which means that it
appropriately reflects the characteristics of the entire population.
Main Categories of Sampling Procedures

● PROBABILITY SAMPLING
● NON-PROBABILITY SAMPLING
PROBABILITY SAMPLING
Every member of the population has a known, non-zero chance of being selected
for the sample (an equal chance in simple random sampling). This is achieved through random selection
techniques, like drawing names from a hat or using computer-generated
random numbers. This type of sampling allows for generalizable results that
can be applied to the entire population with more confidence.
Common Types of Probability Sampling
● Simple random sampling
○ Each member of the population is assigned a unique number, and then a random
selection of numbers is chosen to represent the sample. This is the most basic and
straightforward type of probability sampling.
● Systematic sampling
○ The population is ordered in some way, and then a random starting point is chosen.
Every nth individual from the list is then selected for the sample.
● Stratified sampling
○ The population is divided into subgroups (strata) based on some shared
characteristic, and then a random sample is drawn from each stratum. This
method is useful when you want to ensure that your sample is representative
of the different subgroups in the population.
● Cluster sampling
○ The population is divided into groups (clusters), and then a random sample of
clusters is chosen. All of the members from the chosen clusters are then included in
the sample. This method is useful when the population is geographically dispersed
or when it is difficult to identify individual members of the population.
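The four procedures can be sketched with Python's `random` module. Everything in this example is hypothetical: a population of 100 numbered members, two strata formed by splitting the list in half, and clusters of ten consecutive members. It is meant only to show the mechanics of each selection rule.

```python
import random

random.seed(42)                           # fixed seed so the sketch is repeatable
population = list(range(1, 101))          # hypothetical members numbered 1..100

# Simple random sampling: draw 10 members at random, without replacement.
simple = random.sample(population, 10)

# Systematic sampling: random start, then every k-th member of the ordered list.
k = len(population) // 10                 # sampling interval
start = random.randrange(k)               # random starting index in the first interval
systematic = population[start::k]

# Stratified sampling: draw a random sample from each stratum.
strata = {"group_1": population[:50], "group_2": population[50:]}
stratified = [m for group in strata.values() for m in random.sample(group, 5)]

# Cluster sampling: choose whole clusters at random and keep all their members.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [m for cluster in chosen_clusters for m in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```

Note the difference between the last two: stratified sampling takes some members from every subgroup, while cluster sampling takes every member from only some subgroups.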
NON-PROBABILITY SAMPLING
A method where the selection of the sample is not based on random chance.
This means that not every member of the population has an equal chance of
being selected. Non-probability sampling methods are often used when it is
difficult or impractical to obtain a random sample, or when the goal of the
research is not to generalize the results to the population as a whole.
Common Types of Non-Probability Sampling
● Convenience sampling
○ The sample is chosen from the population that is easiest to access. This
method is often used in pilot studies or exploratory research.
● Purposive sampling
○ The sample is chosen based on the researcher's judgment about who is
most likely to provide information relevant to the research question. This
method is often used in qualitative research.
● Snowball sampling
○ The sample is selected by starting with a few individuals who meet the
criteria for the study and then asking them to identify others who also
meet the criteria. This method is useful when the population is difficult to
identify or reach.
● Quota sampling
○ The researcher sets quotas for different subgroups of the population and
then selects a sample that meets those quotas. This method is useful
when the researcher wants to ensure that the sample is representative of
the population in terms of certain characteristics.
Key | Probability Sampling | Non-Probability Sampling
Meaning | A sampling technique in which the subjects of the population get an equal opportunity to be selected as a representative sample | A method of sampling in which it is not known which individual from the population will be selected as a sample
Alternately known as | Random sampling | Non-random sampling
Basis of selection | Randomly | Arbitrarily
Opportunity of selection | Fixed and known | Not specified and unknown
Research | Conclusive | Exploratory
Result | Unbiased | Biased
Method | Objective | Subjective
Inferences | Statistical | Analytical
Hypothesis | Tested | Generated
Summary
Probability sampling can be more expensive and time-consuming compared to
Non-probability sampling.

Probability sampling is based on the principle of randomization, where
every entity gets a fair chance to be a part of the sample. Non-probability
sampling instead relies on the assumption that the characteristics are evenly
distributed within the population, which makes the sampler believe that any
sample so selected would represent the whole population and that the results
drawn would be accurate.
QUESTIONS?
