1. Statistical Methods for Continuous Variables_Part One

This document provides an overview of statistical methods for analyzing continuous variables, focusing on comparisons between groups using t-tests and ANOVA. It outlines the learning objectives, general steps in data analysis, and detailed procedures for conducting one-sample, independent, and paired sample t-tests, as well as one-way ANOVA. The document emphasizes the importance of proper data analysis to derive meaningful insights from research data.


Statistical Methods for Continuous Variables
Comparison of two or more groups

Tarekegn Solomon (MPH, PhD)

Learning objectives
At the end of this session, participants should be able to:
• Define data analysis
• Become familiar with statistical methods for continuous variables
• Explain one-way ANOVA
Introduction

Data Analysis
• Turning raw data into useful information

– The purpose is to provide answers to the research questions being asked

– Even the greatest amount of best-quality data means nothing if not properly analyzed
Introduction…
• Analysis does not simply mean using a computer software package

• Analysis means looking at the data in light of the questions you need to answer:
– How would you analyze data to determine, "Is my program meeting its objectives?"
General Steps in Data Analysis
• Coding and Entry: Completeness, cleaning of data

• Visualizing: Graphical display, scatter plot, histogram, charts

• Tables: Frequencies, outliers, concentrations

• Data Manipulation: Recoding, transforming, grouping, restructuring

• Data Analysis: Descriptive, inferential, etc.

Introduction to comparison

Importance of comparison:
• Evaluate magnitudes of problems

• Identify risk factors

• Efficacy of interventions (test hypotheses)

• Program evaluations
Introduction to comparison…

Levels of comparison for a continuous outcome variable and a nominal independent variable:

• One sample with a known or hypothetical mean value

• Two independent groups (parameters)

• Two related parameters

• Several independent groups (parameters)
A) Comparison using t-tests
• There are a number of different types of t-tests:
i) One-sample t-test
– Tests a hypothesis about a single population mean

ii) Independent-samples t-test (independent-means t-test)
– Compares the mean scores of two different populations

iii) Paired-sample t-test
– Self-paired (before-and-after study)
One-Sample t-Test
• It is a statistical test used to determine whether the mean
of a sample differs significantly from a known or
hypothesized population mean.

Examples
1. A researcher wants to know if the average
haemoglobin level of study participants (sample) is
different from 11 g/dl.

2. A school wants to know if the average test score of
their students (sample) is different from the national
average of 75.
Assumptions for the One-Sample t-Test

• Independence of observations: Each data point in the
sample should be independent of the others.

• Normality: The sample data should be approximately
normally distributed. This assumption is especially
important for small sample sizes (n < 30).

• Scale of measurement: The variable of interest must be
continuous (interval- or ratio-level data).
Test Statistic (t) for the one-sample t-test
The t-statistic is

    t = (x̄ − μ0) / (s / √n)

where x̄ is the sample mean, μ0 the hypothesized population mean,
s the sample standard deviation, and n the sample size.

The degrees of freedom are df = n − 1.
The df are used to determine the critical t-value from the
t-distribution table.
Steps to Perform a One-Sample t-Test

Step 1: State the Hypotheses


• Formulate the null hypothesis and alternative hypothesis,
based on the research question or objective.

Step 2: Check Assumptions


• Ensure the sample data meets the assumptions of
normality and independence.

Step 3: Calculate the t-statistic


• Use the formula to calculate the t-statistic based on your
sample data.
Steps to Perform a One-Sample t-Test

Step 4: Find the critical t-value


• Using the df and the chosen significance level (α), find the critical t-
value from a t-distribution table or use statistical software.

Step 5: Compare the t-statistic to the critical t-value

• If |t| > t critical, reject the null hypothesis.
• If |t| ≤ t critical, fail to reject the null hypothesis.

Step 6: Conclusion
• Interpret the results based on whether or not H₀ was rejected.

• Alternatively, you can use the p-value approach:

• If the p-value is < α, reject the null hypothesis.
• If the p-value is ≥ α, fail to reject the null hypothesis.
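The steps above can be sketched in Python with scipy.stats; the haemoglobin values here are hypothetical, purely for illustration:

```python
from scipy import stats

# Hypothetical sample of haemoglobin values (g/dl); illustrative only
sample = [11.2, 12.5, 10.8, 13.1, 11.9, 12.2, 10.5, 11.7, 12.8, 11.4]
mu0 = 11.0  # hypothesized population mean (H0: mu = 11 g/dl)

# ttest_1samp returns the t-statistic and the two-tailed p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)

alpha = 0.05
if p_value < alpha:
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}: reject H0")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.4f}: fail to reject H0")
```

The p-value approach (Step 6, alternative) and the critical-value approach always agree for a two-tailed test at the same α.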
Performing t-tests using SPSS
1) Testing the hypothesized mean for a single
population (t-test):
Ex.: Using Hgb data from a randomly selected
sample of 400 people, the mean Hgb and
standard deviation were 12.97 g/dl and 1.58
g/dl respectively. Test the hypothesis that the
population mean Hgb is not different from 11 g/dl at
α = 0.05.
• Using SPSS software and the data file Data for
exercise1.sav, test this same hypothesis at the
same significance level.
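Before turning to SPSS, the test statistic for this exercise can be computed directly from the summary statistics on the slide; a minimal Python sketch:

```python
import math

# One-sample t from summary statistics of the Hgb exercise:
# n = 400, sample mean = 12.97 g/dl, SD = 1.58 g/dl, H0: mu = 11 g/dl
n, xbar, s, mu0 = 400, 12.97, 1.58, 11.0

se = s / math.sqrt(n)        # standard error = 1.58 / 20 = 0.079
t_stat = (xbar - mu0) / se   # (12.97 - 11) / 0.079 ≈ 24.94
df = n - 1                   # 399

print(f"t = {t_stat:.2f}, df = {df}")
```

With |t| ≈ 24.9 far above any critical t-value at α = 0.05, H0 would be rejected.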
SPSS…

Analyze → Compare Means → One-Sample T-Test →
drag the quantitative variable into the
'Test Variables' field → type the
hypothesized mean in the 'Test Value' field
→ OK
SPSS output for the above data:

2) The independent-samples t-test
• Compares the means of two independent groups to
determine if there is a statistically significant difference
between them.
E.g.
o Comparing the average scores of students from two
different classes.
o Comparing the average weight loss after two different
diets/treatments.
o Comparing the average CD4 count after ART
between males and females.
Assumptions of the independent-samples t-test
1. Independence of observations
• Observations in one group should not influence those in the
other group.
2. Normality
• The data in each group should be normally distributed.
3. Homogeneity of variances (equal variances)
• The variances of the two groups should be equal.
4. Scale of measurement
• The dependent variable should be continuous.
5. Random sampling
• The data are randomly sampled from the population.
Null and Alternative Hypotheses of the
independent-samples t-test
• H₀: There is no significant difference between the means
of the two groups.
  H₀: μ1 = μ2

• H₁: There is a significant difference between the means
of the two groups.
  H₁: μ1 ≠ μ2, or
  H₁: μ1 < μ2, or
  H₁: μ1 > μ2
Test Statistic (t) for the independent-samples t-test
The t-statistic (equal variances assumed) is

    t = (x̄1 − x̄2) / √( sp² (1/n1 + 1/n2) )

where sp² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)
is the pooled variance.

The degrees of freedom, df = n1 + n2 − 2, are
used to determine the critical
value from the t-distribution
table.
Steps to Perform the independent-samples t-test
1. State the hypotheses.
2. Check assumptions: Ensure normality and
homogeneity of variance.
3. Calculate the t-statistic.
4. Find the critical t-value: Use the degrees of freedom
and the significance level (α = 0.05).
5. Compare the t-statistic to the critical value:
o If |t| > t critical, reject the null hypothesis.
o If |t| ≤ t critical, fail to reject the null hypothesis.
Steps to Perform the independent-samples t-test…
P-value
• Alternatively, instead of comparing the t-statistic to a
critical value, you can use the p-value to make a
decision:

• If the p-value is less than the chosen significance
level (α, typically 0.05), reject the null hypothesis.

• If the p-value is greater than α, fail to reject the null
hypothesis.
Interpretation of the independent-samples t-test

• If you reject the null hypothesis, it means there is a
statistically significant difference between the two group
means.

• If you fail to reject the null hypothesis, it means there is
no statistically significant difference between the two
group means.
Example

• Ex.: Using the Data for exercise2.sav file, test the
hypothesis that the mean CD4 count at 36 months
for males and females is not different at α = 0.05.

• Analyze → compare means → independent
samples T-test → drag the quantitative variable
into the 'test variables' field → move the
qualitative variable into the 'grouping variable'
field → OK
SPSS output for the above data:

Group Statistics
CD4 count at 36 months   Sex      N     Mean     Std. Deviation   Std. Error Mean
                         Male     79    429.71   255.224          28.715
                         Female   131   480.34   184.128          16.087

Independent Samples Test (CD4 count at 36 months)
                              Levene's Test          t-test for Equality of Means
                              F       Sig.           t        df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI of the Difference
Equal variances assumed       .036    .851           -1.664   208       .098              -50.635           30.424                  (-110.614, 9.345)
Equal variances not assumed                          -1.538   127.134   .126              -50.635           32.914                  (-115.765, 14.496)
Result of the independent t-test:

• The mean CD4 counts at 36 months of males and
females were compared using a two-independent-samples
t-test. We found that the mean (±SD) CD4 count at 36
months for males [429.71 (±255.224)] and females
[480.34 (±184.128)] was not significantly different at
α = 0.05; t(df) = -1.664 (208), p = 0.098.
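As a cross-check, the equal-variances (pooled) t-statistic can be recomputed from the group summary statistics reported above; a minimal Python sketch:

```python
import math

# Pooled (equal-variance) t from the SPSS summary statistics above
n1, m1, s1 = 79, 429.71, 255.224    # male:   n, mean, SD
n2, m2, s2 = 131, 480.34, 184.128   # female: n, mean, SD

# Pooled variance and standard error of the mean difference
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat = (m1 - m2) / se
df = n1 + n2 - 2

print(f"t = {t_stat:.3f}, df = {df}")   # matches the SPSS row: t = -1.664, df = 208
```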
Paired Sample t-Test

• Measurements are taken at two distinct points in time.

• The differences between the two observations are used
rather than the individual observations.

• Used when the same participants are measured twice
under different conditions or at different times.

• Compares the means of a group before and after a
treatment or intervention (e.g., pre-treatment and post-
treatment scores).
Paired Sample t-Test…
• The purpose of the paired sample t-test is to determine if
there is a statistically significant difference between the
means of two related groups.

Advantages:
• Controls for extraneous factors
• Between-subject biological variability is eliminated
• Makes comparisons more precise

Example
• A researcher wants to know whether there is a difference
in the weights of individuals before and after a diet
program. The same individuals are weighed before and
after the program.
Assumptions for the Paired Sample t-Test

1. Dependence between pairs: Each observation in the
sample has two measurements (i.e., paired
observations).

2. Normality of differences: The differences between the
paired observations should be approximately normally
distributed.

3. Interval or ratio data: The data should be continuous
and on an interval or ratio scale (i.e., measurements on
a scale that allows meaningful differences).
Null and Alternative Hypotheses for the
Paired Sample t-Test
• H₀: There is no significant difference between the
means of the paired groups.
  H₀: μd = 0

• H₁: There is a significant difference between the
means of the paired groups.
  H₁: μd ≠ 0 (two-tailed)
  OR
  H₁: μd > 0 (one-tailed, positive difference)
  OR
  H₁: μd < 0 (one-tailed, negative difference)
The t-statistic in a paired sample t-test
The t-statistic is

    t = d̄ / (sd / √n)

where:
• d̄: the mean of the differences between
the paired observations
• sd: the standard deviation of the
differences between the paired
observations
• n: the number of pairs

The degrees of freedom are df = n − 1; they are used to
determine the critical t-value from the t-distribution table.
Steps to calculate the Paired Sample t-Test
• State the hypotheses
• Check assumptions
• Calculate the differences between pairs:
subtract each paired observation
• Calculate the mean of the differences d̄ and sd, then
compute the t-statistic
• Find the critical t-value using the df (df = n − 1) and the
significance level (α)
• Compare the t-statistic to the critical value
• Based on the comparison, interpret the result to decide
whether the difference is statistically significant or not
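These steps can be sketched with scipy.stats.ttest_rel; the before/after weights below are hypothetical, echoing the diet-program example:

```python
from scipy import stats

# Hypothetical before/after weights (kg) for the same 8 individuals
before = [82.0, 75.5, 90.1, 68.3, 77.8, 85.2, 79.4, 72.6]
after  = [79.5, 74.0, 86.8, 67.1, 75.9, 82.7, 78.0, 71.2]

# ttest_rel works on the pairwise differences, exactly as in the steps above
t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Equivalently, one could compute the differences and apply a one-sample t-test against 0, which is what the paired test does internally.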
Example
• Using the SPSS dataset Data for exercise2.sav, test the
hypothesis that the mean CD4 count before and after
ART among HIV patients is not different at
α = 0.05. Calculate the effect size.
Solution:
• Check assumptions
• Analyze → compare means → paired samples
T-test → move the two quantitative variables that
you are interested in comparing for each subject
into the 'paired variables' box → then click OK.
SPSS output for the above data:

Paired Samples Statistics
Pair 1                     Mean     N     Std. Deviation   Std. Error Mean
CD4 count at 36 months     461.30   210   214.483          14.801
CD4 count at Baseline      167.32   210    93.459           6.449

Paired Samples Test (Pair 1: CD4 count at 36 months − CD4 count at Baseline)
Paired Differences: Mean = 293.971, Std. Deviation = 207.444, Std. Error Mean = 14.315
95% Confidence Interval of the Difference: (265.751, 322.192)
t = 20.536, df = 209, Sig. (2-tailed) = 0.000
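The paired t-statistic can be recovered from the summary figures in the SPSS output above; a minimal Python sketch:

```python
import math

# Paired t from the "Paired Samples Test" summary above:
# mean difference = 293.971, SD of differences = 207.444, n = 210 pairs
d_bar, s_d, n = 293.971, 207.444, 210

t_stat = d_bar / (s_d / math.sqrt(n))   # t = d-bar / (sd / sqrt(n))
df = n - 1

print(f"t = {t_stat:.2f}, df = {df}")   # matches SPSS: t = 20.54, df = 209
```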
Conclusion: Paired t-test result

• The mean (±SD) difference in CD4 count before
and after ART was 293.97 (±207.44).

• This difference was significant, with t(df) =
20.54 (209), p < 0.001.
B) ANOVA (Analysis of Variance)
• ANOVA is an inferential statistical technique used to
compare the means of a numerical/continuous
outcome variable across groups defined by an exposure level
with two or more categories/groups
• It splits the total variance into component parts
(within and among groups).

Independent variable: nominal (experimental grouping) →
Dependent variable: interval/ratio

• If you had only two groups to compare, ANOVA would
give the same answer as an independent samples t-test.
Why ANOVA?
• When the groups to be compared are more than two, using
the t-test is unreliable and tedious (nC2 comparisons)

    nCk = n! / (k!(n − k)!)

Say n = 6 and k = 2; what are the possible combinations?
In this case:
• n = 6 (total number of groups), e.g. A, B, C, D, E, F
• k = 2 (number of groups you are choosing), so 6C2 = 15

• Conducting 15 t-tests for a variable/factor with 6
categories is tedious and gives unreliable test results!
• ANOVA overcomes this problem
• ANOVA is an extension of the two-sample t-test
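The count of pairwise comparisons can be checked with Python's math.comb:

```python
from math import comb

# Number of pairwise t-tests needed to compare every pair of k groups: C(n, 2)
for n_groups in (3, 4, 6):
    print(n_groups, "groups ->", comb(n_groups, 2), "pairwise t-tests")
# 6 groups -> 15 pairwise t-tests, as on the slide
```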
One-way ANOVA…
• If there is only one factor, we call it one-way ANOVA

• One-way ANOVA is a statistical test used to compare the
means of three or more independent groups or categories
defined by an exposure level

• The method is based on assessing how much of the overall
variation in the outcome is attributable to differences between
exposure group means

• ANOVA determines whether a significant difference exists between
any of the group means, but does not identify the group or
groups that differ.

• It compares between-group variance and within-group
variance.
One Way ANOVA…
• Between-group variance: variability due to the
differences between the groups.

• Within-group variance: variability due to differences
within each group.

• For one factor with k groups, we use the following
definitions:
One-way ANOVA…
  Between groups: SSB = Σ ni(x̄i − x̄)²,   df = k − 1,   MSB = SSB/(k − 1)
  Within groups:  SSW = Σ Σ (xij − x̄i)²,  df = n − k,   MSW = SSW/(n − k)
  Total:          SST = SSB + SSW,         df = n − 1
  Test statistic: F = MSB / MSW
– The within-groups mean square is also called the
error mean square (MSE); its degrees of
freedom are the error degrees of freedom and its SS
is the error sum of squares (SSE).
– Between-groups variance measures how the group
means vary about the grand mean.
– Within-groups variance measures how the scores
in each group vary about their group mean.

The total degrees of freedom is n − 1.
Hypotheses in One-Way ANOVA

• Null Hypothesis (H₀): All group means are equal (no


significant difference between groups).

• Alternative Hypothesis (H₁): At least one group


mean is different from the others.
Assumptions in one-way ANOVA
1. Independence of observations
– Each observation must be independent.
2. Normality (Data are parametric)
– Data within each group should be approximately
normally distributed. (Histogram, P-P plot, Shapiro-Wilk test)
3. Homogeneity of variances (Homoscedasticity)
– The variance within each group should be
approximately equal.(Levene's test, Bartlett's test).
4. Random sampling
– The samples within each group should be randomly
selected from the population
5. The factor (Independent Variable) is categorical
The ten steps in conducting One-
Way ANOVA
1. Description of data
– Describing data & display using tabular form

2. Check assumptions

3. State the hypotheses

4. Choose the significance level


– α = 0.05 (5% significance level)
The ten steps …
5. Specify and calculate the ANOVA test statistic
– Compute the variance ratio (V.R.) test or F-test
– The V.R. is distributed as an F distribution when H0 is true
and the assumptions are met

6. Find the critical value

7. Make a statistical decision
• Compare the computed V.R. with the critical value of F, with k − 1
numerator df and N − k denominator df
• If computed V.R. ≥ critical value of F: reject H0
• If computed V.R. < critical value of F: do not reject H0
The ten steps …

8. Calculate the p-value

9. Conclusion
• When we reject H0, we conclude that not all population
means are equal
• When we fail to reject H0, we conclude that the
population means are not significantly different from
each other

10. Post-hoc analysis (if needed)


Effect size in One-Way ANOVA
• It measures the strength or magnitude of the difference between groups
• The p-value tells only whether the observed differences between
groups are statistically significant
• The effect size tells how much of the variance in the data is explained
by the independent variable (factor) being tested. It therefore facilitates
– comparison of the strength of findings across different studies
– A large effect size suggests that the independent variable has a meaningful
impact on the dependent variable
• It tells how meaningful or practically significant those differences are
• Measures of effect size in One-Way ANOVA:
1. Eta squared (η²) (most commonly used)
2. Partial eta squared (η²ₚ)
3. Omega squared (ω²)
Eta Squared (η²)

• Commonly used measure of effect size in ANOVA

• It represents the proportion of the total variance in the
dependent variable that is attributable to the factor
(independent variable) being tested.

    η² = Sum of Squares Between Groups (SSB) / Total Sum of Squares (SST)

– SSB measures variance due to the factor
– SST measures total variance in the data
• Interpretation:
– Small effect: η² = 0.01 (1%)
– Large effect: η² ≥ 0.15 (15%)
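A minimal sketch of the η² calculation; the sums of squares here are hypothetical, chosen only to illustrate the interpretation thresholds:

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variance explained by the factor (SSB / SST)."""
    return ss_between / ss_total

# Illustrative (hypothetical) sums of squares:
print(eta_squared(30.0, 200.0))   # 0.15 -> large effect by the rule above
print(eta_squared(2.0, 200.0))    # 0.01 -> small effect
```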
Example Effect Size
• Suppose you are testing the effect of three different diets
on weight loss in three different groups. After conducting
One-Way ANOVA, you find that the p-value is 0.002,
indicating that there is a statistically significant difference
in the weight loss across the three diet groups.

• However, if the eta squared (η²) is 0.02, this suggests


that only 2% of the total variance in weight loss is
explained by the type of diet, which indicates a small
effect size. Despite the significant p-value, the practical
importance of the diet type might be limited.
One Way ANOVA…
E.g. Height in inches of children in four different
groups
Group 1 Group 2 Group 3 Group 4
60 50 48 47
67 52 49 67
42 43 50 54
67 67 55 67
56 67 56 68
62 59 61 65
64 67 61 65
59 64 60 56
72 63 59 60
71 65 64 65
One way ANOVA: Example

Calculate the sum of squares between groups:
▪ Mean for group 1 = 62.0
▪ Mean for group 2 = 59.7
▪ Mean for group 3 = 56.3
▪ Mean for group 4 = 61.4
▪ Grand mean = 59.85
• SSB = [(62 − 59.85)² + (59.7 − 59.85)² +
(56.3 − 59.85)² + (61.4 − 59.85)²] × 10
= 19.65 × 10 = 196.5
Example….

• Calculate the sum of squares within groups:
  SSW = (60 − 62)² + (67 − 62)² + …
• (sum of 40 squared deviations) = 2060.6
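The hand calculation above can be verified in Python: the sums of squares are recomputed from the height table, and scipy.stats.f_oneway gives the F statistic directly:

```python
from scipy import stats

# The four height groups from the example table
g1 = [60, 67, 42, 67, 56, 62, 64, 59, 72, 71]
g2 = [50, 52, 43, 67, 67, 59, 67, 64, 63, 65]
g3 = [48, 49, 50, 55, 56, 61, 61, 60, 59, 64]
g4 = [47, 67, 54, 67, 68, 65, 65, 56, 60, 65]
groups = [g1, g2, g3, g4]

# Sums of squares by hand, matching the slide's arithmetic
all_vals = [x for g in groups for x in g]
grand_mean = sum(all_vals) / len(all_vals)                               # 59.85
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)  # 196.5
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)         # 2060.6

# F = (SSB / (k-1)) / (SSW / (n-k)); f_oneway computes the same ratio
f_stat, p_value = stats.f_oneway(g1, g2, g3, g4)
print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}, F = {f_stat:.2f}, p = {p_value:.3f}")
```

Here η² = SSB / (SSB + SSW) could also be read off directly from the two sums of squares.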
Example…
One way ANOVA: Example — SPSS output
ANOVA Table Breakdown
Example 2: The following table shows mean Hgb levels of
patients according to type of sickle cell disease

Type of sickle        No. of patients   Haemoglobin (g/dl)
cell disease          (ni)              Mean (x̄i)   SD (si)   Individual values (x)
Hb SS                 16                 8.7125      0.8445    7.2, 7.7, 8.0, 8.1, 8.3, 8.4,
                                                               8.4, 8.5, 8.6, 8.7, 9.1, 9.1,
                                                               9.1, 9.8, 10.1, 10.3
Hb S/β-thalassaemia   10                10.6300      1.2841    8.1, 9.2, 10.0, 10.4, 10.6,
                                                               10.9, 11.1, 11.9, 12.0, 12.1
Hb SC                 15                12.3000      0.9419    10.7, 11.3, 11.5, 11.6,
                                                               11.7, 11.8, 12.0, 12.1,
                                                               12.3, 12.6, 12.6, 13.3,
                                                               13.3, 13.8, 13.9
Example...

n = Σni = 16 + 10 + 15 = 41; number of groups (k) = 3

Σx = 7.2 + 7.7 + ... + 13.8 + 13.9 = 430.2
Σx² = 7.2² + 7.7² + ... + 13.8² + 13.9² = 4651.8

Total: SS = Σ(x − x̄)² = Σx² − (Σx)²/n
  = 4651.8 − 430.2²/41 = 137.85

Between groups: SS = Σ ni x̄i² − (Σx)²/n
  = 16 × 8.7125² + 10 × 10.63² + 15 × 12.3² − 430.2²/41 = 99.89
  df = k − 1 = 2

Within groups: SS = Σ (ni − 1)si²
  = 15 × 0.8445² + 9 × 1.2841² + 14 × 0.9419² = 37.96
  df = n − k = 41 − 3 = 38
Example...

Source of variation     SS       df    MS (= SS/df)
Between groups          99.89     2    49.94
Within groups           37.96    38     1.00
Total                  137.85    40

F = Between-groups MS / Within-groups MS = 49.94/1.00 = 49.9
with degrees of freedom (2, 38); the
corresponding P-value is < 0.001
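The same ANOVA can be reproduced from the individual haemoglobin values in the table above with scipy.stats.f_oneway; a minimal sketch:

```python
from scipy import stats

# Individual haemoglobin values (g/dl) from the sickle cell disease table
hb_ss  = [7.2, 7.7, 8.0, 8.1, 8.3, 8.4, 8.4, 8.5, 8.6, 8.7,
          9.1, 9.1, 9.1, 9.8, 10.1, 10.3]
hb_sbt = [8.1, 9.2, 10.0, 10.4, 10.6, 10.9, 11.1, 11.9, 12.0, 12.1]
hb_sc  = [10.7, 11.3, 11.5, 11.6, 11.7, 11.8, 12.0, 12.1, 12.3,
          12.6, 12.6, 13.3, 13.3, 13.8, 13.9]

# One-way ANOVA with k = 3 groups, n = 41, df = (2, 38)
f_stat, p_value = stats.f_oneway(hb_ss, hb_sbt, hb_sc)
print(f"F = {f_stat:.1f} with df (2, 38), p = {p_value:.2e}")
```

The F value agrees with the hand calculation (≈ 50, up to rounding of the slide's summary statistics), and the p-value is far below 0.001.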
One-way ANOVA with SPSS

• Open data file: Data for exercise1.sav.
• From Analyze, click on Compare Means, then on
One-way ANOVA
• Move the variable Haemoglobin in g/dl into the box
marked Dependent list
(this variable should be a continuous variable)
• Move your independent variable WHO stage into the box
marked Factor
One-way ANOVA with SPSS

• Click on the Options button and click on Descriptive,
Homogeneity of variance test, Brown-Forsythe and
Welch

• Click on the button marked Post Hoc. Click on Dunnett C.

• Click on Continue and then OK.
SPSS outputs:
Details of descriptive statistics such as number of respondents, mean,
SD, minimum, maximum values in each group and the overall total

If this Sig. value is < 0.05,
equal variance is not assumed,
so the 'equal variances not
assumed' tests in the ANOVA
output will be used.

The "Test of Homogeneity of Variances" table provides
Levene's test for testing the H0 that within-group variances
are constant across groups
This is reported as P = 0.015.
There is a significant difference
among the four types of WHO
staging, so it is appropriate to
proceed to a post-hoc
(a posteriori) test.

Our test shows there is a significant difference among the four types of
WHO staging but does not tell us which populations differed from the
others.
We therefore proceed to a post-hoc (a posteriori) test to find out where
the difference is.
Post Hoc Tests

Post hoc Or Multiple Comparisons
• Are comparisons of group means made after data have
been collected.
• They do not assume any prior hypotheses.
• Most are pairwise comparisons, meaning they compare
all pairs of means, to determine if they are significantly
different.
Most frequently used ones are:
• Bonferroni’s t-method
• Tukey’s HSD (Honestly Significant Difference)

Conclusion

• The mean haemoglobin levels differ significantly
between patients with different types of WHO
staging
(the mean being lowest for patients with WHO
stage IV, and highest for patients with WHO
stage II)
Application of ANOVA
• Applicable to quantitative variables
• Number of groups to be compared is more than two.
• More appropriate when randomized experimental design is
employed.
• But, can also be used for observational designs and
surveys.
• When ANOVA is significant and Ho is rejected, then it is
important to do a post hoc test ( pair-wise comparison).
• When the assumptions required for ANOVA are not met, a
non-parametric test equivalent to ANOVA (the Kruskal-Wallis
test) is performed.
Corresponding non-parametric tests for
each of the parametric tests

Parametric Test               Non-Parametric Test
One sample t-test             Wilcoxon Signed-Rank Test
Independent samples t-test    Mann-Whitney U Test (or Wilcoxon rank-sum test)
Paired sample t-test          Wilcoxon Signed-Rank Test (for paired data)
One way ANOVA                 Kruskal-Wallis H Test
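As a sketch of the last row, scipy.stats.kruskal performs the Kruskal-Wallis H test; the three groups below are hypothetical, chosen to show a clear difference in location:

```python
from scipy import stats

# Hypothetical scores for three groups with clearly different locations
g1 = [12, 14, 15, 11, 13]
g2 = [22, 25, 21, 24, 23]
g3 = [31, 35, 33, 30, 34]

# Kruskal-Wallis works on ranks, so it needs no normality assumption
h_stat, p_value = stats.kruskal(g1, g2, g3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

Like the F test it replaces, a significant H says only that at least one group's distribution differs; pairwise follow-up tests (e.g. Mann-Whitney U with a multiplicity correction) would locate the difference.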
