Advanced Methods in Biostatistics
By SPSS
eKutub
ACKNOWLEDGEMENTS
Contents
ACKNOWLEDGEMENTS
Introduction
General Objectives of the Book
Chapter 1: Chart Presentation
SPSS statistics program
Creating a new data file
Descriptive and inferential statistics
A- Categorical variable: Chart presentation (Bar chart)
B- Numerical variable
C- Numerical variable: Chart presentation
Chapter 2: Parametric Statistics
Parametric tests
Independent t test and paired t test
1- Independent t test
2- Paired t test
3- One-Way ANOVA
Chapter 3: Non Parametric Statistics
Non Parametric tests
1- Mann-Whitney test
2- Wilcoxon Signed-Ranks Test
3- Kruskal–Wallis test
Post Hoc Analysis
1- Chi-Square Test
2- Fisher's Exact Test
Chapter 4: Correlation and Regression
1- Correlation
2- Linear Regression
3- Multiple Logistic Regression
References
Introduction
With the recent advancement of statistical methods in
applied scientific research, there is a need for simplified
statistical material that is easily available to researchers
and graduate students.
General Objectives of the Book
Upon completion of this book, the health students will be
able to: -
Chapter 1
Chart Presentation
SPSS
Statistics Advance Biostatistics
Blood Pressure
                        Frequency   Percent   Valid Percent
Valid   Normal          112         58.9      58.9
        Hypertension    58          30.5      30.5
        Hypotension     20          10.5      10.5
        Total           190         100.0     100.0
B-Numerical variable
i- Measurement of central tendency
Central tendency: an average value of any distribution of
data that best represents the middle. Also called centrality.
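As a minimal sketch of these measures, the Python snippet below computes the mean and the median of a small set of values. The data are hypothetical and are not taken from the data file used in this book.

```python
import statistics

# Hypothetical sample values (an assumption for illustration only).
values = [2, 4, 4, 5, 10]

mean_value = statistics.mean(values)      # arithmetic average of the values
median_value = statistics.median(values)  # middle value of the sorted data

print(mean_value, median_value)
```

The mean is pulled upward by the single large value, while the median is not, which is why the median is often preferred for skewed data.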
Variance:
$S^2 = 30.92$
Standard deviation:
$S = 5.56$
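The relationship between the two values above can be checked with a short Python sketch: the standard deviation is the square root of the variance. Only the variance reported in the text is used; the variable names are illustrative.

```python
import math

variance = 30.92               # S^2 as reported in the text
std_dev = math.sqrt(variance)  # S is the square root of S^2

print(round(std_dev, 2))
```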
IQR
Stem-and-Leaf Plot
Frequency    Stem & Leaf
Box plot
Chapter 2
Parametric Statistics
Parametric tests
Independent t test and paired t test
t tests are used to compare means. There are three types of
t test:
1- Independent t test: it is used to compare the means of
two independent samples.
2- Paired t test: it is used to compare the means of two
dependent samples.
3- One sample t test: it is used to determine whether a
sample mean is statistically different from a specific value.
1- Independent t test
The independent t test is used to determine whether
there is a statistically significant difference between
the means of a numerical variable in two unrelated
(independent) groups.
The dependent variable should be numerical, and
the independent (factor) variable should be
categorical with two levels.
Assumptions
1- The dependent variable is normally distributed
within each group.
2- The variances of the two groups are assumed
to be equal.
3- The two groups are independent of each other.
4- Random sample.
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no mean difference of Quality of
Life (QoL) between males and females in the Gaza Strip.
Research question: Is the mean of QoL in the Gaza Strip
different between males and females?
2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The two groups are independent of each other (By study
design)
iii) Normality assumption: since the sample size is more than
30 in each group, the Central Limit Theorem applies.
Group Statistics
         Gender   N     Mean       Std. Deviation   Std. Error Mean
Total    Female   179   100.3687   21.563           1.611
         Male     192   92.7708    18.893           1.363
5- Result presentation
Variable   Mean (SD)                        t statistic (df)   p value*
Gender     Males           Females
           92.77 (18.9)    100.37 (21.6)    3.616 (369)        <0.001
*Independent t test
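The t statistic and degrees of freedom reported here can be reproduced from the Group Statistics values (N, mean and standard deviation for each gender). The sketch below assumes equal variances (the pooled, Student's t test); the function name `pooled_t` is illustrative and is not part of SPSS.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return Student's t statistic and df, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Females first, then males, using the Group Statistics output.
t_stat, df = pooled_t(100.3687, 21.563, 179, 92.7708, 18.893, 192)
print(round(t_stat, 3), df)
```

The sign of t depends only on which group is entered first; the p value of a two-sided test is unaffected.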
2- Paired t test
The paired t test is used when the data come from a single
group measured at two time points. The test explores the
difference between two dependent (paired) sample means,
such as:
1- Matched samples.
2- Two observations on the same subject (before and after
an intervention).
3- Closely related subjects.
Assumptions
1. Test variables are numerical.
2. The measurements are dependent or paired.
3. Random sample.
Steps of analysis
1. Hypothesis and question statement
Null hypothesis: There is no mean difference in body weight
between pre and post tests.
Research question: Is the mean body weight different
between pre and post tests?
2. Checking assumptions
3. Perform paired t test in SPSS
1- Analyze
2- Compare Means
3- Paired-Samples T Test
4- Transfer pre test to Variable1 and post test to
Variable2.
5- Click OK.
The difference in mean body weight between pre and post
tests is statistically significant (p value of the t statistic
<0.001). The mean (SD) body weight after the intervention is
lower than before it [80.40 (8.4) vs 84.6 (8.9)].
5- Result presentation
Table: Change of body weight before and after
intervention
Variable   Pre score mean (SD)   Post score mean (SD)   t statistic (df)   p value*
*Paired t test
3- One-Way ANOVA
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no mean difference of Quality of
Life (QoL) among the five areas in the Gaza Strip.
Research question: Are the means of QoL different among
the five areas in the Gaza Strip?
2- Checking assumptions
i) Random sample (By study design and sampling
method)
Descriptive
Total QoL
                                                             95% Confidence Interval for Mean
Residence       N     Mean      Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
North of Gaza   69    95.8116   21.86003         2.63164      90.5602       101.0629      67.00     150.00
Gaza Town       76    101.131   18.10292         2.07655      96.9949       105.2683      70.00     150.00
Middle Areas    75    90.4667   17.51936         2.02296      86.4358       94.4975       49.00     121.00
Khan Yunis      74    96.1622   14.81394         1.72209      92.7301       99.5943       50.00     127.00
Rafah           77    98.4416   27.09510         3.08777      92.2917       104.5914      30.00     150.00
Total           371   96.4367   20.55259         1.06704      94.3384       98.5349       30.00     150.00
ANOVA
Total QoL
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   4690.319         4     1172.580      2.831   0.025
Within Groups    151600.943       366   414.210
Total            156291.261       370
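The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal Python sketch using only the sums of squares and df from the table (variable names are illustrative):

```python
# Values taken from the ANOVA table for Total QoL.
ss_between, df_between = 4690.319, 4
ss_within, df_within = 151600.943, 366

ms_between = ss_between / df_between  # mean square between groups
ms_within = ss_within / df_within     # mean square within groups
f_stat = ms_between / ms_within       # F statistic

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
```

The degrees of freedom are the number of groups minus one (5 - 1 = 4) and the total sample size minus the number of groups (371 - 5 = 366).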
Multiple Comparisons
Dependent Variable: Total QoL
Dunnett C
                                                                       95% Confidence Interval
(I) Residence   (J) Residence   Mean Difference (I-J)   Std. Error   Lower Bound   Upper Bound
North of Gaza   Gaza Town       -5.31998                3.35225      -14.7050      4.0650
                Middle Areas    5.34493                 3.31932      -3.9493       14.6392
                Khan Yunis      -.35057                 3.14501      -9.1590       8.4579
                Rafah           -2.62996                4.05708      -13.9805      8.7205
Gaza Town       North of Gaza   5.31998                 3.35225      -4.0650       14.7050
                Middle Areas    10.66491*               2.89904      2.5601        18.7698
                Khan Yunis      4.96942                 2.69771      -2.5734       12.5123
                Rafah           2.69002                 3.72107      -7.7090       13.0890
Middle Areas    North of Gaza   -5.34493                3.31932      -14.6392      3.9493
                Gaza Town       -10.66491*              2.89904      -18.7698      -2.5601
                Khan Yunis      -5.69550                2.65668      -13.1251      1.7341
                Rafah           -7.97489                3.69144      -18.2921      2.3423
Khan Yunis      North of Gaza   .35057                  3.14501      -8.4579       9.1590
                Gaza Town       -4.96942                2.69771      -12.5123      2.5734
                Middle Areas    5.69550                 2.65668      -1.7341       13.1251
                Rafah           -2.27940                3.53552      -12.1612      7.6024
Rafah           North of Gaza   2.62996                 4.05708      -8.7205       13.9805
                Gaza Town       -2.69002                3.72107      -13.0890      7.7090
                Middle Areas    7.97489                 3.69144      -2.3423       18.2921
                Khan Yunis      2.27940                 3.53552      -7.6024       12.1612
*. The mean difference is significant at the 0.05 level.
6- Results presentation
Table: Mean difference of QoL among the five areas in
the Gaza Strip
Residential areas   Mean (SD)        F statistic (df)   p value
North of Gaza       95.81 (21.86)
Gaza Town           101.13 (18.10)
Middle Areas        90.47 (17.52)    2.831 (4, 366)     0.025
Khan Yunis          96.16 (14.81)
Rafah               98.44 (27.10)
Post hoc analysis: the mean difference of QoL is significant
between Gaza Town and Middle Areas only.
4- Multi-Factorial ANOVA
Multi-Factorial ANOVA is used to determine the effects of
two or more independent variables on a numerical dependent
variable.
A factorial ANOVA compares means across two or more
Assumptions
1. Dependent variable is numerical.
2. The groups are independent of each other.
3. Random sample.
4. Homogeneity of variances: the groups should come
from populations with equal variances.
5. The observations in each group are normally
distributed; if the sample size is > 30, the Central
Limit Theorem applies.
Steps of analysis
1-Hypothesis and question statement
Null hypothesis: There are no significant effects of variables
(Gender, Citizenship, Residency, Income and Age) on
stress among children in the Gaza Strip.
Research question: Are there significant effects of Gender,
Citizenship, Residency, Income and Age on stress among
children in the Gaza Strip?
2- Checking assumptions
i) Random sample (By study design and sampling
method).
ii) The groups are independent of each other (By study
design).
iii) Normality assumption: since the sample size is more
than 30 in each group, the Central Limit Theorem
applies. But if checking normality is needed.
Variable   N   Mean   Std. Deviation   Std. Error
Result presentation
Factors   Adjusted Mean (95% CI)   F statistic (df)   p value
Chapter 3
Non Parametric Statistics
Non Parametric tests
1- Mann-Whitney test
This test is used to compare the medians of a numerical
variable between two independent samples. It is the
non-parametric equivalent of the independent samples t test
and is based on the ranks of the observations.
Assumptions
1. Random sample.
2. Independent samples.
3. The dependent variable is numerical.
Steps of analysis
1. Hypothesis and question statement
2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The two groups are independent of each other (By
study design)
iii) Normality assumption: since the sample size is less
than 30 in each group, the Central Limit Theorem
cannot be relied on, so normality should be checked.
The steps are as follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the Explore box, transfer the dependent
variable (body weight) to the Dependent List and
the independent variable (gender) to the Factor List.
5- In the Explore box, select Both.
Descriptive
Body Weight
Gender   Measure                                         Statistic   Std. Error
Male     Mean                                            76.5455     1.98776
         95% Confidence Interval for Mean: Lower Bound   72.4117
         95% Confidence Interval for Mean: Upper Bound   80.6792
         5% Trimmed Mean                                 76.3333
         Median                                          75.0000
         Variance                                        86.926
         Std. Deviation                                  9.32343
         Minimum                                         62.00
         Maximum                                         95.00
         Range                                           33.00
         Interquartile Range                             17.25
         Skewness                                        .478        .491
         Kurtosis                                        -.877       .953
Female   Mean                                            81.1667     1.80459
         95% Confidence Interval for Mean: Lower Bound   77.3593
         95% Confidence Interval for Mean: Upper Bound   84.9740
         5% Trimmed Mean                                 81.1296
         Median                                          78.0000
         Variance                                        58.618
         Std. Deviation                                  7.65622
         Minimum                                         68.00
         Maximum                                         95.00
         Range                                           27.00
         Interquartile Range                             10.25
         Skewness                                        .322        .536
         Kurtosis                                        -.484       1.038
1- Analyze
2- Nonparametric Tests
3- Legacy Dialogs
4- 2 Independent Samples
5- In the Two-Independent-Samples Tests box, transfer
body weight to Test Variable List and Gender to
Grouping Variable. Select Mann-Whitney U.
6- After transferring Gender to Grouping Variable, click
on Define Groups; in the box of Two Independent Sa…..
enter code 1 (males) for Group 1 and code 2 for Group 2.
7- Click Continue, then OK.
Ranks
              Gender Variable   N    Mean Rank   Sum of Ranks
Body Weight   Male              22   17.68       389.00
              Female            18   23.94       431.00
              Total             40
Test Statistics (a)
                                 Body Weight
Mann-Whitney U                   136.000
Wilcoxon W                       389.000
Z                                -1.692
Asymp. Sig. (2-tailed)           .091
Exact Sig. [2*(1-tailed Sig.)]   .095 (b)
a. Grouping Variable: Gender Variable
b. Not corrected for ties.
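The Mann-Whitney U value can be recovered from the rank sums and group sizes in the Ranks table. The sketch below also computes a normal approximation of Z without a correction for ties, which is an assumption of this example; SPSS applies a tie correction, so its reported Z (-1.692) differs slightly from the value computed here.

```python
import math

# Group sizes and rank sums from the Ranks table.
n_male, n_female = 22, 18
rank_sum_male, rank_sum_female = 389.0, 431.0

# U for each group: rank sum minus the smallest possible rank sum.
u_male = rank_sum_male - n_male * (n_male + 1) / 2
u_female = rank_sum_female - n_female * (n_female + 1) / 2
u_stat = min(u_male, u_female)

# Normal approximation, ignoring ties (assumption of this sketch).
mean_u = n_male * n_female / 2
sd_u = math.sqrt(n_male * n_female * (n_male + n_female + 1) / 12)
z_approx = (u_stat - mean_u) / sd_u

print(u_stat, round(z_approx, 3))
```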
5- Result presentation
Table: The median of body weight of males and females
Variable   Median (IQR)                      Z statistic   p value*
Gender     Males            Females
           75.00 (17.25)    78.00 (10.25)    -1.692        0.091
* Mann-Whitney test
2- Wilcoxon Signed-Ranks Test
Assumptions
1- The two groups are dependent.
2- The variables are continuous.
3- Random samples.
Steps of analysis
1. Hypothesis and question statement
Null hypothesis: There is no median difference in body
weight between pre and post tests.
Research question: Is the median body weight
different between pre and post tests?
2. Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The two groups are dependent on each other (By
study design)
iii) Normality assumption: since the sample size is
less than 30, the Central Limit Theorem cannot be
relied on.
Ranks
                                        N        Mean Rank   Sum of Ranks
Post Test - Pre Test   Negative Ranks   24 (a)   12.50       300.00
                       Positive Ranks   0 (b)    .00         .00
                       Ties             1 (c)
                       Total            25
a. Post Test < Pre Test
b. Post Test > Pre Test
c. Post Test = Pre Test
Test Statistics (a)
                         Post Test - Pre Test
Z                        -4.297 (b)
Asymp. Sig. (2-tailed)   .000
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
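The Z statistic can be approximated from the Ranks table: 24 non-tied pairs with a sum of positive ranks of 0. The sketch below uses the normal approximation without a correction for tied ranks, which is an assumption of this example; SPSS applies that correction, so its Z (-4.297) is slightly larger in magnitude.

```python
import math

n_pairs = 24        # pairs with a non-zero difference (ties excluded)
w_positive = 0.0    # sum of positive ranks from the Ranks table

# Mean and standard deviation of the rank sum under the null hypothesis.
mean_w = n_pairs * (n_pairs + 1) / 4
sd_w = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)

z_approx = (w_positive - mean_w) / sd_w
print(round(z_approx, 3))
```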
5- Result presentation
Table: The median of body weight of pre and post tests
Variable   Median (IQR)                      Z statistic   p value*
           Pre              Post
           83.00 (78, 90)   78.00 (75, 85)   -4.297        <0.001
* Wilcoxon Signed Ranks Test
3-Kruskal–Wallis test
It is the non-parametric analogue of a one-way ANOVA
that compares the medians of three or more groups.
Assumptions
1. The groups are independent.
2. The dependent variable is continuous.
3. Random samples.
Steps of analysis
2. Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The groups are independent of each other (By
study design)
iii) Normality assumption: checking normality is
needed; the steps are as follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the box of explore, transfer dependent
variable (Total QoL) to the Dependent List
and independent variable (Residence) to
Factor List.
5- Click on Plots and choose Histogram from
the new box.
6- Click OK
Descriptive
Total QoL
Residence       Measure                                         Statistic   Std. Error
North of Gaza   Mean                                            98.4800     3.42789
                95% Confidence Interval for Mean: Lower Bound   91.5914
                95% Confidence Interval for Mean: Upper Bound   105.3686
                5% Trimmed Mean                                 97.1333
                Median                                          89.5000
                Variance                                        587.520
                Std. Deviation                                  24.23881
                Minimum                                         70.00
                Maximum                                         150.00
                Range                                           80.00
                Interquartile Range                             21.25
                Skewness                                        1.252       .337
                Kurtosis                                        .223        .662
Gaza Town       Mean                                            103.2200    2.96968
                95% Confidence Interval for Mean: Lower Bound   97.2522
                95% Confidence Interval for Mean: Upper Bound   109.1878
                5% Trimmed Mean                                 102.4000
                Median                                          100.5000
                Variance                                        440.951
                Std. Deviation                                  20.99882
                Minimum                                         70.00
                Maximum                                         150.00
                Range                                           80.00
                Interquartile Range                             20.50
                Skewness                                        .890        .337
                Kurtosis                                        .632        .662
Middle Areas    Mean                                            88.3800     2.62700
                95% Confidence Interval for Mean: Lower Bound   83.1008
                95% Confidence Interval for Mean: Upper Bound   93.6592
                5% Trimmed Mean                                 88.6556
                Median                                          87.5000
                Variance                                        345.057
                Std. Deviation                                  18.57570
                Minimum                                         49.00
                Maximum                                         121.00
                Range                                           72.00
                Interquartile Range                             29.25
                Skewness                                        -.201       .337
                Kurtosis                                        -.594       .662
Ranks
            Residence       N     Mean Rank
Total QoL   North of Gaza   50    72.86
            Gaza Town       50    91.31
            Middle Areas    50    62.33
            Total           150
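The Kruskal–Wallis statistic can be approximated from the mean ranks in this table. The sketch below uses the standard H formula without a correction for ties, which is an assumption of this example; the value reported in the result table of this section (11.408 with 2 df) is slightly higher, consistent with a tie correction.

```python
# Mean ranks and group sizes from the Ranks table.
mean_ranks = {"North of Gaza": 72.86, "Gaza Town": 91.31, "Middle Areas": 62.33}
n_per_group = 50
n_total = 150

# Under the null hypothesis every group has the same expected mean rank.
grand_mean_rank = (n_total + 1) / 2

# H without tie correction: scaled sum of squared deviations of mean ranks.
h_stat = 12 / (n_total * (n_total + 1)) * sum(
    n_per_group * (rank - grand_mean_rank) ** 2 for rank in mean_ranks.values()
)
df = len(mean_ranks) - 1

print(round(h_stat, 2), df)
```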
Post Hoc Analysis
Test Statistics (a)
                         Total QoL
Mann-Whitney U           922.000
Wilcoxon W               2197.000
Z                        -2.262
Asymp. Sig. (2-tailed)   .024
a. Grouping Variable: Residence
Test Statistics (a)
                         Total QoL
Mann-Whitney U           1054.000
Wilcoxon W               2329.000
Z                        -1.352
Asymp. Sig. (2-tailed)   .176
a. Grouping Variable: Residence
5- Result presentation
Table: The median of QoL according to residence in the
Gaza Strip
Residence       Median (IQR)     Chi-square statistic (df)   p value
North of Gaza   89.5 (21.25)
Gaza Town       100.5 (20.50)    11.408 (2)                  0.003
Middle Areas    87.5 (29.25)
1- Chi-Square Test
The chi-square test is a statistical test used to examine
the association between categorical variables.
Assumptions
1. The groups are independent of each other.
2. Random sample.
3. Fewer than 20% of the cells have an expected
frequency of less than 5.
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: Gender is not associated with
satisfaction level of QoL.
Research question: Is Gender associated with
satisfaction level of QoL?
2- Checking assumptions
i) Random sample (By study design and
sampling method)
ii) The groups are independent of each
other (By study design)
iii) The expected frequency is checked
through doing the analysis.
Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             69.191 (a)   1    .000
Continuity Correction (b)      67.458       1    .000
Likelihood Ratio               71.973       1    .000
Fisher's Exact Test                                                      .000                   .000
Linear-by-Linear Association   69.004       1    .000
N of Valid Cases               371
a. 0 cells (0.0%) have expected count less than 5. The minimum
expected count is 77.68.
b. Computed only for a 2x2 table
5- Result presentation
Table: Association between gender and satisfaction level
of QoL
Variable   Satisfaction level of QoL, n (%)   X2 (df)       p value*
Gender     Satisfied      Unsatisfied
Male       69 (35.9)      123 (64.1)          69.191 (1)    <0.001
Female     141 (78.8)     38 (21.2)
*Pearson Chi-Square
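The Pearson chi-square value and the minimum expected count can be reproduced from the 2x2 table. The sketch below takes the cell counts as 69 satisfied and 123 unsatisfied males, and 141 satisfied and 38 unsatisfied females; this arrangement is an assumption inferred from the group sizes (192 males, 179 females) and the reported percentages. The function name is illustrative.

```python
def pearson_chi_square(table):
    """Return (chi-square, minimum expected count) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    return chi2, min_expected

# Rows: male, female. Columns: satisfied, unsatisfied (assumed layout).
observed = [[69, 123], [141, 38]]
chi2, min_expected = pearson_chi_square(observed)
print(round(chi2, 3), round(min_expected, 2))
```

The minimum expected count is what decides between the chi-square test and Fisher's exact test.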
2- Fisher's Exact Test
Assumptions
1. The groups are independent of each other.
2. Random sample.
3. More than 20% of the cells have an expected
frequency of less than 5.
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: Gender variable is not associated with
Anemia.
2- Checking assumptions
i) Random sample (By study design and sampling
method)
Chi-Square Tests
                               Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.571 (a)   1    .003
Continuity Correction (b)      5.658       1    .017
Likelihood Ratio               10.720      1    .001
Fisher's Exact Test                                                     .007                   .007
Linear-by-Linear Association   8.000       1    .005
N of Valid Cases               15
a. 3 cells (75.0%) have expected count less than 5. The minimum
expected count is 2.33.
b. Computed only for a 2x2 table
5- Result presentation
Table: Association between gender variable and Anemia
Variable   Anemia, n (%)                  p value*
Gender     Anemia         Non Anemia
Male       2 (20.0)       8 (80.0)        0.007
Female     5 (100.0)      0 (0)
*Fisher's Exact Test
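The exact p value can be reproduced from the four cell counts with the hypergeometric distribution. The sketch below is a minimal two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one; the function name and the small numerical tolerance are choices of this example, not part of SPSS.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of tables at least as extreme as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Rows: male, female. Columns: anemia, non anemia.
p_value = fisher_exact_two_sided(2, 8, 5, 0)
print(round(p_value, 3))
```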
Chapter 4
Correlation and Regression
1- Correlation
Correlation is used to determine the association between
two numerical variables. The correlation coefficient
measures the strength and direction of this association.
Correlation can vary from +1 to -1. Values close to +1
indicate a high-degree of positive correlation, and values
close to -1 indicate a high degree of negative correlation.
Values close to zero indicate poor correlation of either kind,
and 0 indicates no correlation at all.
Assumptions
1. The observations are independent of each other.
2. Random sample.
3. The variables are normally distributed.
4. The relationship between the two variables is
linear.
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no correlation between
physical health of QoL in the Gaza Strip and
psychological health.
2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The observations are independent of each other.
iii) Linearity assumption can be checked as follows:
1- Graphs
2- Legacy Dialogs
3- Scatter/Dot
4- In the Scatter/Dot box, select Simple
Scatter, then Define
5- In the Simple Scatterplot box, transfer
Physical Health to Y Axis and
Psychological Health to X Axis, then
click OK.
Correlations
                                        Physical Health   Psychological
Physical Health   Pearson Correlation   1                 .951**
                  Sig. (2-tailed)                         .000
                  N                     371               371
Psychological     Pearson Correlation   .951**            1
                  Sig. (2-tailed)       .000
                  N                     371               371
**. Correlation is significant at the 0.01 level (2-tailed).
5- Result presentation
Table: Correlation between physical health and
psychological health of QoL
                  Psychological health
                  r          p value*
Physical health   0.951      <0.001
*Pearson Correlation
Note: if one or both variables are not normally distributed,
Spearman correlation must be used. In SPSS, choose
Spearman from the Bivariate Correlations box; the remaining
steps are the same as in the analysis by Pearson
correlation.
2- Linear Regression
Regression analysis is used to determine the relationship
between a dependent variable and one or more independent
variables (which are also called predictor or explanatory
variables). Linear regression explores relationships that can
be readily described by straight lines.
Assumptions
1- Linear relationship between the two variables.
2- Fit least square line
3- Checking residual diagnostics:
3.1- Linearity of the numerical independent variable.
3.2- Normality of residual.
3.3- Equal variance of residual.
4- Random samples.
Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no linear relationship between
family monthly income and QoL in the Gaza Strip.
Research question: Is the income variable a predictor
factor of QoL in the Gaza Strip?
2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) Linearity assumption can be checked as
follows:
1- Graphs
2- Legacy Dialogs
3- Scatter/Dot
4- In the Scatter/Dot box, select Simple Scatter,
then Define
5- In the Simple Scatterplot box, transfer Total QoL
to Y Axis and Monthly Family Income to X
Axis, then click OK.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .831 (a)   .690       .689                8.81543
a. Predictors: (Constant), Monthly Family Income
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                47.476    1.867                                           25.427   .000    43.802        51.149
    Monthly Family Income     .019      .001                .831                        26.550   .000    .017          .020
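The coefficients describe the fitted line Total QoL = 47.476 + 0.019 x Monthly Family Income. As a minimal sketch, the Python snippet below uses these reported coefficients to predict QoL for a given income and checks that R Square in the Model Summary is the square of R. The income value of 1000 is hypothetical, and the function name is illustrative.

```python
# Coefficients and R as reported in the SPSS output.
intercept = 47.476   # (Constant)
slope = 0.019        # Monthly Family Income
r = 0.831            # multiple correlation coefficient R

def predict_qol(income):
    """Predicted Total QoL from the fitted simple regression line."""
    return intercept + slope * income

# Hypothetical income value, in the same units as the data file.
predicted = predict_qol(1000)
r_squared = r ** 2

print(round(predicted, 3), round(r_squared, 3))
```

Because the coefficients are rounded to three decimals, predictions from this sketch are approximate.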
Steps of analysis
1- Analyze
2- Descriptive Statistics
3- Frequencies
4- In the Frequencies box, transfer the variables Gender,
Age, Sick and Monthly Family Income to
Variable(s), then select Statistics.
5- In the Frequencies: Statistics box, select Mean,
Median and Std. deviation, then click Continue.
6- Click OK.
A) Numerical variables
Checking assumptions
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .831 (a)   .690       .689                8.81543
a. Predictors: (Constant), Monthly Family Income
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                47.476    1.867                                           25.427   .000    43.802        51.149
    Monthly Family Income     .019      .001                .831                        26.550   .000    .017          .020
a. Dependent Variable: Total QoL
Coefficients (a)
                   Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model              B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)     54.786    3.167                                           17.302   .000    48.555        61.016
    Age            1.399     .107                .594                        13.127   .000    1.189         1.608
a. Dependent Variable: Total QoL
B) Categorical variables
3.A: Simple Linear Regression for Gender variable
Females have 1.89 higher total QoL than males, but the
relationship is not significant (p = 0.287).
Coefficients (a)
                   Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model              B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)     98.135    2.821                                           34.786   .000    92.585        103.686
    Gender         -1.892    1.774               -.060                       -1.066   .287    -5.383        1.599
a. Dependent Variable: Total QoL
5- Results interpretation
The model summary table shows the R and R2 of each
model. Model 3 is the best model because 70.6% of the
variation in the dependent variable is explained by its
independent variables, more than in the other two models.
Model Summary
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .831 (a)   .690       .689                8.81543
2       .838 (b)   .702       .700                8.66576
3       .840 (c)   .706       .703                8.62044
a. Predictors: (Constant), Monthly Family Income
b. Predictors: (Constant), Monthly Family Income, Sick
c. Predictors: (Constant), Monthly Family Income, Sick, Age
ANOVA (a)
Model            Sum of Squares   df    Mean Square   F         Sig.
1   Regression   54781.185        1     54781.185     704.928   .000 (b)
    Residual     24556.906        316   77.712
    Total        79338.091        317
2   Regression   55683.032        2     27841.516     370.748   .000 (c)
    Residual     23655.059        315   75.095
    Total        79338.091        317
3   Regression   56004.118        3     18668.039     251.212   .000 (d)
    Residual     23333.973        314   74.312
    Total        79338.091        317
a. Dependent Variable: Total QoL
b. Predictors: (Constant), Monthly Family Income
c. Predictors: (Constant), Monthly Family Income, Sick
d. Predictors: (Constant), Monthly Family Income, Sick, Age
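The R Square and Adjusted R Square of each model follow directly from the sums of squares in the ANOVA table. The sketch below uses the regression and total sums of squares as reported; the sample size of 318 is inferred from the total df of 317, and the function name is illustrative.

```python
ss_total = 79338.091
n = 318  # total df (317) + 1

# Model number -> (regression sum of squares, number of predictors).
models = {1: (54781.185, 1), 2: (55683.032, 2), 3: (56004.118, 3)}

def r2_and_adjusted(ss_regression, k):
    """Return (R squared, adjusted R squared) for a model with k predictors."""
    r2 = ss_regression / ss_total
    # The adjustment penalises each additional predictor.
    adjusted = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adjusted

results = {m: r2_and_adjusted(ss, k) for m, (ss, k) in models.items()}
for m, (r2, adjusted) in results.items():
    print(m, round(r2, 3), round(adjusted, 3))
```

Adjusted R Square is the fairer basis for comparing models with different numbers of predictors, because plain R Square never decreases when a predictor is added.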
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                47.476    1.867                                           25.427   .000    43.802        51.149
    Monthly Family Income     .019      .001                .831                        26.550   .000    .017          .020
2   (Constant)                41.884    2.444                                           17.138   .000    37.075        46.692
    Monthly Family Income     .018      .001                .816                        26.253   .000    .017          .020
    Sick                      3.657     1.055               .108                        3.465    .001    1.581         5.734
3   (Constant)                39.508    2.686                                           14.707   .000    34.223        44.794
    Monthly Family Income     .017      .001                .762                        18.849   .000    .015          .019
    Sick                      3.524     1.052               .104                        3.351    .001    1.455         5.594
    Age                       .198      .095                .084                        2.079    .038    .011          .385
a. Dependent Variable: Total QoL
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B             Std. Error      Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                37.308        6.988                                       5.339    .000    23.559        51.057
    Monthly Family Income     .018          .003            .802                        6.419    .000    .012          .023
    Age                       .270          .232            .115                        1.163    .246    -.187         .726
    Sick                      3.496         1.057           .103                        3.309    .001    1.417         5.575
    Income*Age                -2.774E-005   .000            -.065                       -.341    .733    .000          .000
a. Dependent Variable: Total QoL
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                40.702    6.674                                           6.099    .000    27.570        53.833
    Monthly Family Income     .016      .003                .739                        5.992    .000    .011          .022
    Age                       .197      .095                .084                        2.063    .040    .009          .384
    Sick                      2.844     3.634               .084                        .783     .434    -4.306        9.995
    Income*Sick               .000      .001                .033                        .195     .845    -.003         .003
a. Dependent Variable: Total QoL
Coefficients (a)
                              Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model                         B         Std. Error          Beta                        t        Sig.    Lower Bound   Upper Bound
1   (Constant)                33.014    8.736                                           3.779    .000    15.826        50.203
    Monthly Family Income     .017      .001                .757                        18.474   .000    .015          .019
    Age                       .429      .311                .182                        1.379    .169    -.183         1.041
    Sick                      7.145     4.752               .210                        1.503    .134    -2.206        16.496
    Age*Sick                  -.122     .157                -.154                       -.781    .435    -.430         .186
a. Dependent Variable: Total QoL
ii) Multicollinearity
The stability of the regression model should be checked,
which means that the independent variables have to be
unrelated. Multicollinearity indicates that the independent
variables are highly correlated, and it is checked by
determining the Variance Inflation Factor (VIF). VIF must be
less than 10.
To identify VIF, all steps of multiple linear regression are
repeated with the three independent variables and with
Enter as the Method. In the Linear Regression: Statistics
box, select Collinearity diagnostics in addition to
Confidence intervals.
The results in the table below show that the VIF values of
all three variables are < 10, which is acceptable.
Coefficients (a)
B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; bounds: 95.0% confidence interval for B; Tolerance, VIF: collinearity statistics.
Model 1 term            B             Std. Error  Beta    t        Sig.   Lower Bound  Upper Bound  Tolerance  VIF
(Constant)              39.508        2.686               14.707   .000   34.223       44.794
Monthly Family Income   .017          .001        .762    18.849   .000   .015         .019         .574       1.743
Age                     .198          .095        .084    2.079    .038   .011         .385         .574       1.742
Sick                    3.524         1.052       .104    3.351    .001   1.455        5.594        .977       1.024
a. Dependent Variable: Total QoL
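As a quick arithmetic check on the table above, the VIF is the reciprocal of the Tolerance. The following Python sketch is not part of SPSS, and the function name is only illustrative; it applies that relation to the Tolerance values reported in the table. Small differences in the last digit are expected because the Tolerance values are rounded to three decimals.

```python
def vif_from_tolerance(tolerance):
    """Return the Variance Inflation Factor, VIF = 1 / Tolerance."""
    if not 0 < tolerance <= 1:
        raise ValueError("tolerance must be in the interval (0, 1]")
    return 1.0 / tolerance


# Tolerance values copied from the Coefficients table above.
tolerances = {
    "Monthly Family Income": 0.574,
    "Age": 0.574,
    "Sick": 0.977,
}

vifs = {name: vif_from_tolerance(tol) for name, tol in tolerances.items()}

for name, vif in vifs.items():
    # The book's rule of thumb: VIF must be less than 10.
    status = "acceptable" if vif < 10 else "possible multicollinearity"
    print(f"{name}: VIF = {vif:.3f} ({status})")
```

A Tolerance close to 1 (VIF close to 1) means the variable shares very little variance with the other independent variables.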
Variable                 SLR* b (95% CI)           p value   MLR** adjusted b (95% CI)   t statistic   p value
Monthly family income    0.019 (0.017, 0.020)      <0.001    0.017 (0.015, 0.019)        18.849        <0.001
Age                      1.399 (1.189, 1.608)      <0.001    0.198 (0.011, 0.385)        2.079         0.038
Sickness (Healthy)       7.556 (3.891, 11.221)     <0.001    3.524 (1.455, 5.594)        3.351         0.001
Gender (Female)          -1.892 (-5.383, 1.599)    0.287     -                           -             -
* Simple linear regression, ** Multiple linear regression
Model assumptions are met. There are no interaction or multicollinearity
problems.
3- Multiple Logistic Regression
Steps of analysis
1- Hypothesis and question statement
Research question: What factors are associated with
depression in a sample of adults in the Gaza Strip?
Objective: To identify the factors associated with
depression in a sample of adults in the Gaza Strip.
1- Analyze
2- Regression
3- Binary Logistic
4- In the Logistic Regression box, transfer
Depression to Dependent and Age to Covariates,
then select Options.
5- In the Logistic Regression: Options box, select
CI for exp(B), then click Continue.
6- Click OK.
The table shows that the p value is 0.034 and the odds ratio
for age is 0.969. For each one-year increase in age, the odds
of depression are multiplied by 0.969, that is, they decrease
by about 3.1%.
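The arithmetic behind this interpretation can be sketched in Python. The helper names below are illustrative, and the ten-year figure is only an example that relies on the model's own assumption that the effect of age is the same for every additional year.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_for_increase(odds_ratio_per_unit, units):
    """Odds ratio for an increase of `units`, assuming a constant per-unit effect."""
    return odds_ratio_per_unit ** units


or_age = 0.969  # odds ratio for age reported in the SPSS output

change_per_year = percent_change_in_odds(or_age)
or_ten_years = odds_ratio_for_increase(or_age, 10)

print(f"Change in odds per one-year increase: {change_per_year:.1f}%")
print(f"Odds ratio for a ten-year increase: {or_ten_years:.3f}")
```

An odds ratio below 1 always corresponds to a decrease in the odds, and an odds ratio above 1 to an increase.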
The table shows that the p value is <0.001 and the odds ratio
for obesity is 8.112. Obese people have 8.1 times the odds of
depression compared with non-obese people.
The table shows that the p value is <0.001 and the odds ratio
for smoking is 11.085. Smokers have 11.1 times the odds of
depression compared with non-smokers.
The table shows that the p value is 0.002 and the odds ratio
for gender is 2.022. Males have 2.0 times the odds of
depression compared with females.
Variable          Crude OR (95% CI)       Wald statistic (df)   p value
Age               0.97 (0.94, 1.00)       4.50 (1)              0.034
Obesity
  Non-obese       1.00                    65.35 (1)             <0.001
  Obese           8.11 (4.88, 13.47)
Smoking
  Non-smoker      1.00                    74.12 (1)             <0.001
  Smoker          11.08 (6.41, 19.17)
Gender
  Female          1.00                    9.46 (1)              0.002
  Male            2.02 (1.29, 3.16)
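The columns of this table are linked. For a term with 1 degree of freedom, the Wald statistic equals (B / SE) squared, where B is the natural logarithm of the odds ratio, and the 95% CI is exp(B ± 1.96 × SE). The Python sketch below uses these standard relations to recover approximate intervals from the reported odds ratios and Wald statistics. It is a consistency check with an illustrative helper name, not SPSS output, and it can differ from the table in the last digit because the inputs are rounded.

```python
from math import exp, log, sqrt


def ci_from_or_and_wald(odds_ratio, wald, z=1.96):
    """Approximate 95% CI for an odds ratio from its 1-df Wald statistic."""
    if odds_ratio <= 0 or wald <= 0:
        raise ValueError("odds_ratio and wald must be positive")
    b = log(odds_ratio)        # logistic regression coefficient B
    se = abs(b) / sqrt(wald)   # because Wald = (B / SE) ** 2
    return exp(b - z * se), exp(b + z * se)


# (odds ratio, Wald statistic) pairs taken from the SPSS results above.
reported = {
    "Age": (0.969, 4.50),
    "Obesity (obese)": (8.112, 65.35),
    "Smoking (smoker)": (11.085, 74.122),
    "Gender (male)": (2.022, 9.46),
}

intervals = {
    name: ci_from_or_and_wald(odds_ratio, wald)
    for name, (odds_ratio, wald) in reported.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: 95% CI approximately ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1.00 corresponds to a p value below 0.05 for that variable.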
iii) Multicollinearity
The stability of the logistic regression model should be
checked, which means that the independent variables should
not be strongly related to one another. Multicollinearity
indicates that the independent variables are highly
correlated. There is no facility to check multicollinearity in
logistic regression, so it is checked instead through a linear
regression analysis, using the Variance Inflation Factor
(VIF). The VIF must be less than 10.
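When the model has only two independent variables, as with obesity and smoking in the table that follows, the VIF has a simple closed form: Tolerance = 1 - r^2 and VIF = 1 / Tolerance, where r is the Pearson correlation between the two predictors. A minimal Python sketch follows, using made-up 0/1 codes (hypothetical values, not the book's data) and illustrative function names.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_and_vif(x1, x2):
    """Tolerance and VIF for a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    tolerance = 1.0 - r ** 2
    return tolerance, 1.0 / tolerance


# Hypothetical codes for ten people: 1 = obese / smoker, 0 = otherwise.
obesity = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
smoking = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]

tolerance, vif = tolerance_and_vif(obesity, smoking)
print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```

Because the VIF depends only on the independent variables, the choice of dependent variable in the linear regression does not affect it.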
The results in the table below show that the VIF values of
both variables are < 10, which is acceptable.
Coefficients (a)
B, Std. Error: unstandardized coefficients; Beta: standardized coefficient; Tolerance, VIF: collinearity statistics.
Model 1 term   B       Std. Error  Beta    t        Sig.   Tolerance  VIF
(Constant)     .585    .069                8.516    .000
Obesity        .405    .043        .407    9.475    .000   .978       1.023
Smoking        .459    .043        .458    10.673   .000   .978       1.023
a. Dependent Variable: Depression
Classification Table (a)
                                   Predicted Depression
Observed                           Non-Depressed   Depressed   Percentage Correct
Step 1  Depression  Non-Depressed  166             8           95.4
                    Depressed      50              94          65.3
        Overall Percentage                                     81.8
a. The cut value is .500
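The percentages in the classification table can be reproduced from its four counts. The Python sketch below treats "Depressed" as the positive outcome, which is an assumption made here for naming purposes; under that convention the two row percentages are the specificity and the sensitivity of the model at the 0.5 cut value.

```python
# Counts copied from the classification table (rows: observed, columns: predicted).
true_negative = 166   # observed non-depressed, predicted non-depressed
false_positive = 8    # observed non-depressed, predicted depressed
false_negative = 50   # observed depressed, predicted non-depressed
true_positive = 94    # observed depressed, predicted depressed

total = true_negative + false_positive + false_negative + true_positive

# Percentage of non-depressed people classified correctly.
specificity = true_negative / (true_negative + false_positive)
# Percentage of depressed people classified correctly.
sensitivity = true_positive / (true_positive + false_negative)
# Overall percentage classified correctly.
accuracy = (true_negative + true_positive) / total

print(f"Non-depressed correct (specificity): {100 * specificity:.1f}%")
print(f"Depressed correct (sensitivity): {100 * sensitivity:.1f}%")
print(f"Overall percentage correct: {100 * accuracy:.1f}%")
```

The model classifies non-depressed people much better than depressed people, which the overall percentage alone does not reveal.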
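The ROC curve shows that the area under the curve (AUC) is
0.823, which means that the model correctly discriminates
about 82.3% of cases: a randomly chosen depressed person
receives a higher predicted probability than a randomly
chosen non-depressed person about 82.3% of the time. The
AUC is significantly greater than 0.5, the value expected
by chance.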
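To make the meaning of the area under the curve concrete, the Python sketch below computes it directly as the proportion of depressed/non-depressed pairs in which the depressed person has the higher predicted probability, with ties counted as one half. The predicted probabilities are made-up values for illustration, not the book's data, and the function name is hypothetical.

```python
def area_under_roc(positive_scores, negative_scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0    # pair ranked correctly
            elif pos == neg:
                wins += 0.5    # tie counts as half
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predicted probabilities of depression.
depressed = [0.9, 0.8, 0.6, 0.4]
non_depressed = [0.7, 0.3, 0.2, 0.1]

auc = area_under_roc(depressed, non_depressed)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks such pairs no better than chance, and an AUC of 1.0 to perfect discrimination.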
E-KUTUB
Publisher of publishers
Amazon & Google Books Partner
No 1 in the Arab world
Registered with Companies House in England
under Number: 07513024
Email: [email protected]
Website: www.e-kutub.com
Germany Office: In der Gass 10,
55758 Niederwörresbach,
Rhineland-Palatinate
UK Registered Office:
28 Lings Coppice,
London, SE21 8SY
Tel: (0044)(0)2081334132