
Advanced Methods in Biostatistics Using SPSS

The document describes how to create a bar chart to present categorical variable data in SPSS, including selecting "Bar" from the Graphs menu, choosing "Simple" bar chart type, defining the categorical variable ("Blood Pressure") as the category axis, which produces a bar chart displaying the frequency of each blood pressure category.

Advanced Methods

in Biostatistics
By SPSS

Dr. Hatem Yousef Abu Zaydeh


Assistant Professor in
Neurophysiology

eKutub

London, January 2019


Advanced Methods in Biostatistics By SPSS
By: Dr. Hatem Yousef Abu Zaydeh
All Rights Reserved to the author ©
Published by E-Kutub Ltd
Distribution: Amazon, Kindle, Google Books, Play
Store & e-kutub.com
ISBN: 978-1-78058-430-0
First Edition
London, Jan. 2019
You can write to the author at:
[email protected]
It is not allowed to reprint or publish this book, or
any part thereof, by any means without the
permission of the publisher.

You can write to the publisher at:


[email protected]

ACKNOWLEDGEMENTS

Praise and Thanks to Allah for His Blessings

We would like to express our deepest appreciation to all
those who made it possible to complete this book.

We would like to acknowledge the assistance provided by
the academic staff of the School of Health Sciences at the
Universiti Sains Malaysia through the courses and lectures
given during my PhD project, in which they shared their
expertise in statistics and the basics of research methodology.

Contents
ACKNOWLEDGEMENTS
Introduction
General Objectives of the Book
Chapter 1: Chart Presentation
SPSS statistics program
Creating a new data file
Descriptive and inferential statistics
A- Categorical variable: Chart presentation (Bar chart)
B- Numerical variable
C- Numerical variable: Chart presentation
Chapter 2: Parametric Statistics
Parametric tests
Independent t test and paired t test
1- Independent t test
2- Paired t test
3- One-Way ANOVA
Chapter 3: Non Parametric Statistics
Non Parametric tests
1- Mann-Whitney test
2- Wilcoxon Signed-Ranks Test
3- Kruskal-Wallis test
Post Hoc Analysis
1- Chi-Square Test
2- Fisher's Exact Test
Chapter 4: Correlation and Regression
1- Correlation
2- Linear Regression
3- Multiple Logistic Regression
References

Introduction
With the recent advancement of statistical methods in
applied scientific research, there is a need for simplified
statistical material that is easily available to researchers
and graduate students.

Many researchers are not specialized in statistics and are not
required to be specialists in statistics. Sometimes, however,
researchers need to analyze and interpret data in depth by
themselves. This book includes the basics of biostatistics and
covers a number of statistical procedures that SPSS
Statistics performs, described with simple methods that enable
researchers to analyze various data, along with some more
advanced statistics.

This book is designed to help researchers learn how to
analyze and interpret research data, and it explains the output
with simple presentations. It provides essential topics
related to SPSS, based on real data from a health
research study, and deals simply with how to
enter data, present it with different graphics, and
conduct basic statistics such as parametric and
nonparametric tests.

Advanced statistics are also presented, such as Linear and
Multiple Regression, Multiple Logistic Regression and the ROC
Curve.

General Objectives of the Book
Upon completion of this book, health students will be
able to:

▪ Create a new data file with PASW Statistics.
▪ Present categorical and numerical variables.
▪ Perform main parametric and nonparametric
statistics.
▪ Determine the relationship between a dependent
variable and one or more independent variables by
Linear Regression.
▪ Formulate Multiple Linear Regression Analysis.
▪ Describe the association of several independent
variables to a categorical dependent variable by
Multiple Logistic Regression.

Chapter 1
Chart Presentation

SPSS
Statistics Advance Biostatistics

SPSS statistics program


SPSS is a Windows based program that can be used to
perform data entry and analysis and to create tables and
graphs. SPSS is capable of handling large amounts of data
and can perform all of the analyses covered in the text and
much more. SPSS is commonly used in the Social Sciences,
so familiarity with this program should serve you well in
the future.


Creating a new data file


To create a new data file, variable view should be activated.
There are two types of views:
Data view: this view displays the actual data values or
defined value labels.
Variable view: this view displays variable definition
information, including defined variable and value labels,
data type (for example, string, date, or numeric),
measurement level (nominal, ordinal, or scale), and user-
defined missing values.

1. Open the PASW Statistics program.


2. Double-click a variable name at the top of the column
in data view, or click the variable view tab.
3. Fill in the following variable attributes:


a) Name: enter the name of the variable. Each variable
name must be unique, can be up to 64 bytes long,
and must begin with a letter.
b) Type: the variable type defines the data type for each
variable. You can use the Type cell to change the
data type. The two main types we will deal with are
"string" and "numeric" variables.

String variables include letters, and can be used for
names or perhaps brief open-ended responses.

Numeric variables are numbers. In either case,
you'll need to specify your variable width. You can
do that in the dialogue box, or in the subsequent
width and decimal columns.


If the variable is a date, select a format from the list. You can
enter dates with slashes, hyphens, periods, commas, or
blank spaces as delimiters.

c) Label: the label allows you to specify a longer variable
name. This longer label will appear on any charts or
graphs you produce. It can contain spaces and
reserved characters that are not allowed in variable
names.

d) Values: values allow you to connect the values
(numbered codes) of the coding scheme to the original
categories. Categorical variables are coded with
numbers; for example, to code a gender variable,
put the number in the Value box and the name
of the category in the Label box, such as 1 for male and 2
for female.


e) Missing values: missing values refer to a missing
data code. This is a special number that is treated as
a unique code to identify places where there is no data. The
purpose of doing this is to avoid including it as a real
number when statistics are computed.
4. After the characteristics of all variables have been
entered, save and name the file.
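
The role of value labels and missing-value codes can be illustrated outside SPSS with a minimal Python sketch. The missing code 999 and the four records are hypothetical and chosen only for illustration; only the coding 1 = male, 2 = female comes from the text.

```python
from statistics import mean

# Value labels as described in the text: numeric codes map to categories.
GENDER_LABELS = {1: "Male", 2: "Female"}

# Assumed user-defined missing code (hypothetical, not from the book).
MISSING_CODE = 999

# Hypothetical raw entries: (gender code, diastolic blood pressure).
records = [(1, 90), (2, 85), (2, MISSING_CODE), (1, 95)]

# Translate the codes into their labels, as SPSS does when value labels are shown.
labelled = [GENDER_LABELS[code] for code, _ in records]

# Exclude the missing code so that it is not treated as a real number.
valid_bp = [bp for _, bp in records if bp != MISSING_CODE]
mean_bp = mean(valid_bp)

print(labelled)
print(mean_bp)
```

If 999 were left in the data, the mean would be badly inflated, which is why the code has to be declared as missing before any statistics are computed.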


Descriptive and inferential statistics


Descriptive statistics: statistical procedures used to
describe and summarize a dataset. It involves the collection,
organization, analysis, interpretation and presentation of
sample data. Results are presented in the form of tables,
graphs and narrative.

Inferential statistics: inferential statistics is the branch of


statistics that deals with using sample data to make valid
judgments (inferences) about the population from which the
sample data came.

Statistical presentation of categorical and numerical
variables
A. Categorical variable: frequency
1. Absolute count: the actual number of observations.
2. Relative frequency: the proportion or percentage.
3. Cumulative relative frequency: the combined percentage of a
given value and all preceding values in the distribution.

Perform frequencies in SPSS


1- Analyze
2- Descriptive Statistics
3- Frequencies

4- In the box of frequencies, transfer Blood Pressure


[BP] to the Variable(s)
5- Click on OK.


Blood Pressure

                       Frequency   Percent   Valid Percent
Valid   Normal         112         58.9      58.9
        Hypertension   58          30.5      30.5
        Hypotension    20          10.5      10.5
        Total          190         100.0     100.0
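
The percentages in the frequency table can be reproduced from the absolute counts. The following is a minimal Python sketch that uses the three counts from the table above; the variable names are illustrative only.

```python
# Absolute counts taken from the Blood Pressure frequency table.
counts = {"Normal": 112, "Hypertension": 58, "Hypotension": 20}

total = sum(counts.values())

rows = []
cumulative = 0.0
for category, n in counts.items():
    percent = 100 * n / total   # relative frequency as a percentage
    cumulative += percent       # cumulative relative frequency
    rows.append((category, n, round(percent, 1), round(cumulative, 1)))

for category, n, percent, cum_percent in rows:
    print(f"{category:<14}{n:>5}{percent:>8}{cum_percent:>8}")
print(f"{'Total':<14}{total:>5}")
```

The last cumulative value always reaches 100%, which is a quick check that no category has been left out.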


A- Categorical variable: Chart presentation


(Bar chart)

i) Chart presentation (Bar chart)


Perform bar charts in SPSS
1- Graphs
2- Legacy Dialogs
3- Bar
4- In the box of Bar Charts, select Simple.
5- Then, click on Define.
6- In the box of Define Simple Bar: Summaries for
Group Cases, transfer Blood Pressure [BP] to
Category Axis
7- Click on OK.


ii) Chart presentation (Pie chart)


Perform pie charts in SPSS
1- Graphs
2- Legacy Dialogs
3- Pie
4- In the Pie Charts box, click on Define.
5- In the Define Pie: Summaries for Groups of Cases box,
transfer Blood Pressure [BP] to Define Slices by.
6- Click on OK.


B-Numerical variable
i- Measurement of central tendency
Central tendency: a single value that best represents the middle
of a distribution of data. Also called centrality.

a. Mean: The mean, or average, is a measure of central
tendency calculated by summing all values and dividing
by the number of values.
Example: what is the mean of repeated measurements of
diastolic blood pressure in a week (90, 95, 100, 85, 90, 90, 95)?

$$\bar{X} = \frac{\sum x_i}{n}$$

where $\bar{X}$ = sample mean, $\sum x_i$ = sum of all values,
and $n$ = number of values.

$$\bar{X} = \frac{90 + 95 + 100 + 85 + 90 + 90 + 95}{7} = \frac{645}{7} = 92.1$$
b. Median: The median is the middle value in an ordered list of
numbers, so that the number of values below and above the
median is equal.
Example: what is the median of these weight
measurements:
(80, 76, 85, 90, 87, 95, 79)?
Ordered: (76, 79, 80, 85, 87, 90, 95).
The median is 85.

c. Mode: the mode is the observation that occurs most
frequently in a data set.
Example: what is the mode of these weight measurements:
(84, 88, 84, 92, 87, 90, 79, 88, 84)?
The mode is 84.
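
The three worked examples above can be checked with Python's standard `statistics` module. This is a minimal sketch; the list names are illustrative, and the data are the values given in the examples.

```python
from statistics import mean, median, mode

# Data from the three worked examples in the text.
diastolic = [90, 95, 100, 85, 90, 90, 95]                   # mean example
weights_median = [80, 76, 85, 90, 87, 95, 79]               # median example
weights_mode = [84, 88, 84, 92, 87, 90, 79, 88, 84]         # mode example

mean_bp = mean(diastolic)               # sum of values divided by their number
median_weight = median(weights_median)  # middle value after sorting
mode_weight = mode(weights_mode)        # most frequent value

print(round(mean_bp, 1), median_weight, mode_weight)
```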

ii- Measurement of dispersion

Dispersion indicates how dispersed or deviated the values
are from one another in a distribution. It includes variance,
standard deviation, interquartile range and range.
a. Variance: Variance refers to the amount of spread of
observations around the mean. It is the average of the squared
differences from the mean.

b. Standard deviation (SD): Standard deviation is a
statistical term that measures the amount of variability or
dispersion around an average. It is equal to the positive
square root of the variance.


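Example of variance and standard deviation

No.    $x_i$   $x_i - \bar{X}$   $(x_i - \bar{X})^2$
1.     93      6                 36
2.     88      1                 1
3.     80      -7                49
4.     85      -2                4
5.     84      -3                9
6.     84      -3                9
7.     88      1                 1
8.     92      5                 25
9.     88      1                 1
10.    90      3                 9
11.    85      -2                4
12.    90      3                 9
13.    89      2                 4
14.    92      5                 25
Sum    1228                      186

Mean: $\bar{X} = 1228 / 14 = 87.7$, taken as 87 in the table so that the
deviations are whole numbers.

Variance:
$$S^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1} = \frac{186}{13} = 14.31$$

Standard deviation:
$$S = \sqrt{14.31} = 3.78$$

With the unrounded mean, $S^2 = 13.76$ and $S = 3.71$.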


c. Range: the range is the difference between the extreme
values in a set of observations.
Range = maximum value - minimum value
(84, 88, 84, 92, 87, 90, 79, 88, 84).
Range = 92 - 79 = 13

d. Interquartile range (IQR): IQR is the difference
between the third and the first quartiles.

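[Figure: the quartiles divide the ordered data into four parts of 25% each (Min, 25th percentile, 50th percentile, 75th percentile, Max); the IQR spans the 25th to the 75th percentile.]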
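
The four dispersion measures can be computed together with Python's standard `statistics` module. The sketch below uses the nine values from the range example. Quartile conventions differ between software packages, so the IQR from `statistics.quantiles` (default "exclusive" method) may differ slightly from the value SPSS reports; this is an assumption to keep in mind, not a statement about SPSS output.

```python
from statistics import quantiles, stdev, variance

# Values from the range example in the text.
data = [84, 88, 84, 92, 87, 90, 79, 88, 84]

data_range = max(data) - min(data)   # maximum value minus minimum value

sample_variance = variance(data)     # squared deviations summed, divided by n - 1
sample_sd = stdev(data)              # positive square root of the variance

# Quartiles with the module's default method; other packages may use
# a slightly different interpolation rule.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                        # third quartile minus first quartile

print(data_range, round(sample_variance, 2), round(sample_sd, 2), iqr)
```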


C-Numerical variable: Chart presentation


a) Histogram
- A common pattern is the bell-shaped curve (normal distribution).
- A symmetric distribution is one in which the 2
"halves" of the histogram appear as mirror images
of one another.
- In a normal distribution, about 68% of the data fall between
mean - 1SD and mean + 1SD.
- About 95% of the data fall between mean - 2SD and
mean + 2SD.
- About 99.7% of the data fall between mean - 3SD and
mean + 3SD.
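
The 68%, 95% and 99.7% figures follow from the normal distribution itself and can be verified with `statistics.NormalDist` from the Python standard library. A minimal sketch:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

# Proportion of values lying within k standard deviations of the mean.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, proportion in coverage.items():
    print(f"within {k} SD: {100 * proportion:.1f}%")
```

Because the result depends only on the number of standard deviations, the same proportions hold for any normally distributed variable, whatever its mean and SD.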


Perform Histogram in SPSS


1- Graphs
2- Legacy Dialogs
3- Histogram
4- In the Histogram box, transfer Total QoL to
Variable. Select "Display normal curve".
5- Click on OK.


b) Stem and leaf


A stem-and-leaf plot is a display that organizes data to show
its shape and distribution.
The stem-and-leaf plot has two portions:
- The stem portion is listed vertically in order,
smallest to largest.
- The leaf is usually the last digit of the number, and
the other digits to the left of the leaf form the stem.

c) Box and whiskers plot

- A box-and-whisker plot displays data values and allows
the reader to see the data and draw conclusions when
comparing two or more data sets.
- It turns all of the data into a summary that shows only
five data points.
- The five points are the median, the upper and lower
quartiles, and the smallest and greatest values in the
distribution.
- The whiskers extend to the smallest and largest values that
are not outliers, whereas the box represents the 25th to 75th
percentiles.
- Outliers are observations that lie far above the 75th or far
below the 25th percentile (in SPSS, more than 1.5 box lengths
from the edge of the box).
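
The five-number summary and the outlier rule can be sketched in Python as follows. The ten scores are hypothetical, and the "inclusive" quartile method is an assumption: SPSS computes the box edges (hinges) with its own rule, so its values can differ slightly.

```python
from statistics import median, quantiles

# Hypothetical scores, chosen only to illustrate the rule.
scores = [57, 60, 65, 70, 75, 80, 85, 90, 95, 139]

# Quartiles using the "inclusive" interpolation method (an assumption).
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1                    # length of the box

# Fences: 1.5 box lengths beyond the edges of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in scores if x < lower_fence or x > upper_fence]
inside = [x for x in scores if lower_fence <= x <= upper_fence]

# Whiskers end at the most extreme values that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

print("median:", median(scores), "box:", (q1, q3))
print("whiskers:", (whisker_low, whisker_high), "outliers:", outliers)
```

In this sketch the score 139 lies above the upper fence, so it is drawn as a separate point and the upper whisker stops at 95.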


Perform box plot and stem and leaf in SPSS


1- Analyze
2- Descriptive Statistics
3- Explore
4- In the Explore box, transfer Total QoL to
Dependent List.
5- Select Plots
6- Click on OK.


Stem-and-Leaf Plot
Frequency Stem & Leaf

3.00 Extremes (=<53)


2.00 5. 77
3.00 6. 001
7.00 6. 5566778
16.00 7. 0000112222333444
21.00 7. 555566667788888899999
37.00 8. 0000000001111222222233333444444444444
39.00 8. 555555555556666677777778888888888999999
38.00 9. 00000111111112222222222233333334444444
56.00 9. 55556666666666666666666777777777777778888889999999999999
45.00 10. 000000000000001111111122222222222233333334444
46.00 10. 5555555556666666666677777777788888899999999999
25.00 11. 0000000001111111224444444
20.00 11. 55555666667777889999
5.00 12. 00001
3.00 12. 579
2.00 13. 00
3.00 Extremes (>=139)

Stem width: 10.00


Each leaf: 1 case(s)


Box plot

Parametric and Nonparametric Statistics


The type of statistical analysis must be decided before
analyzing the data. There are many tests to choose from,
and the one most appropriate for the specific data should be
selected. Parametric statistics are calculated under the
assumption that the data follow some common distribution
such as the normal distribution. It follows that statistical
tests based on these parametric statistics are called
parametric statistical tests.
However, when the data are not normally distributed, or the
sample is small (less than 30), nonparametric tests are
used. Nonparametric, or distribution-free, tests are so called
because the assumptions underlying their use are "fewer
and weaker than those associated with parametric tests"
(Siegel & Castellan, 1988, p. 34). To put it another way,
nonparametric tests require few if any assumptions about
the shapes of the underlying population distributions. For
this reason, they are often used in place of parametric tests
when the assumptions of the parametric test have been too
grossly violated.


Chapter 2
Parametric Statistics


Parametric tests
Independent T test and paired t test
T tests are used to compare means. There are three
types of t test:
1- Independent t test: it is used to compare the
means of two independent samples.
2- Paired t test: it is used to compare the
means of two dependent samples.
3- One sample t test: it is used to determine if the
mean is statistically different from a specific value.

1- Independent t test
The independent t test is used to determine whether
there is a statistically significant difference between
the means of a numerical variable in two unrelated
(independent) groups.
The dependent variable should be numerical and
the independent variable or factor variable should
be categorical with two levels.


Assumptions
1- The dependent variable is normally distributed
within each group.
2- The variances of the two groups are assumed
to be equal.
3- The two groups are independent of each other.
4- Random sample.

Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no mean difference of Quality of
Life (QoL) between males and females in the Gaza Strip.
Research question: Is the mean of QoL in the Gaza Strip
different between males and females?

2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The two groups are independent of each other (By study
design)
iii) Normality assumption: since the sample size is more than
30 for each group, consider the Central Limit Theorem.


But if checking normality is needed, the steps are as
follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the Explore box, transfer the dependent variable
(Total QoL) to the Dependent List and the
independent variable (gender) to the Factor List.
iv) Homogeneity of variance: checked through
Levene's test in SPSS; both groups have the same
variances (p > 0.05).


3- Perform independent t test in SPSS


1. Analyze
2. Compare means
3. Independent-Samples T Test
4. In the Independent-Samples T Test box, transfer the
dependent variable (Total QoL) to Test Variable(s)
and the independent variable (gender) to Grouping
Variable.
5. Click on Define Groups to specify the two levels of the
gender variable; a new dialog box appears. Then
enter 1 as the code for male and 2 as the code for female.
6. Click Continue, then OK.


4- Results interpretation and conclusion


Descriptive statistics in the table below display the means and
standard deviations of both groups.

Group Statistics
        Gender   N     Mean       Std. Deviation   Std. Error Mean
Total   Female   179   100.3687   21.563           1.611
        Male     192   92.7708    18.893           1.363

Levene's test showed p = 0.182 > 0.05, therefore the
assumption of equal variances is met. When p > 0.05, we
read the statistics in the upper row of the table.
The results in the row labeled "Equal variances assumed"
show that the p value of the t statistic for the difference
between males and females is < 0.001.


Independent Samples Test

Levene's Test for Equality of Variances
                              F      Sig.
Equal variances assumed       1.79   .182

t-test for Equality of Means
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.616   369       .000              7.59788           2.10138                 3.46571        11.73006
Equal variances not assumed   3.599   354.666   .000              7.59788           2.11110                 3.44603        11.74973

(95% CI = 95% Confidence Interval of the Difference)

Conclusion: There is a significant mean difference of
Quality of Life (QoL) between males and females in the
Gaza Strip.
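
The t statistic in the "Equal variances assumed" row can be reproduced by hand from the Group Statistics table. The following Python sketch uses the pooled-variance formula for the independent t test; it computes only t, the degrees of freedom and the standard error, because the p value requires the t distribution, which is not in the standard library.

```python
from math import sqrt

# Summary statistics from the Group Statistics table.
n_f, mean_f, sd_f = 179, 100.3687, 21.563   # females
n_m, mean_m, sd_m = 192, 92.7708, 18.893    # males

df = n_f + n_m - 2                           # degrees of freedom

# Pooled variance: weighted average of the two group variances.
pooled_var = ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / df

# Standard error of the difference between the two means.
se_diff = sqrt(pooled_var * (1 / n_f + 1 / n_m))

mean_diff = mean_f - mean_m
t_stat = mean_diff / se_diff

print(f"mean difference = {mean_diff:.4f}, SE = {se_diff:.4f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

The result agrees with the SPSS output (t = 3.616, df = 369) up to rounding of the input values.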


5- Result presentation

Table: Comparison of Quality of Life (QoL) between males and females

Variable   Males, Mean (SD)   Females, Mean (SD)   t statistic (df)   p value*
Gender     92.77 (18.89)      100.37 (21.56)       3.616 (369)        <0.001
*Independent t test

2- Paired t test
The paired t test is used when the data come from one
group of subjects measured at two time points. The test
explores the difference between two dependent or paired
sample means, such as:
1- Matched samples.
2- Two observations on the same subject (before and after
intervention).
3- Closely related subjects.

Assumptions
1. Test variables are numerical.
2. The measurements are dependent or paired.
3. Random sample.


4. The differences between the paired observations are
normally distributed; if the sample size is > 30, apply the
Central Limit Theorem. The normality of the differences can
be checked by the following steps:
a) Transform, then Compute Variable.
b) In the Compute Variable box, name the new
variable in Target Variable.
c) Transfer pre and post from Type & Label to
Numeric Expression to calculate the difference
between pre and post.
d) Click on OK. A new variable named difference
appears.
e) To check normality of the new variable, the steps
are as follows:
i) Analyze
ii) Descriptive Statistics
iii) Explore
iv) In the Explore box, transfer the
difference variable to the Dependent
List.
v) Click on Plots and choose
Histogram from the new box.
vi) Click on OK.


Steps of analysis
1. Hypothesis and question statement
Null hypothesis: There is no mean difference in body weight
between pre and post tests.
Research question: Is the mean body weight different
between pre and post tests?
2. Checking assumptions
3. Perform paired t test in SPSS
1- Analyze
2- Compare Means
3- Paired-Samples T Test
4- Transfer pre test to Variable1 and post test to
Variable2.
5- Click OK.

4- Results interpretation and conclusion


Descriptive statistics in the table below display the means and
standard deviations of the pre and post test scores.

Paired Samples Statistics
                     Mean    N    Std. Deviation   Std. Error Mean
Pair 1   Pre Test    84.62   50   8.998            1.272
         Post Test   80.40   50   8.475            1.198

The difference in mean body weight between pre and post tests
showed a p value of the t statistic of < 0.001.
The mean (SD) body weight after the intervention is lower than
before the intervention [80.40 (8.475) vs 84.62 (8.998)].


Paired Samples Test

                                 Paired Differences
                                 Mean     Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
Pair 1   Pre Test - Post Test    4.2200   2.764            0.391             3.43           5.00           10.793   49   .000

(95% CI = 95% Confidence Interval of the Difference)

Conclusion: There is a significant mean difference of body
weight between pre and post tests.
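
The paired t statistic depends only on the mean and standard deviation of the differences and the number of pairs. The Python sketch below recomputes it from the Paired Samples Test table. Because the inputs are rounded as printed, the result matches the SPSS value (10.793) only approximately.

```python
from math import sqrt

# Summary of the paired differences (pre minus post) from the table.
mean_diff = 4.22     # mean of the differences
sd_diff = 2.764      # standard deviation of the differences
n_pairs = 50         # number of subjects measured twice

df = n_pairs - 1                     # degrees of freedom
se_diff = sd_diff / sqrt(n_pairs)    # standard error of the mean difference
t_stat = mean_diff / se_diff         # paired t statistic

print(f"SE = {se_diff:.3f}, t = {t_stat:.2f}, df = {df}")
```

The paired test is therefore a one-sample t test on the difference variable, which is why normality is checked on the differences rather than on the pre and post scores separately.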

5- Result presentation
Table: Change of body weight before and after
intervention

Variable      Pre score, mean (SD)   Post score, mean (SD)   t statistic (df)   p value*
Body weight   84.62 (8.998)          80.40 (8.475)           10.793 (49)        <0.001

*Paired t test


3- One -Way ANOVA


One-Way ANOVA is used to compare the means of three
or more groups to determine whether they differ
significantly from one another. Another important function
is to estimate the differences between specific groups by a
post hoc test.
Assumptions
1. The dependent variable is numerical.
2. The groups are independent of each other.
3. Random sample.
4. Homogeneity of variances: the groups should come
from populations with equal variances.
5. The observations of each group are
normally distributed; if the sample size is > 30, apply the
Central Limit Theorem.

Steps of analysis
1-Hypothesis and question statement
Null hypothesis: There is no mean difference of Quality of
Life (QoL) among the five areas in the Gaza Strip.
Research question: Are the means of QoL different
among the five areas in the Gaza Strip?
2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The groups are independent of each other (By
study design)
iii) Normality assumption: since the sample size is more
than 30 for each group, consider the Central Limit
Theorem. But if checking normality is needed, the
steps are as follows:
1. Analyze
2. Descriptive Statistics
3. Explore
4. In the Explore box, transfer the dependent
variable (Total QoL) to the Dependent List and the
independent variable (Residence) to the Factor List.
iv) Homogeneity of variance: checked through
Levene's test in SPSS; the variances are considered
equal when p > 0.05.

3- Perform One -Way ANOVA in SPSS


1- Analyze
2- Compare means
3- One -Way ANOVA
4- In the box of One -Way ANOVA, transfer dependent
variable (Total QoL) to the Dependent List and
independent variable (Residence) to Factor
5- Click on Options; a new dialog box appears. In the One
-Way ANOVA: Options box, select Descriptive and
Homogeneity of variance test.
6- Click Continue, then OK.


4- Results interpretation and conclusion


Descriptive statistics in the table below display the means and
standard deviations of the five residence areas; the highest mean
was in Gaza Town.


Descriptives
Total QoL
                N     Mean       Std. Deviation   Std. Error   95% CI Lower Bound   95% CI Upper Bound   Minimum   Maximum
North of Gaza   69    95.8116    21.86003         2.63164      90.5602              101.0629             67.00     150.00
Gaza Town       76    101.131    18.10292         2.07655      96.9949              105.2683             70.00     150.00
Middle Areas    75    90.4667    17.51936         2.02296      86.4358              94.4975              49.00     121.00
Khan Yunis      74    96.1622    14.81394         1.72209      92.7301              99.5943              50.00     127.00
Rafah           77    98.4416    27.09510         3.08777      92.2917              104.5914             30.00     150.00
Total           371   96.4367    20.55259         1.06704      94.3384              98.5349              30.00     150.00

(95% CI = 95% Confidence Interval for Mean)

Levene's test showed p = 0.004 < 0.05, therefore the


variance is heterogeneous between groups. Although the
assumption is not met, there is a remedy for this in post hoc
analysis (select Dunnett's C).

Test of Homogeneity of Variances


Total QoL
Levene Statistic df1 df2 Sig.
3.914 4 366 .004


The ANOVA output shows that the p value of the F
statistic for the difference between groups is 0.025 (< 0.05).
Therefore we reject the null hypothesis: there is a
significant difference. However, we do not know which groups
are different, so a post hoc test must be performed.

ANOVA
Total QoL
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   4690.319         4     1172.580      2.831   0.025
Within Groups    151600.943       366   414.210
Total            156291.261       370
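
The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal Python sketch using the values printed in the table:

```python
# Values from the ANOVA table for Total QoL.
ss_between, df_between = 4690.319, 4      # between-groups sum of squares and df
ss_within, df_within = 151600.943, 366    # within-groups sum of squares and df

ms_between = ss_between / df_between      # mean square between groups
ms_within = ss_within / df_within         # mean square within groups

f_stat = ms_between / ms_within           # F statistic

ss_total = ss_between + ss_within
df_total = df_between + df_within

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

The between-groups degrees of freedom are the number of groups minus one (5 - 1 = 4), and the within-groups degrees of freedom are the total sample size minus the number of groups (371 - 5 = 366).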

4.1 Post Hoc Test


1- Analyze
2- Compare means
3- One -Way ANOVA
4- Post Hoc
5- Dunnett's C


5- Results interpretation and conclusion


The multiple comparisons table shows that:
1- There is a significant difference of QoL between
Gaza Town and Middle Areas.
2- There is no significant difference of QoL between
Gaza Town and the North of Gaza, Khan Yunis and Rafah
areas.
3- There is no significant difference of QoL between
North of Gaza and the Middle Areas, Khan Yunis and
Rafah areas.
4- There is no significant difference of QoL between
Middle Areas and the Khan Yunis and Rafah areas.
5- There is no significant difference of QoL between
Khan Yunis and Rafah.


Multiple Comparisons
Dependent Variable: Total QoL
Dunnett C
(I) Residence   (J) Residence   Mean Difference (I-J)   Std. Error   95% CI Lower Bound   95% CI Upper Bound
North of Gaza   Gaza Town       -5.31998                3.35225      -14.7050             4.0650
                Middle Areas    5.34493                 3.31932      -3.9493              14.6392
                Khan Yunis      -.35057                 3.14501      -9.1590              8.4579
                Rafah           -2.62996                4.05708      -13.9805             8.7205
Gaza Town       North of Gaza   5.31998                 3.35225      -4.0650              14.7050
                Middle Areas    10.66491*               2.89904      2.5601               18.7698
                Khan Yunis      4.96942                 2.69771      -2.5734              12.5123
                Rafah           2.69002                 3.72107      -7.7090              13.0890
Middle Areas    North of Gaza   -5.34493                3.31932      -14.6392             3.9493
                Gaza Town       -10.66491*              2.89904      -18.7698             -2.5601
                Khan Yunis      -5.69550                2.65668      -13.1251             1.7341
                Rafah           -7.97489                3.69144      -18.2921             2.3423
Khan Yunis      North of Gaza   .35057                  3.14501      -8.4579              9.1590
                Gaza Town       -4.96942                2.69771      -12.5123             2.5734
                Middle Areas    5.69550                 2.65668      -1.7341              13.1251
                Rafah           -2.27940                3.53552      -12.1612             7.6024
Rafah           North of Gaza   2.62996                 4.05708      -8.7205              13.9805
                Gaza Town       -2.69002                3.72107      -13.0890             7.7090
                Middle Areas    7.97489                 3.69144      -2.3423              18.2921
                Khan Yunis      2.27940                 3.53552      -7.6024              12.1612
*. The mean difference is significant at the 0.05 level.
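
The Dunnett C output has no p value column, so significance is read from the confidence intervals: a pair differs significantly at the 0.05 level when its 95% interval does not include zero. The Python sketch below applies that rule to the ten distinct pairs in the table; the dictionary layout is illustrative only.

```python
# 95% confidence intervals (lower, upper) for each distinct pair,
# copied from the Multiple Comparisons table.
intervals = {
    ("North of Gaza", "Gaza Town"): (-14.7050, 4.0650),
    ("North of Gaza", "Middle Areas"): (-3.9493, 14.6392),
    ("North of Gaza", "Khan Yunis"): (-9.1590, 8.4579),
    ("North of Gaza", "Rafah"): (-13.9805, 8.7205),
    ("Gaza Town", "Middle Areas"): (2.5601, 18.7698),
    ("Gaza Town", "Khan Yunis"): (-2.5734, 12.5123),
    ("Gaza Town", "Rafah"): (-7.7090, 13.0890),
    ("Middle Areas", "Khan Yunis"): (-13.1251, 1.7341),
    ("Middle Areas", "Rafah"): (-18.2921, 2.3423),
    ("Khan Yunis", "Rafah"): (-12.1612, 7.6024),
}

# A difference is significant when the whole interval lies on one side of zero.
significant = [
    pair for pair, (lower, upper) in intervals.items()
    if lower > 0 or upper < 0
]

print(significant)
```

Only the Gaza Town and Middle Areas interval excludes zero, which matches the single asterisk in the table.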


5- Final Interpretation of ANOVA and post hoc analysis


i) The One-Way ANOVA test is significant (p = 0.025
< 0.05), which suggests that at least one pair of
mean QoL between the areas is significantly
different.
ii) The subsequent post hoc test (Dunnett C) suggests
that the mean QoL of Gaza Town and that of
Middle Areas are significantly different.

6- Results presentation
Table: Mean difference of QoL among the five areas in
the Gaza Strip
Residential areas   Mean (SD)        F statistic (df)   p value
North of Gaza       95.81 (21.86)
Gaza Town           101.13 (18.10)
Middle Areas        90.47 (17.52)    2.831 (4, 366)     0.025
Khan Yunis          96.16 (14.81)
Rafah               98.44 (27.10)
Post hoc analysis: the mean difference of QoL is significant
between Gaza Town and Middle Areas only.

4- Multi-Factorial ANOVA
Multi-Factorial ANOVA is used to determine the effects of
two or more independent variables on a numerical variable.
A factorial ANOVA compares means across two or more
independent variables. Whereas One-Way ANOVA has one
independent variable that splits the sample into two or more
groups, Multi-Factorial ANOVA has two or more independent
variables that split the sample into four or more groups.

Assumptions
1. The dependent variable is numerical.
2. The groups are independent of each other.
3. Random sample.
4. Homogeneity of variances: the groups should come
from populations with equal variances.
5. The observations of each group are
normally distributed; if the sample size is > 30, apply the
Central Limit Theorem.

Steps of analysis
1-Hypothesis and question statement
Null hypothesis: There are no significant effects of the variables
(Gender, Citizenship, Residency, Income and Age) on
stress among children in the Gaza Strip.
Research question: Are there significant effects of the Gender,
Citizenship, Residency, Income and Age variables on stress
among children in the Gaza Strip?


2- Checking assumptions
i) Random sample (By study design and sampling
method).
ii) The groups are independent of each other (By study
design).
iii) Normality assumption: since the sample size is more
than 30 for each group, consider the Central Limit
Theorem. If checking normality is needed, follow the
Explore steps described for One-Way ANOVA.

3- Perform Multi-Factorial ANOVA in SPSS


1. Analyze
2. General Linear Model
3. Univariate
4. In the Univariate box, transfer the dependent variable
(Mean Stress) to Dependent Variable and the
independent variables (Gender, Citizenship,
Residency, Income and Age) to Fixed Factor(s).
5. Click on Model in the same box; the Univariate: Model
dialog box appears. Select Custom, and then
transfer the variables Gender, Citizenship, Residency,
Income and Age from Factors & Covariates to
Model. Be sure that Type is Main effects, then click
Continue.
6. In the Univariate box, click on Options; the
Univariate: Options dialog box appears. Transfer the
variables from the Factor(s) and Factor Interactions box
to the Display Means for box.
7. In addition, select Descriptive statistics and
Homogeneity tests in the Univariate: Options box,
then click Continue.
8. Click OK.


4- Results interpretation and conclusion


Descriptive statistics in the table below display the mean
stress and its standard deviation for each category of the
five independent variables (Gender, Citizenship, Residency,
Income and Age).


Variable      Category          N     Mean    Std. Deviation   Std. Error
Gender        Male              157   1.058   0.312            0.024
              Female            147   0.959   0.405            0.033
Citizenship   Citizen           213   0.936   0.304            0.020
              Refugee           91    1.182   0.426            0.044
Residency     North of Gaza     58    1.132   0.453            0.059
              Gaza Area         107   0.879   0.347            0.033
              Middle Area       75    0.953   0.172            0.019
              Khan Yunis Area   51    1.055   0.162            0.022
              Rafah Area        14    1.648   0.490            0.131
Income        More than 2000    71    0.829   0.389            0.046
              1000-2000         77    0.978   0.170            0.019
              500-1000          51    0.958   0.172            0.024
              Less than 500     98    1.193   0.450            0.045
Age           13                23    0.531   0.545            0.113
              14                179   0.991   0.236            0.017
              15                76    0.995   0.168            0.019
              16                25    1.640   0.500            0.100

The table below indicates that there are significant effects
of Gender, Citizenship, Residency and Age on the total
mean of stress among children of Gaza, whereas the effect of
Income is not significant (p = 0.206). The R2 is 0.517,
which means that 51.7% of the variation in mean stress is
explained by the model.


Tests of Between-Subjects Effects
Dependent Variable: MeanStress
Source            Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   20.607a                   12    1.717         25.293     .000
Intercept         125.590                   1     125.590       1849.729   .000
Age               4.942                     3     1.647         24.264     .000
Residency         2.649                     4     .662          9.755      .000
Income            .312                      3     .104          1.533      .206
Citizenship       1.062                     1     1.062         15.634     .000
Gender            .544                      1     .544          8.005      .005
Error             19.215                    283   .068
Total             342.178                   296
Corrected Total   39.822                    295
a. R Squared = .517 (Adjusted R Squared = .497)
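
The R squared, adjusted R squared and F values in this table can be re-derived from its sums of squares and degrees of freedom. The following Python sketch is only a hand check of the SPSS output, not part of the SPSS procedure; the variable and function names are illustrative, and small differences from the printed values are expected because the table entries are rounded.

```python
# Values copied from the Tests of Between-Subjects Effects table.
ss_model = 20.607   # Corrected Model sum of squares
ss_error = 19.215   # Error sum of squares
ss_total = 39.822   # Corrected Total sum of squares
df_error = 283
df_total = 295

# R squared: share of the corrected total variation explained by the model.
r_squared = ss_model / ss_total

# Adjusted R squared compares the error mean square with the total mean square.
ms_error = ss_error / df_error
adjusted_r_squared = 1 - ms_error / (ss_total / df_total)


def f_ratio(sum_of_squares, df):
    """F = effect mean square divided by the error mean square."""
    return (sum_of_squares / df) / ms_error


f_age = f_ratio(4.942, 3)
f_gender = f_ratio(0.544, 1)

print(f"R squared          = {r_squared:.3f}")
print(f"Adjusted R squared = {adjusted_r_squared:.3f}")
print(f"F (Age)            = {f_age:.2f}")
print(f"F (Gender)         = {f_gender:.2f}")
```

The two R squared values agree with footnote a of the table, and the F ratios agree with the F column to within rounding.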

The results showed:

There is a significant main effect for the Age variable on the
mean stress of children, F (3, 283) = 24.264, p < 0.001.
There is a significant main effect for the Gender variable on
the mean stress of children, F (1, 283) = 8.005, p = 0.005.
There is a significant main effect for the Citizenship variable
on the mean stress of children, F (1, 283) = 15.634, p < 0.001.
There is a significant main effect for the Residency variable
on the mean stress of children, F (4, 283) = 9.755, p < 0.001.
There is no significant main effect for the Income variable on
the mean stress of children, F (3, 283) = 1.533, p = 0.206.

Conclusion: there are significant effects of the Gender,
Citizenship, Residency and Age variables on stress of
children in the Gaza Strip. 51.7% of the variation in mean
stress is explained by the model.


Result presentation

Table: The effects of Gender, Citizenship, Residency,
and Age variables on stress of children in the Gaza
Strip.

Factors       Category          Adjusted Mean (95% CI)   F statistic (df)   p value
Gender        Male              1.16 (0.03, 1.23)        8.005 (1, 283)     0.005
              Female            1.05 (0.03, 1.11)
Citizenship   Citizen           1.02 (0.96, 1.08)        15.634 (1, 283)    < 0.001
              Refugee           1.19 (0.034, 1.26)
Residency     North of Gaza     1.12 (1.040, 1.203)      9.755 (4, 283)     < 0.001
              Gaza Area         1.01 (0.032, 1.078)
              Middle Area       0.88 (0.041, 0.963)
              Khan Yunis Area   1.10 (0.049, 1.202)
              Rafah Area        1.42 (0.084, 1.584)
Age           13                0.72 (0.064, 0.847)      24.264 (3, 283)    < 0.001
              14                1.10 (0.027, 1.160)
              15                1.10 (0.038, 1.179)
              16                1.50 (0.061, 1.623)


Chapter 3
Non Parametric
Statistics


Non Parametric tests


As mentioned previously, non-parametric tests are appropriate
for small samples that are not normally distributed. Compared
with parametric tests, they have less power and give wider
confidence intervals.

1-Mann-Whitney test
This test is used to compare the median of a numerical
variable between two independent samples. It is the
non-parametric equivalent of the two independent samples
t test. The test is based on the ranks of the observations.

Assumptions
1. Random sample.
2. Independent samples.
3. The dependent variable is numerical.

Steps of analysis
1. Hypothesis and question statement

Null hypothesis: There is no median difference in body
weight of males and females.

Research question: Is the median of body weight different
between males and females?


2- Checking assumptions
i) Random sample (by study design and sampling
method)
ii) The two groups are independent of each other (by
study design)
iii) Normality assumption: since the sample size is less
than 30 for each group, the Central Limit Theorem
cannot be relied upon and normality should be
checked. The steps are as follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the Explore box, transfer the dependent
variable (body weight) to the Dependent List and
the independent variable (gender) to the Factor List.
5- In the Explore box, under Display, select Both.

The histograms show an irregular distribution of body weight
for both males and females, and the sample size is less than
30 in each group, so a non-parametric test is used.


Descriptives: Body Weight
Gender   Statistic                                       Value     Std. Error
Male     Mean                                            76.5455   1.98776
         95% Confidence Interval for Mean: Lower Bound   72.4117
         95% Confidence Interval for Mean: Upper Bound   80.6792
         5% Trimmed Mean                                 76.3333
         Median                                          75.0000
         Variance                                        86.926
         Std. Deviation                                  9.32343
         Minimum                                         62.00
         Maximum                                         95.00
         Range                                           33.00
         Interquartile Range                             17.25
         Skewness                                        .478      .491
         Kurtosis                                        -.877     .953
Female   Mean                                            81.1667   1.80459
         95% Confidence Interval for Mean: Lower Bound   77.3593
         95% Confidence Interval for Mean: Upper Bound   84.9740
         5% Trimmed Mean                                 81.1296
         Median                                          78.0000
         Variance                                        58.618
         Std. Deviation                                  7.65622
         Minimum                                         68.00
         Maximum                                         95.00
         Range                                           27.00
         Interquartile Range                             10.25
         Skewness                                        .322      .536
         Kurtosis                                        -.484     1.038


3- Perform Mann-Whitney test in SPSS

1- Analyze
2- Nonparametric Tests
3- Legacy Dialogs
4- 2 Independent Samples
5- In the Two-Independent-Samples Tests box, transfer
body weight to Test Variable List and Gender to
Grouping Variable. Select Mann-Whitney U.
6- After transferring Gender to Grouping Variable, click
on Define Groups. In the box of Two Independent Sa…..
enter code 1 (males) for Group 1 and code 2 for Group 2.
7- Click Continue, then OK.


4- Results interpretation and conclusion

Table of Ranks shows the descriptive statistics of the rank
transformation of the data. All data are transformed into rank
numbers, so these are not the true observed values of the data.

Table of Test Statistics shows the result of the inferential
statistics. The p value = 0.091, which is > 0.05.
Consequently, there is no statistically significant difference
of medians between males and females. Therefore, the null
hypothesis, "There is no median difference in body weight of
males and females", is not rejected.

Ranks
              Gender   N    Mean Rank   Sum of Ranks
Body Weight   Male     22   17.68       389.00
              Female   18   23.94       431.00
              Total    40


Test Statisticsa
                                  Body Weight
Mann-Whitney U                    136.000
Wilcoxon W                        389.000
Z                                 -1.692
Asymp. Sig. (2-tailed)            .091
Exact Sig. [2*(1-tailed Sig.)]    .095b
a. Grouping Variable: Gender Variable
b. Not corrected for ties.
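
The link between the Ranks table and the Test Statistics table can be checked by hand. The Python sketch below recomputes U from the rank sums and then applies the usual normal approximation. It assumes no correction for ties, so its Z and p value are expected to be close to, but not identical with, the tie-corrected SPSS figures; the variable names are illustrative.

```python
import math

# Group sizes and rank sums copied from the Ranks table.
n_male, n_female = 22, 18
rank_sum_male, rank_sum_female = 389.0, 431.0

# U for each group: rank sum minus the smallest rank sum that group could have.
u_male = rank_sum_male - n_male * (n_male + 1) / 2
u_female = rank_sum_female - n_female * (n_female + 1) / 2
u = min(u_male, u_female)  # the smaller U is the one reported

# Normal approximation, without tie correction.
mean_u = n_male * n_female / 2
sd_u = math.sqrt(n_male * n_female * (n_male + n_female + 1) / 12)
z = (u - mean_u) / sd_u

# Two-tailed p value from the standard normal distribution.
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(f"U = {u:.1f}")
print(f"Z = {z:.3f}")
print(f"p = {p_two_tailed:.3f}")
```

The recomputed U matches the Mann-Whitney U in the table, and the two U values always add up to the product of the group sizes, which is a quick consistency check on the rank sums.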

5- Result presentation
Table: The median of body weight of males and females
Variable   Median (IQR)                    Z statistic   p value*
Gender     Males           Females
           75.00 (17.25)   78.00 (10.25)   -1.692        0.091
* Mann-Whitney test

2-Wilcoxon Signed-Ranks Test


The Wilcoxon signed-ranks test applies to two-sample designs
involving repeated measures, matched pairs, or "pre" and
"post" measures. This test is the non-parametric equivalent
of the paired t test.

Assumptions
1- The two groups are dependent.
2- The variables are continuous.
3- Random samples.

Steps of analysis
1. Hypothesis and question statement
Null hypothesis: There is no median difference in body
weight in pre and post tests.
Research question: Is the median of body weight
different in pre and post tests?

2. Checking assumptions
i) Random sample (by study design and sampling
method)
ii) The two groups are dependent on each other (by
study design)
iii) Normality assumption: since the sample size is
less than 30, the Central Limit Theorem cannot be
relied upon and normality should be checked.

3. Perform Wilcoxon Signed-Ranks Test in SPSS

1- Analyze
2- Nonparametric Tests
3- Legacy Dialogs
4- 2 Related Samples
5- In the Two-Related-Samples Tests box, transfer pre test
to Variable1 and post test to Variable2. Select
Wilcoxon and click on Options.
6- In the Two-Related-Samples: Options box, select
Quartiles, then click on Continue.
7- Click OK.


4- Results interpretation and conclusion


Descriptive statistics in the table below display the medians
of the pre and post tests.
The Ranks table gives the descriptive statistics of the
ranks; it shows the positive and negative ranks of the
differences between post and pre scores.
Table of Test Statistics shows the result of the inferential
statistics. The p value is < 0.001. Consequently, there is a
statistically significant difference of medians between
the pre and post tests. Therefore, the null hypothesis,
"There is no median difference in body weight in pre and
post tests", is rejected.
Descriptive Statistics
            N    Percentiles
                 25th      50th (Median)   75th
Pre Test    25   78.5000   83.0000         90.5000
Post Test   25   75.5000   78.0000         85.0000

Ranks
                                        N     Mean Rank   Sum of Ranks
Post Test - Pre Test   Negative Ranks   24a   12.50       300.00
                       Positive Ranks   0b    .00         .00
                       Ties             1c
                       Total            25
a. Post Test < Pre Test
b. Post Test > Pre Test
c. Post Test = Pre Test


Test Statisticsa
                         Post Test - Pre Test
Z                        -4.297b
Asymp. Sig. (2-tailed)   .000
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
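
The Z value can be approximated by hand from the Ranks table. The Python sketch below uses the standard normal approximation for the signed-rank statistic, based on the 24 non-tied pairs. It assumes no correction for tied ranks, so the result is expected to be close to, but not exactly, the SPSS value; the names are illustrative.

```python
import math

# Values copied from the Ranks table (the single tied pair is excluded).
n_pairs = 24
sum_negative_ranks = 300.0
sum_positive_ranks = 0.0

# The test statistic is the smaller of the two rank sums.
w = min(sum_negative_ranks, sum_positive_ranks)

# Mean and standard deviation of W under the null hypothesis.
mean_w = n_pairs * (n_pairs + 1) / 4
sd_w = math.sqrt(n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 24)

z = (w - mean_w) / sd_w
p_two_tailed = math.erfc(abs(z) / math.sqrt(2))

print(f"Z = {z:.3f}")
print(f"p = {p_two_tailed:.6f}")
```

Because every non-tied pair has a negative difference, the positive rank sum is zero, which is the most extreme result possible for 24 pairs and explains the very small p value.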

5- Result presentation
Table: The median of body weight in pre and post tests
Variable   Median (IQR)                              Z statistic   p value*
           Pre                  Post
           83.00 (78.5, 90.5)   78.00 (75.5, 85.0)   -4.297        <0.001
* Wilcoxon Signed Ranks Test

3-Kruskal–Wallis test
It is the non-parametric analogue of a one-way ANOVA
that compares the medians of three or more groups.

Assumptions
1. The groups are independent.
2. The dependent variable is continuous.
3. Random samples.


Steps of analysis

1. Hypothesis and question statement


Null hypothesis: There is no difference in median
QoL between areas.
Research question: Is the median of QoL different
between areas?

2. Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The groups are independent to each other (By
study design)
iii) Normality assumption: checking normality is
needed, the steps are as follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the box of explore, transfer dependent
variable (Total QoL) to the Dependent List
and independent variable (Residence) to
Factor List.
5- Click on Plots and choose Histogram from
the new box.
6- Click OK


As the histograms show, the data are not normally
distributed, which justifies the use of a non-parametric test.


From the descriptive results, Gaza Town has the highest
median QoL (100.5), followed by North of Gaza (89.5) and
Middle Areas (87.5).

Descriptives: Total QoL
Residence       Statistic                                       Value      Std. Error
North of Gaza   Mean                                            98.4800    3.42789
                95% Confidence Interval for Mean: Lower Bound   91.5914
                95% Confidence Interval for Mean: Upper Bound   105.3686
                5% Trimmed Mean                                 97.1333
                Median                                          89.5000
                Variance                                        587.520
                Std. Deviation                                  24.23881
                Minimum                                         70.00
                Maximum                                         150.00
                Range                                           80.00
                Interquartile Range                             21.25
                Skewness                                        1.252      .337
                Kurtosis                                        .223       .662
Gaza Town       Mean                                            103.2200   2.96968
                95% Confidence Interval for Mean: Lower Bound   97.2522
                95% Confidence Interval for Mean: Upper Bound   109.1878
                5% Trimmed Mean                                 102.4000
                Median                                          100.5000
                Variance                                        440.951
                Std. Deviation                                  20.99882
                Minimum                                         70.00
                Maximum                                         150.00
                Range                                           80.00
                Interquartile Range                             20.50
                Skewness                                        .890       .337
                Kurtosis                                        .632       .662
Middle Areas    Mean                                            88.3800    2.62700
                95% Confidence Interval for Mean: Lower Bound   83.1008
                95% Confidence Interval for Mean: Upper Bound   93.6592
                5% Trimmed Mean                                 88.6556
                Median                                          87.5000
                Variance                                        345.057
                Std. Deviation                                  18.57570
                Minimum                                         49.00
                Maximum                                         121.00
                Range                                           72.00
                Interquartile Range                             29.25
                Skewness                                        -.201      .337
                Kurtosis                                        -.594      .662

3. Perform Kruskal Wallis Test in SPSS


1- Analyze
2- Nonparametric Tests
3- Legacy Dialogs
4- K Independent Samples
5- In the Tests for Several Independent Samples box,
transfer (Total QoL) to the Test Variable List
and the independent variable (Residence) to Grouping
Variable.
6- Click on Define Range to specify the three levels of
the residence variable. When the new dialog box appears,
enter 1 for Minimum and 3 for Maximum.
7- Click Continue, then OK.


4- Results interpretation and conclusion


Table of Ranks shows the descriptive statistics of the rank
transformation of the data. All data are transformed into rank
numbers; thus, these are not the true observed values of the
data.

Table of Test Statistics shows the result of the inferential
statistics. The p value = 0.003, which is < 0.05.
Consequently, there is a statistically significant difference
of medians between areas. Therefore, the null hypothesis,
"There is no difference in median of QoL between areas",
is rejected.

Ranks
            Residence       N     Mean Rank
Total QoL   North of Gaza   50    72.86
            Gaza Town       50    91.31
            Middle Areas    50    62.33
            Total           150

Test Statistics a,b
              Total QoL
Chi-Square    11.408
df            2
Asymp. Sig.   0.003
a. Kruskal Wallis Test
b. Grouping Variable: Residence
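
The Chi-Square value in the Test Statistics table is the Kruskal-Wallis H statistic, which can be approximated from the mean ranks in the Ranks table. The Python sketch below assumes no correction for ties, so it is expected to land slightly below the SPSS value. The p value shortcut used here holds only for 2 degrees of freedom; the names are illustrative.

```python
import math

# Group sizes and mean ranks copied from the Ranks table.
groups = {
    "North of Gaza": (50, 72.86),
    "Gaza Town": (50, 91.31),
    "Middle Areas": (50, 62.33),
}

n_total = sum(n for n, _ in groups.values())
grand_mean_rank = (n_total + 1) / 2  # mean of the ranks 1..N

# H = 12 / (N (N + 1)) * sum over groups of n_i * (mean rank_i - grand mean)^2
h = 12 / (n_total * (n_total + 1)) * sum(
    n * (mean_rank - grand_mean_rank) ** 2 for n, mean_rank in groups.values()
)

# With 2 degrees of freedom, the chi-square upper tail is exp(-x / 2).
df = len(groups) - 1
p_value = math.exp(-h / 2)

print(f"H  = {h:.3f}")
print(f"df = {df}")
print(f"p  = {p_value:.3f}")
```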

Post Hoc Analysis


Since the p value is < 0.05, post hoc analysis is needed to
determine which pairs of groups differ. In the Kruskal-Wallis
test there is no post hoc option; therefore, the groups are
compared pair by pair with the Mann-Whitney test.

The comparisons that are needed are:

1- North of Gaza with Gaza Town
2- North of Gaza with Middle Areas
3- Gaza Town with Middle Areas

After performing the Mann-Whitney test 3 times, each p value
is compared with the corrected alpha: 0.05 / number of pairs
= 0.05 / 3 = 0.0167.

1- North of Gaza with Gaza Town


Table of Test Statistics below shows the p value = 0.024.
This p value is greater than the corrected alpha of 0.0167.
Consequently, there is no statistically significant difference
of median QoL between North of Gaza and Gaza Town.

Test Statisticsa
                         Total QoL
Mann-Whitney U           922.000
Wilcoxon W               2197.000
Z                        -2.262
Asymp. Sig. (2-tailed)   .024
a. Grouping Variable: Residence

2- North of Gaza with Middle Areas


Table of Test Statistics below shows the p value = 0.176.
This p value is greater than the corrected alpha of 0.0167.
Consequently, there is no statistically significant difference
of median QoL between North of Gaza and Middle Areas.

Test Statisticsa
                         Total QoL
Mann-Whitney U           1054.000
Wilcoxon W               2329.000
Z                        -1.352
Asymp. Sig. (2-tailed)   .176
a. Grouping Variable: Residence

3- Gaza Town with Middle Areas


Table of Test Statistics below shows the p value = 0.001.
This p value is less than the corrected alpha of 0.0167.
Consequently, there is a statistically significant difference
of median QoL between Gaza Town and Middle Areas.

Test Statisticsa
                         Total QoL
Mann-Whitney U           787.500
Wilcoxon W               2062.500
Z                        -3.189
Asymp. Sig. (2-tailed)   .001
a. Grouping Variable: Residence
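
The three decisions above all follow the same rule: compare each pairwise p value with alpha divided by the number of comparisons. A minimal Python sketch of that rule, using the p values reported in the three Test Statistics tables (the dictionary layout and names are illustrative):

```python
alpha = 0.05

# Pairwise Mann-Whitney p values copied from the three tables above.
pairwise_p_values = {
    ("North of Gaza", "Gaza Town"): 0.024,
    ("North of Gaza", "Middle Areas"): 0.176,
    ("Gaza Town", "Middle Areas"): 0.001,
}

# Corrected alpha: the overall alpha shared equally among the comparisons.
corrected_alpha = alpha / len(pairwise_p_values)

significant = {
    pair: p < corrected_alpha for pair, p in pairwise_p_values.items()
}

print(f"Corrected alpha = {corrected_alpha:.4f}")
for (first, second), is_significant in significant.items():
    verdict = "significant" if is_significant else "not significant"
    print(f"{first} vs {second}: {verdict}")
```

Note that the first comparison (p = 0.024) would count as significant at the uncorrected 0.05 level; it is the correction for three comparisons that makes it non-significant.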

5- Result presentation
Table: The median of QoL according to residence in the
Gaza Strip
Residence       Median (IQR)    Chi-square statistic (df)   p value*
North of Gaza   89.5 (21.25)
Gaza Town       100.5 (20.50)   11.408 (2)                  0.003
Middle Areas    87.5 (29.25)
* Kruskal-Wallis test


Categorical Data Analysis

1- Chi-Square Test
The chi-square test is a statistical test used to examine
the association between categorical variables.
Assumptions
1. The groups are independent of each other.
2. Random sample.
3. Fewer than 20% of the cells have an expected
frequency of less than 5.

Steps of analysis
1- Hypothesis and question statement
Null hypothesis: Gender is not associated with
satisfaction level of QoL.
Research question: Is Gender associated with
satisfaction level of QoL?

2- Checking assumptions
i) Random sample (By study design and
sampling method)
ii) The groups are independent to each
other (By study design)
iii) The expected frequency is checked
through doing the analysis.


3- Perform Chi-Square Test in SPSS


1- Analyze
2- Descriptive Statistics
3- Crosstabs
4- In the Crosstabs box, transfer the independent
variable (Gender) to Row(s), and the dependent
variable (Level of Satisfaction) to Column(s).
5- In the Crosstabs box, click on Statistics. From
Crosstabs: Statistics, select Chi-Square, then
click on Continue.
6- In the Crosstabs box, click on Cells. From
Crosstabs: Cell Display, select Expected and
Row, then click on Continue.
7- Click on OK


4- Results interpretation and conclusion


The crosstabulation table displays the observed and
expected count of each cell. In the rows of expected counts,
no cell has an expected count of less than 5. Thus, the
assumption that fewer than 20% of the cells have an expected
frequency of less than 5 is met.

Gender * Level of Satisfaction Crosstabulation
                             Level of Satisfaction
Gender                       Satisfied   Unsatisfied   Total
Female   Count               141         38            179
         Expected Count      101.3       77.7          179.0
         % within Gender     78.8%       21.2%         100.0%
Male     Count               69          123           192
         Expected Count      108.7       83.3          192.0
         % within Gender     35.9%       64.1%         100.0%
Total    Count               210         161           371
         Expected Count      210.0       161.0         371.0
         % within Gender     56.6%       43.4%         100.0%


Table of Chi-Square Tests shows that the p value of the Pearson
Chi-Square is < 0.001, which is < 0.05. Consequently, there is
a significant association between Gender and satisfaction
level of QoL. Therefore, the null hypothesis, "Gender is not
associated with satisfaction level of QoL", is rejected.

Chi-Square Tests
                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             69.191a   1    .000
Continuity Correctionb         67.458    1    .000
Likelihood Ratio               71.973    1    .000
Fisher's Exact Test                                         .000         .000
Linear-by-Linear Association   69.004    1    .000
N of Valid Cases               371
a. 0 cells (0.0%) have expected count less than 5. The minimum
expected count is 77.68.
b. Computed only for a 2x2 table
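
The expected counts and the Pearson Chi-Square value can be reproduced from the observed counts alone. The Python sketch below is a hand check under the usual definitions (expected count = row total x column total / grand total); the list layout and names are illustrative, not SPSS output.

```python
# Observed counts copied from the crosstabulation.
# Rows: Female, Male. Columns: Satisfied, Unsatisfied.
observed = [
    [141, 38],
    [69, 123],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence for every cell.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)

min_expected = min(min(row) for row in expected)
cells_below_5 = sum(exp < 5 for row in expected for exp in row)

print(f"Chi-square             = {chi_square:.3f}")
print(f"Minimum expected count = {min_expected:.2f}")
print(f"Cells with expected < 5: {cells_below_5}")
```

The minimum expected count and the number of cells below 5 correspond to footnote a of the Chi-Square Tests table, which is where the expected-frequency assumption is checked.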

5- Result presentation
Table: Association between gender and satisfaction level
of QoL
Variable   Satisfaction level of QoL, n (%)   X2 (df)      p value*
Gender     Satisfied      Unsatisfied
Male       69 (35.9)      123 (64.1)          69.191 (1)   < 0.001
Female     141 (78.8)     38 (21.2)
* Pearson Chi-Square


2- Fisher's Exact Test


Fisher's exact test is a statistical test used to examine
the association between two categorical variables in a
small sample. The test is used when the expected-frequency
assumption of the Chi-Square test is not met, that is, when
more than 20% of the cells have an expected count of < 5.

Assumptions
1. The groups are independent of each other.
2. Random sample.
3. More than 20% of the cells have an expected
frequency of less than 5.

Steps of analysis
1- Hypothesis and question statement
Null hypothesis: Gender variable is not associated with
Anemia.

Research question: Is Gender variable associated with
Anemia?

2- Checking assumptions
i) Random sample (by study design and sampling
method)
ii) The groups are independent of each other (by
study design)
iii) The expected frequency is checked through doing
the analysis.

3- Perform Fisher's Exact Test in SPSS


1- Analyze
2- Descriptive Statistics
3- Crosstabs
4- In the Crosstabs box, transfer Gender variable to
Row(s), and Anemia to Column(s).
5- In the Crosstabs box, click on Statistics. From
Crosstabs: Statistics, select Chi-Square, then
click on Continue.
6- In the Crosstabs box, click on Cells. From
Crosstabs: Cell Display, select Expected and
Row, then click on Continue.
7- Click on OK


4- Results interpretation and conclusion


The crosstabulation table displays the observed and
expected count of each cell. In the rows of expected counts,
3 of the 4 cells (75%) have an expected count of less than 5.
Thus, the Chi-Square assumption that fewer than 20% of the
cells have an expected frequency of less than 5 is not met,
and Fisher's exact test is used.

Gender * Anemia Crosstabulation
                             Anemia
Gender                       Yes      No      Total
Male     Count               2        8       10
         Expected Count      4.7      5.3     10.0
         % within Gender     20.0%    80.0%   100.0%
Female   Count               5        0       5
         Expected Count      2.3      2.7     5.0
         % within Gender     100.0%   0.0%    100.0%
Total    Count               7        8       15
         Expected Count      7.0      8.0     15.0
         % within Gender     46.7%    53.3%   100.0%

Table of Chi-Square Tests shows that the p value of Fisher's
exact test is 0.007, which is < 0.05. Consequently, there is a
significant association between Gender and Anemia.
Therefore, the null hypothesis, "Gender variable is
not associated with Anemia", is rejected.

Chi-Square Tests
                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             8.571a   1    .003
Continuity Correctionb         5.658    1    .017
Likelihood Ratio               10.720   1    .001
Fisher's Exact Test                                        .007         .007
Linear-by-Linear Association   8.000    1    .005
N of Valid Cases               15
a. 3 cells (75.0%) have expected count less than 5. The minimum
expected count is 2.33.
b. Computed only for a 2x2 table
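
For a 2x2 table, Fisher's exact p value comes from the hypergeometric distribution with the row and column totals held fixed. The Python sketch below is a hand check of the exact two-sided p value, using one common convention: sum the probabilities of all tables that are no more likely than the observed one. The function and variable names are illustrative.

```python
from math import comb

# Observed counts copied from the crosstabulation.
# Rows: Male, Female. Columns: Anemia yes, Anemia no.
a, b = 2, 8
c, d = 5, 0

row1_total = a + b          # males
col1_total = a + c          # anemia cases
n_total = a + b + c + d


def table_probability(x):
    """Probability that x of the males have anemia, with margins fixed."""
    return (
        comb(col1_total, x)
        * comb(n_total - col1_total, row1_total - x)
        / comb(n_total, row1_total)
    )


# Range of values the top-left cell can take with these margins.
lowest = max(0, row1_total - (n_total - col1_total))
highest = min(row1_total, col1_total)

p_observed = table_probability(a)
p_two_sided = sum(
    table_probability(x)
    for x in range(lowest, highest + 1)
    if table_probability(x) <= p_observed * (1 + 1e-9)
)

# Expected counts, to confirm why the chi-square test is not appropriate.
expected = [
    row_total * column_total / n_total
    for row_total in (a + b, c + d)
    for column_total in (a + c, b + d)
]
cells_below_5 = sum(value < 5 for value in expected)

print(f"Fisher exact p (2-sided) = {p_two_sided:.3f}")
print(f"Cells with expected < 5: {cells_below_5} of {len(expected)}")
```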


5- Result presentation
Table: Association between gender variable and Anemia
Variable   Anemia, n (%)             p value*
Gender     Anemia      Non Anemia
Male       2 (20.0)    8 (80.0)      0.007
Female     5 (100.0)   0 (0.0)
* Fisher's Exact Test


Chapter 4
Correlation and
Regression


1- Correlation
Correlation is used to determine the association between
two numerical variables. The correlation coefficient measures
the strength and direction of the linear association.
Correlation can vary from +1 to -1. Values close to +1
indicate a high degree of positive correlation, and values
close to -1 indicate a high degree of negative correlation.
Values close to zero indicate poor correlation of either kind,
and 0 indicates no correlation at all.

Assumptions
1. The data is independent of each other.
2. Random sample.
3. The variables are normally distributed.
4. The relationship between the two variables is
linear.

Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no correlation between
physical health of QoL in the Gaza Strip and
psychological health.

Research question: Does physical health of QoL in the
Gaza Strip correlate with psychological health?

2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) The data is independent of each other.
iii)Linearity assumption can be checked as follows:
1- Graphs
2- Legacy Dialogs
3- Scatter/Dot
4- In the Scatter/Dot box, select Simple
Scatter, then Define
5- In the Simple Scatterplot, transfer
Physical Health to Y Axis and
Psychological Health to X Axis. then
click on OK.

As shown in the figure of linearity, the
relationship between the two variables is linear.


iv) Normality assumption: normality can be checked
as follows:
1- Analyze
2- Descriptive Statistics
3- Explore
4- In the box of explore, transfer physical
Health and psychological health to the
dependent list.
5- Click on Plots and choose Histogram
from the new box.
6- Click on OK.


3-Perform Correlation Test in SPSS


1. Analyze
2. Correlate
3. Bivariate
4. In the Bivariate Correlation box, transfer
physical health and psychological health to
Variable. Select Pearson.
5. Click OK


4- Results interpretation and conclusion


The correlation table shows the correlation coefficient
which is the strength of the relationship, and shows p value.
The correlation coefficient is 0.951, and p value is <0.001.
Thus, reject the null hypothesis. In conclusion, there is a
significant correlation between physical health and
psychological health of QoL in the Gaza Strip.

Correlations
                                             Physical Health   Psychological Health
Physical Health        Pearson Correlation   1                 .951**
                       Sig. (2-tailed)                         .000
                       N                     371               371
Psychological Health   Pearson Correlation   .951**            1
                       Sig. (2-tailed)       .000
                       N                     371               371
**. Correlation is significant at the 0.01 level (2-tailed).
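
The significance of a Pearson correlation is usually judged with a t statistic on n - 2 degrees of freedom. The Python sketch below shows that calculation for the r and N reported above. It is a hand check under the standard formula, not SPSS output, and the names are illustrative.

```python
import math

# Values copied from the Correlations table.
r = 0.951
n = 371

# Share of variance that the two variables have in common.
r_squared = r ** 2

# t statistic for testing the null hypothesis of zero correlation.
degrees_of_freedom = n - 2
t_statistic = r * math.sqrt(degrees_of_freedom) / math.sqrt(1 - r_squared)

print(f"r squared = {r_squared:.3f}")
print(f"t = {t_statistic:.2f} on {degrees_of_freedom} degrees of freedom")
```

A t statistic this large on 369 degrees of freedom lies far beyond the usual critical values, which is consistent with the p value of < 0.001 reported by SPSS.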


5- Result presentation
Table: Correlation between physical health and
psychological health of QoL
Variable          Psychological health
                  r       p value*
Physical health   0.951   <0.001

* Pearson Correlation
Note: if one or both variables are not normally distributed,
Spearman correlation must be used. In SPSS, choose
Spearman instead of Pearson in the Bivariate Correlations
box; the other steps are the same as in the analysis by
Pearson correlation.

2- Linear Regression
Regression analysis is used to determine the relationship
between a dependent variable and one or more independent
variables (which are also called predictor or explanatory
variables). Linear regression explores relationships that can
be readily described by straight lines.

Linear regression is used to:

1- Determine the linear association between a
dependent and the independent variables.
2- Define the change in the value of the dependent
variable based on the change in the independent
variable.
3- Predict the value of the dependent variable for a
specific value of the independent variable.
4- Control for the confounding effect of the independent
variable.

2. A: Simple linear regression


Simple linear regression is used to determine the
relationship between numerical variables. In this type of
regression, there is one independent variable and one
dependent variable.

Assumptions
1- Linear relationship between the two variables.
2- Fit least square line
3- Checking residual diagnostics:
3.1- Linearity of the numerical independent.
3.2- Normality of residual.
3.3- Equal variance of residual.
4- Random samples.


Steps of analysis
1- Hypothesis and question statement
Null hypothesis: There is no linear relationship between
family monthly income and QoL in the Gaza Strip.
Research question: Is the income variable a predictor
factor of QoL in the Gaza Strip?

2- Checking assumptions
i) Random sample (By study design and sampling
method)
ii) Linearity assumption can be checked as
follows:
1- Graphs
2- Legacy Dialogs
3- Scatter/Dot
4- In the Scatter/Dot box, select Simple Scatter,
then Define
5- In the Simple Scatterplot, transfer Total QoL
to Y Axis and Monthly Family Income to X
Axis. then click on OK.

As shown in the figure of linearity, the relationship between
the two variables is linear and looks acceptable.


iii) Fit least square line

1- Place the cursor on the graph and double
click.
2- From the Chart Editor box, click on Add Fit
Line at Total, then select Linear from the
Properties box.

The scatter plot shows a regression line. The least square
line is the best-fitting straight line, obtained by
least square regression, which minimizes the sum of the
squared residuals.
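
To make the idea of the least square line concrete, the Python sketch below fits a straight line to a small set of made-up points using the closed-form least square formulas. The data and the function name `fit_least_squares` are hypothetical and are not taken from the QoL data set.

```python
def fit_least_squares(x, y):
    """Return (intercept, slope) of the least square line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-products / sum of squares of x (both about the means).
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical illustration data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_least_squares(x, y)
predicted = [intercept + slope * xi for xi in x]

# Residual = observed value minus predicted value.
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print("residuals =", [round(value, 2) for value in residuals])
```

For a least square line with an intercept, the residuals always sum to zero, which is a convenient check that the fit was computed correctly.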


3.1- Linearity of the numerical independent variable


This assumption can be checked by plotting a scatter of the
unstandardized residuals against the independent variable.

Residual: the residual is the observed value minus
the predicted value of the dependent variable.
1- Analyze
2- Regression
3- Linear
4- In the Linear Regression box, transfer Total QoL to
Dependent, and Monthly Family Income to
Independent(s), then select Save.
5- In the Linear Regression: Save box, select
Unstandardized from the column labeled Predicted
Values and Unstandardized from the column labeled
Residuals. Then click on Continue.
6- Click on OK.
Two new variables appear in the data set: Unstandardized
Predicted Value and Unstandardized Residual.
To check linearity, a scatter plot of the new residual
variable is needed: Graphs, Legacy Dialogs, Scatter/Dot.
In the Scatter/Dot box, select Simple Scatter,
then Define.
7- In the Simple Scatterplot box, transfer Unstandardized
Residual to Y Axis and Monthly Family Income to
X Axis. Then click on OK.

The scatter plot shows no pattern of relationship between
the residuals and the Monthly Family Income variable;
consequently, the assumption of linearity is met.


3.2- Normality of residual


Normality can be checked by plotting a histogram of the
residual.
1- Graphs
2- Legacy Dialogs
3- Histogram
4- In the Histogram box, transfer Unstandardized
Residual to Variable.
5- Click on OK.


The histogram of the residuals shows a normal
distribution; consequently, the assumption of
normality is met.


3.3- Equal variance of residual


Equal variance can be checked by a scatterplot of the
Unstandardized Residual versus the Unstandardized Predicted
value.
1- Graphs
2- Legacy Dialogs
3- Scatter/Dot
4- In the Scatter/Dot box, select Simple Scatter, then
Define
5- In the Simple Scatterplot, transfer Unstandardized
Residual to Y Axis and Unstandardized Predicted to
X Axis.
6- Click on OK.

The scatter plot shows no pattern of
relationship between the residuals and the predicted
values; consequently, the assumption of equal
variances is met.


3- Perform Linear Regression Test in SPSS


1- Analyze
2- Regression
3- Linear
4- In the Linear Regression box, transfer Total QoL
to Dependent, and Monthly Family Income to
Independent(s), then select Statistics.
5- In the Linear Regression: Statistics box, select
Confidence intervals. Then click on continue.
6- Click on OK.


4- Results interpretation and conclusion


The model summary table shows the correlation coefficient (R)
and the coefficient of determination (R Square). 69.0% of the
variation in the Total QoL is explained by the Monthly
Family Income.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .831a   .690       .689                8.81543
a. Predictors: (Constant), Monthly Family Income

The ANOVA table shows the p value of the regression
(p<0.001). There is a significant linear relationship
between monthly family income and the total QoL.

ANOVAa
Model            Sum of Squares   df    Mean Square   F         Sig.
1   Regression   54781.185        1     54781.185     704.928   .000b
    Residual     24556.906        316   77.712
    Total        79338.091        317
a. Dependent Variable: Total QoL
b. Predictors: (Constant), Monthly Family Income
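
The Model Summary and ANOVA tables are linked through the sums of squares. The Python sketch below is a hand check of those links (R Square, F and the standard error of the estimate) using the figures printed above; the names are illustrative, and small differences are due to rounding in the tables.

```python
import math

# Values copied from the ANOVA table.
ss_regression = 54781.185
ss_residual = 24556.906
ss_total = 79338.091
df_regression = 1
df_residual = 316

# R Square: share of the total variation explained by the regression.
r_square = ss_regression / ss_total

# F: regression mean square divided by residual mean square.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Standard error of the estimate: square root of the residual mean square.
std_error_estimate = math.sqrt(ms_residual)

print(f"R Square                   = {r_square:.3f}")
print(f"F                          = {f_statistic:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.5f}")
```

In simple linear regression with a single predictor, F is also the square of the t statistic for the slope, so the two tests always give the same p value.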

The coefficient table below shows the equation of the
regression line, the p value of the regression, the slope of
the line and the intercept at the y axis.

Coefficientsa
                             Unstandardized        Standardized                    95.0% Confidence
                             Coefficients          Coefficients                    Interval for B
Model                        B        Std. Error   Beta           t        Sig.    Lower Bound   Upper Bound
1   (Constant)               47.476   1.867                       25.427   .000    43.802        51.149
    Monthly Family Income    .019     .001         .831           26.550   .000    .017          .020
a. Dependent Variable: Total QoL

1- There is a significant linear relationship between monthly family income
and the total QoL.
2- 69.0% of the variation in the Total QoL is explained by the Monthly Family
Income.
3- The slope of the regression line is 0.019, with a y-axis intercept of
47.476.
4- Each 1-unit increase in monthly family income increases the total QoL by
0.019 units.
5- The regression equation: Total QoL = 47.476 + 0.019 * monthly family
income.
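
The regression equation can be applied directly to obtain predicted values. A minimal sketch, assuming income is entered in the same unit as in the data set (Shekel); the function name and the example incomes are illustrative only.

```python
INTERCEPT = 47.476  # y-axis intercept from the Coefficients table
SLOPE = 0.019       # change in Total QoL per 1-unit increase in income

def predict_total_qol(monthly_family_income):
    """Predicted Total QoL from the simple linear regression equation."""
    return INTERCEPT + SLOPE * monthly_family_income

for income in (1000, 2700, 4000):
    print(income, round(predict_total_qol(income), 3))
```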

In conclusion, p<0.001; thus, the null hypothesis is rejected. There is a
linear relationship between family monthly income and QoL in the Gaza Strip.
Family monthly income is a significant predictor of QoL.


5- Result presentation of linear regression


Table: Relationship between family monthly income and total QoL
Variable                 b (95% CI)             t statistics   p value*   r2
Family monthly income    0.019 (0.017, 0.020)   26.550         <0.001     .690
*Simple Linear Regression

2. B: Multiple Linear Regression Analysis


Multiple linear regression analysis is an extension of simple linear
regression analysis, used to assess the association between two or more
independent variables (numerical or categorical) and a single continuous
dependent variable. It is also used to identify the strength of the
relationship for each independent variable separately and to determine the
predictor factors of a continuous dependent variable while controlling for
possible confounders.
Assumptions
1- Linear relationship between the two variables
(dependent and independent).
2- Checking residual diagnostics:
2.1- Linearity of the numerical independent.
2.2- Normality of residual.
2.3- Equal variance of residual.
3- Random samples.


Steps of analysis

1- Hypothesis and objective

Research question: What are the predictor factors of QoL in the Gaza Strip?

Research objective: To determine the predictor factors of QoL in the Gaza
Strip.

2- Descriptive statistics of variables

1- Analyze
2- Descriptive Statistics
3- Frequencies
4- In the frequencies box, transfer variables of Gender,
Age, Sick and Monthly Family Income to
variable(s), then select Statistics.
5- In the Frequencies Statistics box, select Mean,
Median and Std. deviation, then click continue.
6- Click on OK


The table below shows the descriptive statistics of the four variables (two
numerical and two categorical).

Table: Descriptive statistics of the four independent variables
Variable                          Mean (SD)        Median (IQR)   Frequency (%)
Monthly Family Income (Shekel)    2581.7 (709.9)   2700.0
Age (year)                        28.9 (6.7)       28.0
Gender   Male                                                     162 (50.9)
         Female                                                   156 (49.1)
Sick     Yes                                                      77 (24.2)
         No                                                       240 (75.5)


3- Bivariable Analyses (Simple Linear Regression)


Simple linear regression is used to identify potentially important
independent variables for the multivariable analysis. A scatter plot is also
needed to check the linear relationship between each numerical independent
variable and the dependent variable.

A) Numerical variables
Checking assumptions

i) Linearity of the numerical variable (relationship between the
unstandardized residual and the numerical independent variable)

The scatter plot shows no pattern of relationship between the residual and
the numerical independent variable; consequently, the assumption of linearity
is met.


ii) Normality of residual (histogram of the unstandardized residual)

The histogram of the residual shows a normal distribution; consequently, the
assumption of normality is met.

iii) Equal variance of residual (relationship between the unstandardized
residual and the unstandardized predicted value)

The scatter plot shows no pattern of relationship between the residual and
predicted values; consequently, the assumption of equal variances is met.


3. A: Simple Linear Regression for Monthly Family Income variable
The Simple Linear Regression procedure can be performed as in the previous
example.

The Model Summary table shows the correlation coefficient (R) and the
coefficient of determination (R Square). 69.0% of the variation in the
Total QoL is explained by the Monthly Family Income.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .831a   .690       .689                8.81543
a. Predictors: (Constant), Monthly Family Income

The coefficient table below shows the equation of the regression line, the
p value of the regression, the slope of the line and the y-axis intercept.
The slope of the regression line is 0.019, with a y-axis intercept of 47.476.
Each 1-unit increase in monthly family income increases the total QoL by
0.019 units.
The regression equation: Total QoL = 47.476 + 0.019 * monthly family income.

Coefficientsa
                            Unstandardized         Standardized                     95.0% Confidence
                            Coefficients           Coefficients                     Interval for B
Model                       B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)               47.476    1.867                       25.427   .000     43.802        51.149
   Monthly Family Income    .019      .001         .831           26.550   .000     .017          .020
a. Dependent Variable: Total QoL


The scatter plot between Monthly Family Income and QoL shows a linear
relationship.

3. B: Simple Linear Regression for Age variable


The Model Summary table shows the correlation coefficient (R) and the
coefficient of determination (R Square). 35.3% of the variation in the
Total QoL is explained by Age.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .594a   .353       .351                12.74640
a. Predictors: (Constant), Age

The coefficient table below shows the equation of the regression line, the
p value of the regression, the slope of the line and the y-axis intercept.
The slope of the regression line is 1.399, with a y-axis intercept of 54.786.
Each 1-year increase in age increases the total QoL by 1.399 units.
The regression equation: Total QoL = 54.786 + 1.399 * age.

Coefficientsa
                 Unstandardized         Standardized                     95.0% Confidence
                 Coefficients           Coefficients                     Interval for B
Model            B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)    54.786    3.167                       17.302   .000     48.555        61.016
   Age           1.399     .107         .594           13.127   .000     1.189         1.608
a. Dependent Variable: Total QoL

The scatter plot between age and QoL shows a linear relationship.

B) Categorical variables
3.A: Simple Linear Regression for Gender variable
The total QoL differs by 1.89 between females and males (b = -1.892), but
the relationship is not significant (p = 0.287).


Coefficientsa
                 Unstandardized         Standardized                     95.0% Confidence
                 Coefficients           Coefficients                     Interval for B
Model            B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)    98.135    2.821                       34.786   .000     92.585        103.686
   Gender        -1.892    1.774        -.060          -1.066   .287     -5.383        1.599
a. Dependent Variable: Total QoL

3. B: Simple Linear Regression for Sick variable

Healthy people have a total QoL that is 7.556 higher than that of sick
people; the relationship is significant (p <0.001).
Coefficientsa
                 Unstandardized         Standardized                     95.0% Confidence
                 Coefficients           Coefficients                     Interval for B
Model            B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)    81.926    3.404                       24.067   .000     75.229        88.623
   Sick          7.556     1.863        .222           4.057    .000     3.891         11.221
a. Dependent Variable: Total QoL

Summary of bivariable analyses


The table below shows the results of simple linear regression. Independent
variables with p values less than 0.05 can be included in the multiple linear
regression.

Table: Predictor factors of total QoL from simple linear regression
Variable                 b (95% CI)                p value
Monthly Family Income    0.019 (0.017, 0.020)      <0.001
Age                      1.399 (1.189, 1.608)      <0.001
Gender (Female)          -1.892 (-5.383, 1.599)    0.287
Sickness (Healthy)       7.556 (3.891, 11.221)     <0.001
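
The screening rule described above (keep variables with p < 0.05) can be written as a simple filter. This is only a sketch of the rule: the p values reported as "<0.001" are stored as 0.001, an upper bound assumed here so that the comparison works.

```python
# (variable, p value) pairs from the simple linear regressions.
# "<0.001" is represented by 0.001, which is an assumption for illustration.
simple_regression_p_values = [
    ("Monthly Family Income", 0.001),
    ("Age", 0.001),
    ("Gender", 0.287),
    ("Sickness", 0.001),
]

ALPHA = 0.05  # significance level used in the text

candidates = [name for name, p in simple_regression_p_values if p < ALPHA]
print(candidates)
```

Gender is dropped by this rule, which matches the three variables entered in the next step.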
4- Building the preliminary model (variable selection)
1- Analyze
2- Regression
3- Linear
4- In the Linear Regression box, transfer Total QoL
to Dependent, and monthly family income, age
and sickness to Independent(s), from Method,
select Stepwise, then select Statistics.
5- In the Linear Regression: Statistics box, select
Confidence intervals. Then click on continue.
6- Click on OK.


5- Results interpretation
The Model Summary table shows the R and R2 of each model. Model 3 is the
best of the three models because its independent variables explain 70.6% of
the variation in the Total QoL, more than the other two models.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .831a   .690       .689                8.81543
2       .838b   .702       .700                8.66576
3       .840c   .706       .703                8.62044
a. Predictors: (Constant), Monthly Family Income
b. Predictors: (Constant), Monthly Family Income, Sick
c. Predictors: (Constant), Monthly Family Income, Sick, Age


The ANOVA table shows the p value of each model (p<0.001). There is a
significant linear relationship between the three independent variables and
the total QoL.

ANOVAa
Model           Sum of Squares   df    Mean Square   F         Sig.
1  Regression   54781.185        1     54781.185     704.928   .000b
   Residual     24556.906        316   77.712
   Total        79338.091        317
2  Regression   55683.032        2     27841.516     370.748   .000c
   Residual     23655.059        315   75.095
   Total        79338.091        317
3  Regression   56004.118        3     18668.039     251.212   .000d
   Residual     23333.973        314   74.312
   Total        79338.091        317
a. Dependent Variable: Total QoL
b. Predictors: (Constant), Monthly Family Income
c. Predictors: (Constant), Monthly Family Income, Sick
d. Predictors: (Constant), Monthly Family Income, Sick, Age
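
To see how the three stepwise models compare, R Square and the F statistic for each added predictor can be recomputed from the ANOVA sums of squares. This is a sketch using the standard nested-model F formula, not output that SPSS prints in this table; for a single added predictor, the square root of this F equals the absolute t value of that predictor in the larger model.

```python
import math

SS_TOTAL = 79338.091
# model number -> (regression SS, residual SS, residual df), from the ANOVA table
models = {
    1: (54781.185, 24556.906, 316),
    2: (55683.032, 23655.059, 315),
    3: (56004.118, 23333.973, 314),
}

r_square = {m: ss_reg / SS_TOTAL for m, (ss_reg, _ss_res, _df) in models.items()}

def f_change(smaller, larger):
    """F statistic for the predictors added between two nested models."""
    _, ss_res_small, df_small = models[smaller]
    _, ss_res_large, df_large = models[larger]
    added = df_small - df_large  # number of predictors added
    return ((ss_res_small - ss_res_large) / added) / (ss_res_large / df_large)

f_sick = f_change(1, 2)  # Sick added to model 1
f_age = f_change(2, 3)   # Age added to model 2

for m in models:
    print(m, round(r_square[m], 3))
print(round(math.sqrt(f_sick), 3), round(math.sqrt(f_age), 3))
```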

The coefficient table below shows the selected independent variables; the
preliminary model is model 3.


Coefficientsa
                            Unstandardized         Standardized                     95.0% Confidence
                            Coefficients           Coefficients                     Interval for B
Model                       B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)               47.476    1.867                       25.427   .000     43.802        51.149
   Monthly Family Income    .019      .001         .831           26.550   .000     .017          .020
2  (Constant)               41.884    2.444                       17.138   .000     37.075        46.692
   Monthly Family Income    .018      .001         .816           26.253   .000     .017          .020
   Sick                     3.657     1.055        .108           3.465    .001     1.581         5.734
3  (Constant)               39.508    2.686                       14.707   .000     34.223        44.794
   Monthly Family Income    .017      .001         .762           18.849   .000     .015          .019
   Sick                     3.524     1.052        .104           3.351    .001     1.455         5.594
   Age                      .198      .095         .084           2.079    .038     .011          .385
a. Dependent Variable: Total QoL

The table below shows the independent variables excluded from each model.

Excluded Variablesa
                                           Partial        Collinearity Statistics
Model       Beta In   t       Sig.         Correlation    Tolerance
1  Age      .092b     2.251   .025         .126           .576
   Sick     .108b     3.465   .001         .192           .980
2  Age      .084c     2.079   .038         .117           .574
a. Dependent Variable: Total QoL
b. Predictors in the Model: (Constant), Monthly Family Income
c. Predictors in the Model: (Constant), Monthly Family Income, Sick


6- Checking multicollinearity and interaction


i) Interaction
The interaction between the independent variables should be checked pair by
pair. There are three independent variables, so three pairs of interactions
need to be checked. Before checking for interaction, the interaction terms
should be created.
1- Transform
2- Compute Variable
3- In Target Variable box, name the interaction
variable (Interaction Age).
4- Transfer Monthly Family Income to Numeric
Expression box and click on *, then transfer
Age to Numeric Expression box again.
5- Click on OK.
6- Repeat the previous steps to create new
interaction variables, (Monthly Family Income
with Sick and Age with Sick).
7- Repeat the steps of multiple linear regression. In
addition, the three interaction variables should
be included and the Method is (Enter).
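
The Compute Variable steps above amount to multiplying each pair of predictors. A minimal sketch of the same idea, with hypothetical rows and illustrative column names (not the variable names in the book's data set):

```python
from itertools import combinations

# Hypothetical rows; the keys are illustrative column names
rows = [
    {"income": 2700.0, "age": 28.0, "sick": 1},
    {"income": 1800.0, "age": 35.0, "sick": 0},
]
predictors = ["income", "age", "sick"]

def add_interaction_terms(rows, predictors):
    """Return copies of the rows with one product column per pair of predictors."""
    result = []
    for row in rows:
        new_row = dict(row)
        for first, second in combinations(predictors, 2):
            new_row[f"{first}*{second}"] = row[first] * row[second]
        result.append(new_row)
    return result

with_terms = add_interaction_terms(rows, predictors)
print(sorted(with_terms[0]))
```

Three predictors give three product columns, matching the three pairs described in the text.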


The results in the tables below show that the three interaction variables
are not significant (p values > 0.05). Therefore, the interaction variables
will not be included in the model.


Coefficientsa
                            Unstandardized           Standardized                    95.0% Confidence
                            Coefficients             Coefficients                    Interval for B
Model                       B             Std. Error Beta           t        Sig.    Lower Bound   Upper Bound
1  (Constant)               37.308        6.988                     5.339    .000    23.559        51.057
   Monthly Family Income    .018          .003       .802           6.419    .000    .012          .023
   Age                      .270          .232       .115           1.163    .246    -.187         .726
   Sick                     3.496         1.057      .103           3.309    .001    1.417         5.575
   Income*Age               -2.774E-005   .000       -.065          -.341    .733    .000          .000
a. Dependent Variable: Total QoL

Coefficientsa
                            Unstandardized         Standardized                     95.0% Confidence
                            Coefficients           Coefficients                     Interval for B
Model                       B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)               40.702    6.674                       6.099    .000     27.570        53.833
   Monthly Family Income    .016      .003         .739           5.992    .000     .011          .022
   Age                      .197      .095         .084           2.063    .040     .009          .384
   Sick                     2.844     3.634        .084           .783     .434     -4.306        9.995
   Income*Sick              .000      .001         .033           .195     .845     -.003         .003
a. Dependent Variable: Total QoL


Coefficientsa
                            Unstandardized         Standardized                     95.0% Confidence
                            Coefficients           Coefficients                     Interval for B
Model                       B         Std. Error   Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)               33.014    8.736                       3.779    .000     15.826        50.203
   Monthly Family Income    .017      .001         .757           18.474   .000     .015          .019
   Age                      .429      .311         .182           1.379    .169     -.183         1.041
   Sick                     7.145     4.752        .210           1.503    .134     -2.206        16.496
   Age*Sick                 -.122     .157         -.154          -.781    .435     -.430         .186
a. Dependent Variable: Total QoL
ii) Multicollinearity
The stability of the regression model should be checked, which means that
the independent variables should not be strongly related to each other.
Multicollinearity indicates that the independent variables are highly
correlated; it is checked by determining the Variance Inflation Factor (VIF).
The VIF must be less than 10.
To obtain the VIF, all steps of multiple linear regression are repeated with
the three independent variables, and the Method is Enter. In the Linear
Regression: Statistics box, select Collinearity diagnostics in addition to
Confidence intervals.


The results in the table below show that the VIFs of all three variables are
< 10, which is acceptable.

Coefficientsa
                            Unstandardized        Standardized                    95.0% Confidence        Collinearity
                            Coefficients          Coefficients                    Interval for B          Statistics
Model                       B        Std. Error   Beta           t        Sig.    Lower     Upper         Tolerance   VIF
1  (Constant)               39.508   2.686                       14.707   .000    34.223    44.794
   Monthly Family Income    .017     .001         .762           18.849   .000    .015      .019          .574        1.743
   Age                      .198     .095         .084           2.079    .038    .011      .385          .574        1.742
   Sick                     3.524    1.052        .104           3.351    .001    1.455     5.594         .977        1.024
a. Dependent Variable: Total QoL
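
The two collinearity columns carry the same information: the VIF is the reciprocal of the tolerance. A short sketch of that relationship and of the cut-off used in the text; because the tolerances are copied from the rounded table, the recomputed VIFs can differ from the printed ones in the third decimal.

```python
# Tolerance values copied from the Collinearity Statistics columns
tolerance = {"Monthly Family Income": 0.574, "Age": 0.574, "Sick": 0.977}

VIF_LIMIT = 10  # cut-off used in the text

vif = {name: 1 / value for name, value in tolerance.items()}
acceptable = all(value < VIF_LIMIT for value in vif.values())

for name, value in vif.items():
    print(name, round(value, 3))
print(acceptable)
```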


7- Interpretation of the final model


As shown in the final model, monthly family income, age and sickness are the
significant predictor factors for QoL in the Gaza Strip.

Adjusted for the other variables in the model, each 1-unit increase in
monthly family income increases the total QoL by 0.017 units, and each 1-year
increase in age increases the total QoL by 0.198 units. Healthy people have a
total QoL that is 3.524 higher than that of sick people.

The prediction model of total QoL in the Gaza Strip is:

Total QoL = 39.508 + [0.017 * monthly family income] + [0.198 * age] +
[3.524 * sick].
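
The prediction model can be evaluated for a given person as follows. This is a sketch under one stated assumption: the sick variable is taken to be coded 1 for healthy and 0 for sick, which is inferred from the interpretation above (healthy people score 3.524 higher) and is not confirmed by a coding table. The function name and the example values are illustrative.

```python
def predict_total_qol(monthly_family_income, age, sick_indicator):
    """Predicted Total QoL from the final multiple linear regression model.

    sick_indicator is assumed to be 1 for healthy and 0 for sick.
    """
    return (39.508
            + 0.017 * monthly_family_income
            + 0.198 * age
            + 3.524 * sick_indicator)

healthy = predict_total_qol(2700, 28, 1)
sick = predict_total_qol(2700, 28, 0)
print(round(healthy, 3), round(sick, 3))
```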

In conclusion, the predictor factors of QoL in the Gaza Strip are monthly
family income, age and sickness.


8- Result presentation of prediction model

Table: Predictor factors of QoL in the Gaza Strip

                         SLR*                                MLR**
Variables                b (95% CI)               p value    Adjusted b (95% CI)    t statistics   p value
Monthly family income    0.019 (0.017, 0.020)     <0.001     0.017 (0.015, 0.019)   18.849         <0.001
Age                      1.399 (1.189, 1.608)     <0.001     0.198 (0.011, 0.385)   2.079          0.038
Sickness (Healthy)       7.556 (3.891, 11.221)    <0.001     3.524 (1.455, 5.594)   3.351          0.001
Gender (Female)          -1.892 (-5.383, 1.599)   0.287      -                      -              -
*Simple linear regression, **Multiple linear regression
Model assumptions are met. There are no interaction or multicollinearity
problems.

3- Multiple Logistic Regression


Multiple Logistic Regression is used to describe the association of several
independent variables with a categorical dependent variable. The Binary
Logistic Regression model is used when the dependent variable is not
continuous but instead has only two possible outcomes, 1 or 0. This model is
typically used when predicting an event which has two possible outcomes,
e.g. 'sick vs. healthy', 'alive vs. dead', etc.
Regular regression models cannot be used for such variables because the
predicted value needs to be constrained between 0 and 1, which regular
regression does not guarantee.
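
The constraint between 0 and 1 comes from the logistic function, which converts any value of the linear predictor into a probability. A minimal sketch (the function name is chosen here for illustration):

```python
import math

def logistic(linear_predictor):
    """Map any real-valued linear predictor to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-linear_predictor))

for value in (-5, 0, 5):
    print(value, round(logistic(value), 4))
```

However large or small the linear predictor becomes, the result stays strictly between 0 and 1, which a straight regression line cannot ensure.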

Steps of analysis
1- Hypothesis and question statement
Research question: What are the factors associated with depression in a
sample of adults in the Gaza Strip?
Research objective: To identify the factors associated with depression in a
sample of adults in the Gaza Strip.

2- Descriptive statistics of variables


The table below shows the descriptive statistics of the four variables (one
numerical and three categorical).

Table: Descriptive statistics of the four independent variables
Variable              Mean (SD)     Frequency (%)
Age (year)            29.5 (8.0)
Obesity    Yes                      162 (50.9)
           No                       156 (49.1)
Smoking    Yes                      144 (45.3)
           No                       174 (54.7)
Gender     Male                     146 (45.9)
           Female                   172 (54.1)


3- Bivariable Analyses (Simple Logistic Regression)


Simple logistic regression is used to identify potentially important
independent variables for the multivariable analysis. Age is the only
numerical variable; the other independent variables are categorical.

1- Analyze
2- Regression
3- Binary Logistic
4- In the Logistic Regression box, transfer
Depression to Dependent, and age to Covariates,
then select Options.
5- In the Logistic Regression: Options box, select
CI for exp(B). Then click on continue.
6- Click on OK.


Variables in the Equation
                                                                 95% C.I. for EXP(B)
                     B       S.E.   Wald    df   Sig.   Exp(B)   Lower    Upper
Step 1a  Age         -.032   .015   4.503   1    .034   .969     .941     .998
         Constant    .741    .450   2.709   1    .100   2.099
a. Variable(s) entered on step 1: Age.

The table shows that the p value is 0.034 and the odds ratio of age is
0.969. Each one-year increase in age multiplies the odds of depression by
0.969, i.e. the odds decrease by about 3%.
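
The odds ratio and its confidence interval follow from the B and S.E. columns: Exp(B) is the exponential of B, and the 95% limits are the exponentials of B minus and plus 1.96 standard errors. A short sketch with the age coefficient; small differences from the printed limits arise because B and S.E. are rounded in the table.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower 95% limit, upper 95% limit) from B and its S.E."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

odds_ratio, lower, upper = odds_ratio_with_ci(b=-0.032, se=0.015)
percent_change = (odds_ratio - 1) * 100  # negative means the odds decrease

print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
print(round(percent_change, 1))
```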


For categorical variables, perform simple logistic regression as above, but
in the Logistic Regression box select Categorical. In the Logistic
Regression: Define Categorical Variables box, transfer the categorical
variable (e.g. gender) to Categorical Covariates. The option First, as shown
in the figure below, sets the reference group. If male is to be the
reference group, select First and then click on Change. If female is to be
the reference group, no change is needed.

Variables in the Equation
                                                                    95% C.I. for EXP(B)
                       B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Obesity(1)    2.093    .259   65.358   1    .000   8.112    4.884    13.476
         Constant      -1.316   .196   45.039   1    .000   .268
a. Variable(s) entered on step 1: Obesity.


The table shows that the p value is <0.001 and the odds ratio of obesity is
8.112. Obese people have 8.1 times the odds of depression compared with
non-obese people.

The table below shows that the p value is <0.001 and the odds ratio of
smoking is 11.085. Smokers have 11.1 times the odds of depression compared
with non-smokers.

Variables in the Equation
                                                                    95% C.I. for EXP(B)
                       B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Smoking(1)    2.406    .279   74.122   1    .000   11.085   6.411    19.169
         Constant      -1.635   .228   51.467   1    .000   .195
a. Variable(s) entered on step 1: Smoking.

Variables in the Equation
                                                                    95% C.I. for EXP(B)
                       B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Gender        .704     .229   9.465    1    .002   2.022    1.291    3.165
         Constant      -1.237   .361   11.744   1    .001   .290
a. Variable(s) entered on step 1: Gender.

The table shows that the p value is 0.002 and the odds ratio of gender is
2.022. Males have 2.0 times the odds of depression compared with females.

4- Summary of bivariable analyses

The table below shows the results of simple logistic regression. Independent
variables with p values less than 0.05 can be included in the multiple
logistic regression.

Table: Factors associated with depression from simple logistic regression

Variable         Crude OR (95% CI)       Wald statistics (df)   p value
Age              0.97 (0.94, 1.00)       4.50 (1)               0.034
Obesity
  Non-obese      1.00                    65.35 (1)              <0.001
  Obese          8.11 (4.88, 13.47)
Smoking
  Non-smoker     1.00                    74.122 (1)             <0.001
  Smoker         11.08 (6.41, 19.169)
Gender
  Female         1.00                    9.46 (1)               0.002
  Male           2.02 (1.29, 3.16)


5- Building the preliminary model (variable selection)
1- Analyze
2- Regression
3- Binary Logistic
4- In the Logistic Regression box, transfer
Depression to Dependent, and age, smoking,
gender and obesity to Covariates. Click on
Method then select Forward: LR
5- Select Categorical.
6- In the Logistic Regression: Define Categorical Variables box, transfer
the categorical variables (smoking, gender and obesity) to Categorical
Covariates. Use the option First, as shown in the figure below, for the
reference group.
7- Select Options; in the Logistic Regression: Options box, select CI for
exp(B) and At last step. Then click on Continue.
8- Click on OK.


The table below shows the reference group; the category coded .000 is the
reference group.

Categorical Variables Codings
                          Frequency   Parameter coding (1)
Gender     Female         165         0.000
           Male           153         1.000
Obesity    Obese          156         0.000
           Non-Obese      162         1.000
Smoking    Non-Smoker     141         0.000
           Smoker         177         1.000


The Variables in the Equation table below shows the output of the Forward LR
method. The variables in the preliminary model are smoking and obesity.

Variables in the Equation*
                                                                    95% C.I. for EXP(B)
                       B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Smoking(1)    2.406    .279   74.122   1    .000   11.085   6.411    19.169
         Constant      -1.635   .228   51.467   1    .000   .195
Step 2b  Smoking(1)    2.820    .350   65.004   1    .000   16.785   8.456    33.318
         Obesity(1)    2.531    .338   56.037   1    .000   12.569   6.478    24.384
         Constant      -3.230   .371   75.656   1    .000   .040
a. Variable(s) entered on step 1: Smoking.
b. Variable(s) entered on step 2: Obesity.
*. Method = Forward Stepwise.

6- Checking multicollinearity and interactions


i) Interaction
The interaction between the independent variables should be checked pair by
pair. There are two independent variables, so one pair of interaction needs
to be checked. Before checking for interaction, the interaction term should
be created.
1- Analyze
2- Regression
3- Binary Logistic


4- In the Logistic Regression box, transfer Depression to Dependent, and
smoking and obesity to Covariates. Select both smoking and obesity, then
click on >a*b> to add the interaction term.
5- Select Categorical.
6- In the Logistic Regression: Define Categorical Variables box, transfer
the categorical variables (smoking and obesity) to Categorical Covariates.
Use the option First for the reference group.
7- Select Options; in the Logistic Regression: Options box, select CI for
exp(B). Then click on Continue.
8- Click on OK.


The interaction is significant (p value = 0.033); therefore, the interaction
will be included in the model.

Variables in the Equation
                                                                               95% C.I. for EXP(B)
                                     B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Obesity(1) by Smoking(1)    1.441    .675   4.561    1    .033   4.227    1.126    15.869
         Obesity(1)                  1.598    .512   9.740    1    .002   4.942    1.812    13.479
         Smoking(1)                  1.950    .488   15.992   1    .000   7.031    2.703    18.289
         Constant                    -2.526   .424   35.441   1    .000   .080
a. Variable(s) entered on step 1: Obesity * Smoking, Obesity, Smoking.


ii) Multicollinearity
The stability of the logistic regression model should be checked, which
means that the independent variables should not be strongly related to each
other. Multicollinearity indicates that the independent variables are highly
correlated. There is no facility to check multicollinearity in logistic
regression, so it is checked instead through linear regression analysis, by
determining the Variance Inflation Factor (VIF). The VIF must be less than
10.

To obtain the VIF, all steps of linear regression are conducted with the two
independent variables, and the Method is Enter. In the Linear Regression:
Statistics box, select Collinearity diagnostics in addition to Confidence
intervals.


The results in the table below show that the VIFs of the two variables are
< 10, which is acceptable.

Coefficientsa
                 Unstandardized         Standardized                     Collinearity
                 Coefficients           Coefficients                     Statistics
Model            B        Std. Error    Beta           t        Sig.     Tolerance   VIF
1  (Constant)    .585     .069                         8.516    .000
   Obesity       .405     .043          .407           9.475    .000     .978        1.023
   Smoking       .459     .043          .458           10.673   .000     .978        1.023
a. Dependent Variable: Depression


7- Check Model Fitness


Model Fitness can be checked by
i) Hosmer-Lemeshow test of goodness of fit
ii) ROC curve (Receiver operating characteristic)
i) Hosmer-Lemeshow test of goodness of fit
If the p value of the H-L goodness-of-fit test is greater than 0.05, as we
want for well-fitting models, we fail to reject the null hypothesis that
there is no difference between observed and model-predicted values, implying
that the model's estimates fit the data at an acceptable level.
To obtain the Hosmer-Lemeshow test of goodness of fit, all steps of multiple
logistic regression are conducted with the two selected independent
variables, and the Method is Enter. In the Logistic Regression: Options box,
select Classification plots and Hosmer-Lemeshow goodness-of-fit, in addition
to CI for exp(B). Then click on Continue, then OK.


The results in the table below show that the p value is 0.088, which is not
significant, so the null hypothesis is not rejected. Consequently, the model
fits the data.

Hosmer and Lemeshow Test


Step Chi-square df Sig.
1 4.863 2 .088
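
The Sig. value can be reproduced from the chi-square statistic. With exactly 2 degrees of freedom, the upper-tail probability of a chi-square value x has the closed form exp(-x/2), so no statistical library is needed. The sketch below applies only to df = 2, as in this table; the function name is illustrative.

```python
import math

def chi_square_p_value_df2(chi_square):
    """Upper-tail p value of a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi_square / 2)

p_value = chi_square_p_value_df2(4.863)
model_fits = p_value > 0.05  # decision rule described in the text

print(round(p_value, 3), model_fits)
```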

Classification Tablea
                                      Predicted
                                      Depression                     Percentage
Observed                              Non-Depressed    Depressed     Correct
Step 1   Depression  Non-Depressed    166              8             95.4
                     Depressed        50               94            65.3
         Overall Percentage                                          81.8
a. The cut value is .500
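
The percentages in the classification table can be recomputed from the four counts. A minimal sketch; the names specificity and sensitivity are the usual terms for the two row percentages and do not appear in the SPSS output itself.

```python
# Counts from the classification table (rows: observed, columns: predicted)
true_negative, false_positive = 166, 8   # observed non-depressed
false_negative, true_positive = 50, 94   # observed depressed

total = true_negative + false_positive + false_negative + true_positive
specificity = true_negative / (true_negative + false_positive)
sensitivity = true_positive / (true_positive + false_negative)
overall = (true_negative + true_positive) / total

print(round(100 * specificity, 1), round(100 * sensitivity, 1), round(100 * overall, 1))
```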


The classification results show that 81.8% of cases are predicted correctly
as depressed or not depressed. More than 70% is considered a good model.
ii) ROC curve (Receiver operating characteristic)

To obtain the ROC curve, all steps of multiple logistic regression are
conducted with the two selected independent variables, then click on Save.
In the Logistic Regression: Save box, select Probabilities, then click on
Continue, then OK. A new variable containing the predicted probability is
created.
1- Analyze
2- ROC Curve
3- In the ROC Curve box, transfer Predicted probability [PRE_1] to Test
Variable and Depression to State Variable.
4- Define 1 in Value of State Variable.
5- Select ROC Curve, With diagonal reference line and Standard error and
confidence interval. Then click on OK.


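The ROC Curve results show that the area under the curve is 0.823, which
means that there is an 82.3% chance that the model assigns a higher predicted
probability to a randomly chosen depressed case than to a randomly chosen
non-depressed case. The area is significantly greater than 0.5 (p<0.001),
the value expected by chance.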

Area Under the Curve
Test Result Variable(s): Predicted probability
                                             Asymptotic 95% Confidence Interval
Area     Std. Errora    Asymptotic Sig.b     Lower Bound    Upper Bound
0.823    0.037          0.000                0.751          0.895
The test result variable(s): Predicted probability has at least one tie
between the positive actual state group and the negative actual state
group. Statistics may be biased.
a. Under the nonparametric assumption
b. Null hypothesis: true area = 0.5
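
Two parts of this output can be illustrated with a short sketch. First, the area under the curve is the share of depressed/non-depressed pairs in which the depressed case has the higher predicted probability, with ties counted as half; the scores used below are hypothetical, not the book's data. Second, the asymptotic 95% interval can be rebuilt from the reported area and standard error as area plus and minus 1.96 standard errors.

```python
def auc_from_scores(scores_positive, scores_negative):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical predicted probabilities (not the book's data)
depressed = [0.92, 0.36, 0.36, 0.28]
not_depressed = [0.36, 0.28, 0.07, 0.07]
toy_auc = auc_from_scores(depressed, not_depressed)

# Interval rebuilt from the reported area and standard error
area, std_error = 0.823, 0.037
lower, upper = area - 1.96 * std_error, area + 1.96 * std_error

print(toy_auc)
print(round(lower, 3), round(upper, 3))
```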


8- Interpretation of the final model

As shown in the final model, smoking, obesity and the smoking*obesity
interaction are factors associated with depression.
Variables in the Equation
                                                                               95% C.I. for EXP(B)
                                     B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1a  Obesity(1) by Smoking(1)    1.441    .675   4.561    1    .033   4.227    1.126    15.869
         Obesity(1)                  1.598    .512   9.740    1    .002   4.942    1.812    13.479
         Smoking(1)                  1.950    .488   15.992   1    .000   7.031    2.703    18.289
         Constant                    -2.526   .424   35.441   1    .000   .080
a. Variable(s) entered on step 1: Obesity * Smoking, Obesity, Smoking.

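- Among non-smokers, stressed people who are obese have 4.9 times the odds
of depression compared with non-obese people.
- Among non-obese people, stressed people who smoke have 7.0 times the odds
of depression compared with non-smokers.
- Because the interaction is significant, the odds ratio for stressed people
who are both obese and smokers, compared with non-obese non-smokers, is the
product of the three odds ratios (4.94 * 7.03 * 4.23, about 147), not 4.2
alone.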


The prediction model of depression among stressed people in the Gaza Strip
is:
Logit (P) = -2.526 + [1.950 * Smoker] + [1.598 * Obese] +
[1.441 * Smoker * Obese].
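
The logit can be converted into a predicted probability of depression with P = 1 / (1 + exp(-logit)). A sketch for the four smoker/obese combinations, assuming each indicator is coded 1 for smoker or obese and 0 otherwise, as the equation implies; the function name is illustrative.

```python
import math

def predicted_probability(smoker, obese):
    """Predicted probability of depression from the final logistic model.

    smoker and obese are assumed to be 0/1 indicators.
    """
    logit = -2.526 + 1.950 * smoker + 1.598 * obese + 1.441 * smoker * obese
    return 1 / (1 + math.exp(-logit))

for smoker in (0, 1):
    for obese in (0, 1):
        print(smoker, obese, round(predicted_probability(smoker, obese), 3))
```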

9- Result presentation of prediction model

Table: Factors associated with depression in a sample from Gaza

Variables           Crude OR (95% CI)*      Adjusted OR (95% CI)**   Wald statistics (df)**   p value**
Smoker
  Non-smoker        1                       1                        15.99 (1)                <0.001
  Smoker            11.08 (6.41, 19.16)     7.03 (2.70, 18.29)
Obese
  Non-obese         1                       1                        9.74 (1)                 0.002
  Obese             8.11 (4.88, 13.47)      4.94 (1.81, 13.48)
Smoker * Obese
  No                -                       1                        4.56 (1)                 0.033
  Yes               -                       4.23 (1.13, 15.87)
*Simple logistic regression
** Multiple logistic regression


E-KUTUB
Publisher of publishers
Amazon & Google Books Partner
No 1 in the Arab world
Registered with Companies House in England
under Number: 07513024
Email: [email protected]
Website: www.e-kutub.com
Germany Office: In der Gass 10,
55758 Niederwörresbach,
Rhineland-Palatinate
UK Registered Office:
28 Lings Coppice,
London, SE21 8SY
Tel: (0044)(0)2081334132
