Advanced Biostatistics with R

The document is a comprehensive guide on Advanced Biostatistics with R, focusing on survival models such as the Accelerated Failure Time (AFT) Model and models based on hazard functions. It includes detailed problems, solutions, and interpretations related to the comparison of treatment effects on lung cancer relapse using different statistical models. The author expresses gratitude to mentors and encourages feedback while allowing the sharing of the document for educational purposes.

Uploaded by

SHAKIB HASAN LIMON

Advanced Biostatistics With R

Md. Mostakim

Session: 2022-23

Department of Statistics

University of Dhaka

Published Date:

1st Published: April, 2025

Acknowledgements:

I am deeply grateful to Fatima Tuz Zahura Mam for her exceptional guidance in teaching Advanced
Biostatistics. My sincere thanks also go to the seniors for their insightful lectures.

N.B. You may share this pdf book as much as you like but don’t use it for any unethical purpose.

For any kind of feedback, please contact [email protected]. Your feedback will be very inspiring for me.
Table of Contents

Accelerated Failure Time (AFT) Model
    Problem 1: Log-Normal AFT Model
    Problem 2: Log-Logistic AFT Model
    Problem 3: Weibull AFT Model
    Problem 4: Hypothesis Testing
    Problem 5: Weibull AFT Model with Real Life Data
    Problem 6: Log-Normal AFT Model with Real Life Data
    Problem 7: Log-Logistic AFT Model with Real Life Data

Model Based on Hazard Function
    Problem 8: Weibull Parametric PH Model
    Problem 9: Inter-relationship between Weibull PH and Weibull AFT Model
    Problem 10: Semi-Parametric PH Model (Cox-PH Model)

Multiple Modes of Failure

Generalized Linear Models

Generalized Linear Mixed Models
    Problem 11: Generalized Linear Mixed Model
    Problem 12: Generalized Linear Mixed Model

References

Two types of survival models are commonly used in practice:

i. Accelerated Failure Time (AFT) Model

ii. Model Based on Hazard Function

Accelerated Failure Time (AFT) Model

Problem 1: Log-Normal AFT Model
The time to relapse, in months, for patients on two treatments for lung cancer is compared using the
following accelerated failure time (AFT) regression model: $Y = \ln T = 2.0 + 0.5x + 2.0Z$, where
$Z \sim N(0,1)$ and

$$x = \begin{cases} 1, & \text{if treatment is A} \\ 0, & \text{if treatment is B.} \end{cases}$$

a. Compare the survival probabilities of the two treatments at 0, 0.5, 1, 2, 3, 4, 5, and 6 years.
b. Draw survival curves for treatment A and B in the same graph.
c. Find the first, second, and third quartile survival times for treatment A and B.
d. Which treatment is better and why?

Solution:

a. Compare the survival probabilities

Here, the random variable 𝑍 ~ 𝑁𝑜𝑟𝑚𝑎𝑙(0,1) ; so, the (Accelerated Failure Time) AFT model is called
Log Normal AFT model.

The model is given by, 𝑌 = 𝑙𝑛𝑇 = 𝜇 + 𝑥 ′ 𝛽 + 𝜎 𝑍 = 2.0 + 0.5𝑥 + 2.0 𝑍

Therefore, location, 𝜇 = 2 ; scale, 𝜎 = 2 ; regression parameter, 𝛽 = 0.5 .

The linear predictor is defined by, 𝜂 = 𝑥′𝛽

For, treatment A, 𝜂 = 1 × 0.5 and for treatment B, 𝜂 = 0 × 0.5.

The probability that a lung cancer patient remains relapse-free beyond time $t$ is

$$S(t) = 1 - \Phi\left(\frac{\ln t - \mu - x'\beta}{\sigma}\right)$$
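
The survival function above follows directly from the model. A short derivation, using only the symbols already defined (with $\Phi$ denoting the standard normal distribution function), is sketched here:

```latex
\begin{aligned}
S(t) &= P(T > t) = P(\ln T > \ln t) \\
     &= P\left(\mu + x'\beta + \sigma Z > \ln t\right) \\
     &= P\left(Z > \frac{\ln t - \mu - x'\beta}{\sigma}\right)
      = 1 - \Phi\left(\frac{\ln t - \mu - x'\beta}{\sigma}\right).
\end{aligned}
```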

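The $p$th quantile of the relapse time, denoted by $t_p$, is given by

$$t_p = \exp(\mu + x'\beta + \sigma Z_p)$$

where $Z_p$ is the $p$th quantile of the standard normal distribution. The first, second, and third quartiles correspond to $p = 0.25$, $0.50$, and $0.75$.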

#Defining time variable

t<-c(0,0.5,1,2,3,4,5,6)
time<-12*t #time in months

#defining parameters and linear predictor of Y=lnT

location<-2.0
scale<-2.0
beta<-0.5

linear.pred.TrA<-0.5*1
linear.pred.TrB<-0.5*0

#survival probabilities of treatment A and B

sur.pr.A<-1-pnorm((log(time)-linear.pred.TrA-location)/scale)
sur.pr.B<-1-pnorm((log(time)-linear.pred.TrB-location)/scale)
cbind(time,sur.pr.A,sur.pr.B)
time sur.pr.A sur.pr.B
[1,] 0 1.0000000 1.0000000
[2,] 6 0.6383756 0.5414630
[3,] 12 0.5030107 0.4042145
[4,] 24 0.3672947 0.2779216
[5,] 36 0.2939921 0.2142505
[6,] 48 0.2464825 0.1747395
[7,] 60 0.2126755 0.1475101
[8,] 72 0.1871808 0.1274907
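
As a cross-check, the same probabilities can be obtained from base R's log-normal distribution function, because under this model T is log-normal with meanlog = μ + x'β and sdlog = σ. This is only a minimal sketch; the names check.A and check.B are introduced here for illustration and are not part of the original solution.

```r
# Cross-check with plnorm(): T ~ log-normal(meanlog = mu + x*beta, sdlog = sigma)
time <- 12 * c(0, 0.5, 1, 2, 3, 4, 5, 6)   # time in months
location <- 2.0; scale <- 2.0; beta <- 0.5

# Upper-tail probabilities P(T > time) for treatment A (x = 1) and B (x = 0)
check.A <- plnorm(time, meanlog = location + beta * 1, sdlog = scale, lower.tail = FALSE)
check.B <- plnorm(time, meanlog = location + beta * 0, sdlog = scale, lower.tail = FALSE)
round(cbind(time, check.A, check.B), 4)
```

The two columns should agree with sur.pr.A and sur.pr.B in the table above.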

Interpretation:

From the above table, it is found that,

• For the lung cancer patients taking treatment A, 63.84% of them do not get the disease back
before 6 months, whereas, those taking treatment B, 54.15% of them do not get the disease back
before 6 months.

• For the lung cancer patients taking treatment A, 50.30% of them do not get the disease back
before 1 year (12 months), whereas, those taking treatment B, 40.42% of them do not get the
disease back before 1 year.

• For the lung cancer patients taking treatment A, 36.73% of them do not get the disease back
before 2 years, whereas, those taking treatment B, 27.79% of them do not get the disease back
before 2 years.

• For the lung cancer patients taking treatment A, 29.40% of them do not get the disease back
before 3 years, whereas, those taking treatment B, 21.43% of them do not get the disease back
before 3 years.

• For the lung cancer patients taking treatment A, 24.65% of them do not get the disease back
before 4 years, whereas, those taking treatment B, 17.47% of them do not get the disease back
before 4 years.

• For the lung cancer patients taking treatment A, 21.27% of them do not get the disease back
before 5 years, whereas, those taking treatment B, 14.75% of them do not get the disease back
before 5 years.

• For the lung cancer patients taking treatment A, 18.72% of them do not get the disease back
before 6 years, whereas, those taking treatment B, 12.75% of them do not get the disease back
before 6 years.

b. Draw survival curves for treatment A and B in the same graph.

#survival curves of treatment A and B

plot(time,sur.pr.A, type='s', col='black')
lines(time,sur.pr.B, type='s', col='red')

legend(30, 0.8, c("Treatment A", "Treatment B"), lty=c(1,1), col=c("black", "red"))

title("Survival curves for treatments: Log Normal AFT")

Comment:

From the above figure, it is found that the survival probabilities over time for patients taking
treatment A are higher than for those taking treatment B. For example, under the Log Normal AFT model,
(1 − 0.6384) × 100% = 36.16% of the lung cancer patients taking treatment A get the disease back within
6 months, whereas (1 − 0.5415) × 100% = 45.85% of those taking treatment B get the disease back
within 6 months.

c. Find the first, second, and third quartile survival times for treatment A and B.

#Quartile times
z.first.quartile<-qnorm(0.25)
time.first.quartile.A<-exp(location+linear.pred.TrA+scale*z.first.quartile)
time.first.quartile.B<-exp(location+linear.pred.TrB+scale*z.first.quartile)

z.second.quartile<-qnorm(0.5)
time.second.quartile.A<-exp(location+linear.pred.TrA+scale*z.second.quartile)
time.second.quartile.B<-exp(location+linear.pred.TrB+scale*z.second.quartile)

z.third.quartile<-qnorm(0.75)

time.third.quartile.A<-exp(location+linear.pred.TrA+scale*z.third.quartile)
time.third.quartile.B<-exp(location+linear.pred.TrB+scale*z.third.quartile)

quartile.A<-cbind(time.first.quartile.A,time.second.quartile.A,time.third.quartile.A)
quartile.B<-cbind(time.first.quartile.B,time.second.quartile.B,time.third.quartile.B)

cbind(quartile.A, quartile.B)

time.first.quartile.A time.second.quartile.A time.third.quartile.A

[1,] 3.161417 12.18249 46.94513
time.first.quartile.B time.second.quartile.B time.third.quartile.B
[1,] 1.917497 7.389056 28.47366

Interpretation:

From the above table we can say that,

• For the lung cancer patients, taking treatment A, 25% of them get back the disease at or before
3.16 months, while this is 1.92 months for those who are taking treatment B.
• For the lung cancer patients, taking treatment A, 50% of them get back the disease at or before
12.18 months, while this is 7.39 months for those who are taking treatment B.
• For the lung cancer patients, taking treatment A, 75% of them get back the disease at or before
46.95 months, while this is 28.47 months for those who are taking treatment B.

d. Which treatment is better and why?

The event of interest is relapse of lung cancer. The survival probabilities show that, at every time point,
patients taking treatment A are more likely to remain relapse-free than those taking treatment B. The
quartiles show that, under the Log Normal model, patients taking treatment A take longer to get the
disease back than those taking treatment B. Thus, patients with lung cancer should take treatment A
rather than treatment B, with the hope of a longer relapse-free period and better health. Therefore,
treatment A is better than treatment B.
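
Under an AFT model the treatment effect can also be summarised by a single number: every quantile of the relapse time under treatment A equals exp(β) times the corresponding quantile under treatment B. The sketch below illustrates this with the parameters given in the problem; the object names are illustrative only.

```r
# Ratio of quantile times (A versus B) in the log-normal AFT model
location <- 2.0; scale <- 2.0; beta <- 0.5
p <- c(0.25, 0.50, 0.75)

q.A <- exp(location + beta * 1 + scale * qnorm(p))   # quartiles, treatment A
q.B <- exp(location + beta * 0 + scale * qnorm(p))   # quartiles, treatment B

time.ratio <- q.A / q.B   # identical for every p
time.ratio
exp(beta)                 # acceleration factor implied by beta
```

Here exp(0.5) is about 1.65, so relapse times under treatment A are stretched by roughly 65% relative to treatment B at every quantile.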

Problem 2: Log-Logistic AFT Model

The time to relapse, in months, for patients on two treatments for lung cancer is compared using the
following accelerated failure time (AFT) regression model: $Y = \ln T = 2.0 + 0.5x + 2.0Z$, where
$Z \sim \text{Logistic}(0,1)$ and

$$x = \begin{cases} 1, & \text{if treatment is A} \\ 0, & \text{if treatment is B.} \end{cases}$$

Solution:

Here, the random variable 𝑍 ~ 𝐿𝑜𝑔𝑖𝑠𝑡𝑖𝑐(0,1); so, the accelerated failure time (AFT) model is called the
Log Logistic AFT model.

The model is given by, 𝑌 = 𝑙𝑛𝑇 = 𝜇 + 𝑥 ′ 𝛽 + 𝜎 𝑍 = 2.0 + 0.5𝑥 + 2.0 𝑍

So, the parameters for $Y$ are: location, $\mu = 2$; scale, $\sigma = 2$; regression parameter, $\beta = 0.5$.

And the parameters for $T$ are: scale, $\alpha = e^{\mu} = e^{2}$; shape, $\gamma = 1/\sigma = 1/2$; regression parameter, $\beta = 0.5$.

The linear predictor is defined by, 𝜂 = 𝑥′𝛽

For, treatment A, 𝜂 = 1 × 0.5 and for treatment B, 𝜂 = 0 × 0.5.

The probability that a lung cancer patient remains relapse-free beyond time $t$ is

$$S(t) = \left[1 + \left(\frac{t\,e^{-x'\beta}}{\alpha}\right)^{\gamma}\right]^{-1}$$
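
This form can be obtained from the model in the same way as in Problem 1. The sketch below uses the survival function of the standard logistic distribution, $S_Z(z) = (1 + e^{z})^{-1}$, together with $\alpha = e^{\mu}$ and $\gamma = 1/\sigma$:

```latex
\begin{aligned}
S(t) &= P\left(Z > \frac{\ln t - \mu - x'\beta}{\sigma}\right)
      = \left[1 + \exp\left(\frac{\ln t - \mu - x'\beta}{\sigma}\right)\right]^{-1} \\
     &= \left[1 + \left(\frac{t\,e^{-x'\beta}}{e^{\mu}}\right)^{1/\sigma}\right]^{-1}
      = \left[1 + \left(\frac{t\,e^{-x'\beta}}{\alpha}\right)^{\gamma}\right]^{-1}.
\end{aligned}
```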

The $p$th quantile of the relapse time, denoted by $t_p$, is given by

$$t_p = \exp(\mu + x'\beta + \sigma Z_p)$$

where $Z_p = \ln\{p/(1-p)\}$ is the $p$th quantile of the standard logistic distribution.

time<-c(0,0.5,1,2,3,4,5,6)
t<-12*time
#Given value
mu<-2
sigma<-2
beta<-0.5
#linear predictor

lin.pre.A<-0.5*1
lin.pre.B<-0.5*0
#parameter values for log logistic
location<-exp(mu) #scale parameter (alpha) of T
gamma<-1/sigma #shape parameter (gamma) of T
#survival probabilities
sur.p.A<-1/(1+((t*exp(-lin.pre.A))/location)^gamma)
sur.p.B<-1/(1+((t*exp(-lin.pre.B))/location)^gamma)
cbind(t,sur.p.A,sur.p.B)

t sur.p.A sur.p.B
[1,] 0 1.0000000 1.0000000
[2,] 6 0.5876164 0.5260066
[3,] 12 0.5018867 0.4396819
[4,] 24 0.4160459 0.3568582
[5,] 36 0.3677784 0.3117910
[6,] 48 0.3350125 0.2817899
[7,] 60 0.3106307 0.2597685
[8,] 72 0.2914539 0.2426265

The rest of the calculation and interpretation is the same as in Problem 1; try it yourself!
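
As a starting point for the remaining parts, the quartile times can be sketched with qlogis(), which returns quantiles of the standard logistic distribution. The helper surv.ll() and the other object names below are hypothetical and are used only to check that the survival function evaluated at t_p returns 1 − p.

```r
# Quartile times for the log-logistic AFT model of Problem 2
mu <- 2; sigma <- 2; beta <- 0.5
p <- c(0.25, 0.50, 0.75)
z.p <- qlogis(p)                       # quantiles of the standard logistic

quart.A <- exp(mu + beta * 1 + sigma * z.p)
quart.B <- exp(mu + beta * 0 + sigma * z.p)
rbind(A = quart.A, B = quart.B)

# Check: the survival function evaluated at t_p should equal 1 - p
surv.ll <- function(t, eta) 1 / (1 + (t * exp(-eta) / exp(mu))^(1 / sigma))
surv.ll(quart.A, eta = beta * 1)
```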

Problem 3: Weibull AFT Model

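The time to relapse, in months, for patients on two treatments for lung cancer is compared using the
following accelerated failure time (AFT) regression model: $Y = \ln T = 2.0 + 0.5x + 2.0Z$, where
$Z \sim \text{Extreme value}(0,1)$ and

$$x = \begin{cases} 1, & \text{if treatment is A} \\ 0, & \text{if treatment is B.} \end{cases}$$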

Solution:

Here, the random variable $Z \sim \text{Extreme value}(0,1)$; so, the accelerated failure time (AFT) model
is called the Weibull regression model.

The model is given by $Y = \ln T = \mu + x'\beta + \sigma Z = 2.0 + 0.5x + 2.0Z$.

So, the parameters for $Y$ are: location, $\mu = 2$; scale, $\sigma = 2$; regression parameter, $\beta = 0.5$.

And the parameters for $T$ are: scale, $\lambda = \exp(-\mu) = \exp(-2)$ (equivalently, $\mu = -\ln\lambda$); shape, $\alpha = 1/\sigma = 1/2$;
regression parameter, $\beta = 0.5$.

The linear predictor is defined by, 𝜂 = 𝑥′𝛽

For, treatment A, 𝜂 = 1 × 0.5 and for treatment B, 𝜂 = 0 × 0.5.

The probability that a lung cancer patient remains relapse-free beyond time $t$ is

$$S(t) = \exp\left[-\left\{\lambda t \exp(-x'\beta)\right\}^{\alpha}\right]$$
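
The same argument as in Problems 1 and 2 gives this expression. The sketch below assumes the standard (minimum) extreme value distribution for $Z$, whose survival function is $S_Z(z) = \exp(-e^{z})$, and uses $\lambda = e^{-\mu}$ and $\alpha = 1/\sigma$:

```latex
\begin{aligned}
S(t) &= P\left(Z > \frac{\ln t - \mu - x'\beta}{\sigma}\right)
      = \exp\left[-\exp\left(\frac{\ln t - \mu - x'\beta}{\sigma}\right)\right] \\
     &= \exp\left[-\left\{t\,e^{-\mu}\,e^{-x'\beta}\right\}^{1/\sigma}\right]
      = \exp\left[-\left\{\lambda t \exp(-x'\beta)\right\}^{\alpha}\right].
\end{aligned}
```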

The $p$th quantile of the relapse time, denoted by $t_p$, is given by

$$t_p = \exp(\mu + x'\beta + \sigma Z_p)$$

where $Z_p = \ln\{-\ln(1-p)\}$ is the $p$th quantile of the standard extreme value distribution.

time<-c(0,0.5,1,2,3,4,5,6)
t<-12*time
#Given value
mu<-2
sigma<-2
beta<-0.5
#linear predictor

lin.pre.A<-0.5*1
lin.pre.B<-0.5*0
#Parameters for Weibull model
lambda<-exp(-mu) #scale
alpha<-1/sigma #shape
sur.p.A<-exp(-(lambda*t*exp(-lin.pre.A))^alpha)
sur.p.B<-exp(-(lambda*t*exp(-lin.pre.B))^alpha)
cbind(t,sur.p.A,sur.p.B)

t sur.p.A sur.p.B
[1,] 0 1.00000000 1.00000000
[2,] 6 0.49569693 0.40611581
[3,] 12 0.37065568 0.27960657
[4,] 24 0.24571545 0.16493005

[5,] 36 0.17924014 0.10999981
[6,] 48 0.13738563 0.07817983
[7,] 60 0.10868988 0.05786851
[8,] 72 0.08794235 0.04408831

The rest of the calculation and interpretation is the same as in Problems 1 and 2; try it yourself!
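
For anyone attempting the remaining parts, the following sketch cross-checks the survival probabilities with base R's pweibull() and computes the quartile times. It relies on the fact that, under this model, T is Weibull with shape 1/σ and scale exp(μ + x'β); the object names are illustrative only.

```r
mu <- 2; sigma <- 2; beta <- 0.5
t <- 12 * c(0, 0.5, 1, 2, 3, 4, 5, 6)   # time in months

# T is Weibull with shape 1/sigma and scale exp(mu + x*beta)
check.A <- pweibull(t, shape = 1 / sigma, scale = exp(mu + beta * 1), lower.tail = FALSE)
check.B <- pweibull(t, shape = 1 / sigma, scale = exp(mu + beta * 0), lower.tail = FALSE)
round(cbind(t, check.A, check.B), 4)

# Quartile times: z_p = log(-log(1 - p)) is the p-th quantile of the
# standard (minimum) extreme value distribution
p <- c(0.25, 0.50, 0.75)
z.p <- log(-log(1 - p))
quart.A <- exp(mu + beta * 1 + sigma * z.p)
quart.B <- exp(mu + beta * 0 + sigma * z.p)
rbind(A = quart.A, B = quart.B)
```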

Problem 4: Hypothesis Testing

Let $x$ be a binary covariate taking the value 0 or 1. Assume that $\ln t_i \mid x_i$ follows a normal distribution
[or extreme value or logistic distribution] with location parameter $\beta_0 + \beta_1 x_i$ and scale parameter $b$. The
maximum likelihood estimate of $\theta = (\beta_0, \beta_1, \ln b)'$ and the corresponding variance-covariance
matrix are given below:

$$\hat\theta = \begin{pmatrix} 0.105 \\ 0.937 \\ 0.159 \end{pmatrix}, \qquad
\hat V(\hat\theta) = \begin{bmatrix} 0.158 & -0.092 & 0.000 \\ -0.092 & 0.151 & 0.000 \\ 0.000 & 0.000 & 0.027 \end{bmatrix}$$

Using the above estimates, answer the following

a. Test the null hypotheses 𝐻0 : 𝛽1 = 0 and 𝐻0 : 𝑏 = 1.

b. Estimate the first quartile, median, and third quartile survival times when 𝑥 = 1 and also
find 95% confidence interval for these survival times.

c. Estimate the first quartile, median, and third quartile survival times when 𝑥 = 0 and also
find 95% confidence interval for these survival times.

Solution:

a. Test the null hypotheses 𝑯𝟎 : 𝜷𝟏 = 𝟎 and 𝑯𝟎 : 𝒃 = 𝟏.

Given that,

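$$\begin{aligned}
\hat\beta_0 &= 0.105, & Var(\hat\beta_0) &= 0.158, \\
\hat\beta_1 &= 0.937, & Var(\hat\beta_1) &= 0.151, \\
\ln\hat b &= 0.159, & Var(\ln\hat b) &= 0.027, \\
Cov(\hat\beta_0, \hat\beta_1) &= Cov(\hat\beta_1, \hat\beta_0) = -0.092, \\
Cov(\hat\beta_0, \ln\hat b) &= Cov(\ln\hat b, \hat\beta_0) = 0.000, \\
Cov(\hat\beta_1, \ln\hat b) &= Cov(\ln\hat b, \hat\beta_1) = 0.000.
\end{aligned}$$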

Hypotheses to be tested:

i) $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
ii) $H_0: b = 1$ vs $H_1: b \neq 1$

For i)
Test statistic:

$$w = \frac{\hat\beta_1 - 0}{SE(\hat\beta_1)} \sim N(0,1), \text{ under } H_0$$

where $SE(\hat\beta_1) = \sqrt{Var(\hat\beta_1)} = \sqrt{0.151}$.

beta1.hat<-.937
se.beta1.hat<-sqrt(.151)
wald.beta1<-beta1.hat/se.beta1.hat
pval.beta1<-2*(1-pnorm(abs(wald.beta1)))
pval.beta1
[1] 0.0158958

Here, the p-value is 0.0158958 < 0.05. Thus, null hypothesis 𝐻0 : 𝛽1 = 0 may be rejected at 5% level of
significance.
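
The same conclusion can be read off a Wald confidence interval. The sketch below computes a 95% interval for β1 and, by exponentiating its end points, for the time ratio exp(β1); the object names are illustrative and not part of the original solution.

```r
beta1.hat <- 0.937
se.beta1.hat <- sqrt(0.151)
z.crit <- qnorm(0.975)

ci.beta1 <- beta1.hat + c(-1, 1) * z.crit * se.beta1.hat   # 95% CI for beta1
ci.ratio <- exp(ci.beta1)                                  # 95% CI for exp(beta1)
ci.beta1
ci.ratio
```

The interval for β1 excludes 0 (equivalently, the interval for exp(β1) excludes 1), which agrees with rejecting the null hypothesis at the 5% level.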

For ii)
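Test statistic:

$$w = \frac{\hat b - 1}{SE(\hat b)} \sim N(0,1), \text{ under } H_0$$

where $\hat b = e^{\ln\hat b} = e^{0.159}$ and $SE(\hat b) = \sqrt{Var(\hat b)}$.

Using the Delta method,

$$Var(\hat b) = Var\left(e^{\ln\hat b}\right) \approx \left(e^{\ln\hat b}\right)^2 Var(\ln\hat b) = \hat b^{\,2}\, Var(\ln\hat b)$$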

b.hat<-exp(.159)
var.b.hat<-(b.hat^2)*.027
se.b.hat<-sqrt(var.b.hat)
wald.b.hat<-(b.hat-1)/se.b.hat

pval.b.hat<-2*(1-pnorm(abs(wald.b.hat)))
pval.b.hat

[1] 0.3709819

Here, the p-value is 0.3709819 > 0.05. Thus, null hypothesis 𝐻0 : b = 1 may not be rejected at 5% level
of significance.
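
Because b = 1 is equivalent to ln b = 0, the hypothesis can also be tested directly on the log scale, where the estimate and its variance are given and no Delta method step is needed. This is an alternative sketch, not the method used in the solution above; the two Wald statistics differ slightly in value but lead to the same decision here.

```r
# Wald test of H0: ln(b) = 0, equivalent to H0: b = 1
lnb.hat <- 0.159
se.lnb.hat <- sqrt(0.027)

wald.lnb <- (lnb.hat - 0) / se.lnb.hat
pval.lnb <- 2 * (1 - pnorm(abs(wald.lnb)))
c(statistic = wald.lnb, p.value = pval.lnb)
```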

b. Estimate the first quartile, median, and third quartile survival times when 𝒙 = 𝟏 and also
find 95% confidence interval for these survival times.

Given,
𝑙𝑛𝑡𝑖 |𝑥𝑖 follows a normal distribution [or extreme value or logistic distribution] with location parameter
𝛽0 + 𝛽1 𝑥𝑖 and scale parameter 𝑏.
Thus, the AFT regression model
𝑙𝑛𝑇 = 𝛽0 + 𝛽1 𝑥 + 𝑏𝑧

Then, the p-th quantile survival time is

$$t_p = \exp(\beta_0 + \beta_1 x + b z_p)$$

Since x = 1,

$$t_p = \exp(\beta_0 + \beta_1 + b z_p)$$
Here, for first quartile, p = 0.25,
for median, p = 0.50,
for third quartile, p = 0.75.
#Quartiles
z.25<-qnorm(.25)
t.25<-exp(.105+.937+b.hat*z.25)
z.5<-qnorm(.5)
t.5<-exp(.105+.937+b.hat*z.5)
z.75<-qnorm(.75)
t.75<-exp(.105+.937+b.hat*z.75)
quartiles<-cbind(t.25,t.5,t.75)
quartiles
t.25 t.5 t.75
[1,] 1.285657 2.834881 6.250928

When x = 1, 25% of observations have a survival time less than or equal to 1.285657, 50% of
observations have a survival time less than or equal to 2.834881, and 75% of observations have a survival
time less than or equal to 6.250928.

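95% confidence interval for the above survival times:

$$t_p \pm z_{1-\alpha/2}\sqrt{Var(t_p)}; \quad \alpha = 0.05 \qquad \text{(i)}$$

where, by the Delta method,

$$\sqrt{Var(t_p)} = \sqrt{t_p^{2}\left(Var(\hat\beta_0) + 2x\,Cov(\hat\beta_0, \hat\beta_1) + x^{2}\,Var(\hat\beta_1) + z_p^{2}\,Var(\hat b)\right)} \qquad \text{(ii)}$$

Since x = 1,

$$\sqrt{Var(t_p)} = \sqrt{t_p^{2}\left(Var(\hat\beta_0) + 2\,Cov(\hat\beta_0, \hat\beta_1) + Var(\hat\beta_1) + z_p^{2}\,Var(\hat b)\right)}$$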

#confidence intervals
se.t.25<-sqrt((t.25^2)*(.158+.151+(z.25^2)*var.b.hat-2*.092))
CI.t25<-c(t.25-qnorm(.975)*se.t.25,t.25+qnorm(.975)*se.t.25)
CI.t25

[1] 0.3365032 2.2348113

se.t.5<-sqrt((t.5^2)*(.158+.151+(z.5^2)*var.b.hat-2*.092))
CI.t5<-c(t.5-qnorm(.975)*se.t.5,t.5+qnorm(.975)*se.t.5)
CI.t5
[1] 0.8704448 4.7993174

se.t.75<-sqrt((t.75^2)*(.158+.151+(z.75^2)*var.b.hat-2*.092))
CI.t75<-c(t.75-qnorm(.975)*se.t.75,t.75+qnorm(.975)*se.t.75)
CI.t75
[1] 1.636095 10.865761

When x = 1, we are 95% confident that the intervals (0.3365032, 2.2348113), (0.8704448, 4.7993174) and
(1.636095, 10.865761) contain the true first, second and third quartile survival times of the
population, respectively.

c. Estimate the first quartile, median, and third quartile survival times when x=0 and also
find 95% confidence interval for these survival times.

When x = 0, the p-th quantile survival time is

$$t_p = \exp(\beta_0 + b z_p)$$
z.25<-qnorm(.25)
t.25<-exp(.105+b.hat*z.25)
z.5<-qnorm(.5)
t.5<-exp(.105+b.hat*z.5)
z.75<-qnorm(.75)
t.75<-exp(.105+b.hat*z.75)
quartiles<-cbind(t.25,t.5,t.75)
quartiles
t.25 t.5 t.75
[1,] 0.5037224 1.110711 2.449123

When x = 0, 25% of observations have a survival time less than or equal to 0.5037224,
50% of observations have a survival time less than or equal to 1.110711, and 75% of observations have a
survival time less than or equal to 2.449123.

Now, when x = 0, equation (ii) simplifies. The 95% confidence interval for the above survival times is

$$t_p \pm z_{1-\alpha/2}\sqrt{Var(t_p)}; \quad \alpha = 0.05$$

where

$$\sqrt{Var(t_p)} = \sqrt{t_p^{2}\left(Var(\hat\beta_0) + z_p^{2}\,Var(\hat b)\right)}$$

#confidence intervals
se.t.25<-sqrt((t.25^2)*(.158+(z.25^2)*var.b.hat))
CI.t25<-c(t.25-qnorm(.975)*se.t.25,t.25+qnorm(.975)*se.t.25)
CI.t25

[1] 0.09085392 0.91659091

se.t.5<-sqrt((t.5^2)*(.158+(z.5^2)*var.b.hat))
CI.t5<-c(t.5-qnorm(.975)*se.t.5,t.5+qnorm(.975)*se.t.5)
CI.t5

[1] 0.245389 1.976032

se.t.75<-sqrt((t.75^2)*(.158+(z.75^2)*var.b.hat))
CI.t75<-c(t.75-qnorm(.975)*se.t.75,t.75+qnorm(.975)*se.t.75)
CI.t75

[1] 0.4417362 4.4565095

When x = 0, we are 95% confident that the intervals (0.09085392, 0.91659091), (0.245389, 1.976032) and
(0.4417362, 4.4565095) contain the true first, second and third quartile survival times of the
population, respectively.
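
The calculations in parts b and c can be collected into one function by writing the Delta method in matrix form: the variance of ln(t_p) is g'Vg, where g = (1, x, b z_p) is the gradient of ln(t_p) with respect to (β0, β1, ln b). The sketch below assumes normal errors, as the solution above does by using qnorm(); the helper name quantile.ci() is hypothetical.

```r
# Estimates and covariance matrix for theta = (beta0, beta1, ln b)
theta.hat <- c(0.105, 0.937, 0.159)
V.hat <- matrix(c( 0.158, -0.092, 0.000,
                  -0.092,  0.151, 0.000,
                   0.000,  0.000, 0.027), nrow = 3, byrow = TRUE)

# Hypothetical helper: p-th quantile time and its CI at covariate value x
quantile.ci <- function(p, x, level = 0.95) {
  b.hat <- exp(theta.hat[3])
  z.p <- qnorm(p)
  t.p <- exp(theta.hat[1] + theta.hat[2] * x + b.hat * z.p)
  # Gradient of ln(t_p) with respect to (beta0, beta1, ln b)
  grad <- c(1, x, b.hat * z.p)
  se.log <- sqrt(drop(t(grad) %*% V.hat %*% grad))
  se.t <- t.p * se.log                  # Delta method on the time scale
  z.c <- qnorm(1 - (1 - level) / 2)
  c(estimate = t.p, lower = t.p - z.c * se.t, upper = t.p + z.c * se.t)
}

rbind(Q1 = quantile.ci(0.25, x = 1),
      Q2 = quantile.ci(0.50, x = 1),
      Q3 = quantile.ci(0.75, x = 1))
```

Calling quantile.ci() with x = 0 gives the intervals of part c. Because the covariances involving ln b are zero here, the matrix form reduces to the term-by-term formulas used above.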

Problem 5: Weibull AFT Model with Real Life Data

The SPSS file “infant_sur.dat” is the data set on the infant mortality of Bangladesh with some selected
variables. The variable ‘TIME’ measures the time to death in months and the variable ‘CHSURV’
indicates whether the observation is censored or not. Dataset link: Biostat Data
Using the data set,

a. Fit a Weibull AFT regression model. Identify the potential risk factors associated with infant
mortality and interpret the results.
b. Find the overall survival probabilities and curve.
c. Find the survival probabilities and curves for the variable ‘Wealth Index’.

Solution:

# Read the data

inf.data<-read.table("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_AFT/infant_sur.dat", header=T)

dim(inf.data)
names(inf.data)
attach(inf.data)
# Children who did not survive to at least 1 month (TIME=0) are recoded to TIME=0.5,
# because the model works with log(TIME) and log(0) = -Inf
TIME<-ifelse(TIME==0, 0.5, TIME)
# Load the library file
library(survival)
aft.weibull<-survreg(Surv(TIME, CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="weibull")
summary(aft.weibull)

Call:
survreg(formula = Surv(TIME, CHSURV) ~ AGE + AGESQ + RELIGION +
MEDIA + PLACE + NGO + PRIMARY + SECONDAR + HIGHER + POOR +
RICH, dist = "weibull")
Value Std. Error z p
(Intercept) 5.18164 1.87120 2.77 0.00562
AGE 0.33773 0.11104 3.04 0.00235
AGESQ -0.00565 0.00163 -3.46 0.00054
RELIGION -0.23318 0.48098 -0.48 0.62781
MEDIA -0.11664 0.31613 -0.37 0.71215
PLACE 0.40775 0.31807 1.28 0.19985
NGO 0.51291 0.28782 1.78 0.07474
PRIMARY 0.61883 0.32604 1.90 0.05770
SECONDAR 1.45686 0.41865 3.48 0.00050
HIGHER 3.99143 1.13256 3.52 0.00042
POOR 0.26368 0.36947 0.71 0.47542
RICH 0.12798 0.38874 0.33 0.74199
Log(scale) 0.86594 0.05593 15.48 < 2e-16
Scale= 2.38

Weibull distribution
Loglik(model)= -1938.3 Loglik(intercept only)= -1972.5
Chisq= 68.47 on 11 degrees of freedom, p= 2.4e-10
Number of Newton-Raphson Iterations: 10
n= 9845

exp(aft.weibull$coef) # Time ratios (ratios of mean/quantile survival time), not odds ratios

(Intercept)         AGE       AGESQ    RELIGION       MEDIA       PLACE
177.9745550   1.4017627   0.9943621   0.7920080   0.8899057   1.5034380
        NGO     PRIMARY    SECONDAR      HIGHER        POOR        RICH
  1.6701395   1.8567507   4.2924418  54.1322478   1.3017139   1.1365313

Output:

Covariate        Regression     Standard    p-value    e^β (ratio of mean/quantile
                 Coefficient    Error                  survival time)

Intercept          5.1816       1.8712      0.0056     177.9746
Age                0.3377       0.1110      0.0024       1.4018
Age Square        -0.0056       0.0016      0.0005       0.9944
Religion          -0.2332       0.4810      0.6278       0.7920
Media             -0.1166       0.3161      0.7121       0.8899
Place              0.4078       0.3181      0.1999       1.5034
NGO                0.5129       0.2878      0.0747       1.6701
Education
  Primary          0.6188       0.3260      0.0577       1.8568
  Secondary        1.4569       0.4187      0.0005       4.2924
  Higher           3.9914       1.1326      0.0004      54.1322
Wealth Index
  Poor             0.2637       0.3695      0.4754       1.3017
  Rich             0.1280       0.3887      0.7420       1.1365
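
To judge the precision of the time ratios, approximate 95% Wald intervals can be added by exponentiating coefficient ± 1.96 × standard error. The sketch below copies the coefficients and standard errors from the summary() output above rather than reading them from the fitted object, so it runs without the data set; the object names are illustrative.

```r
# Coefficients and standard errors copied from the summary() output above
coef.hat <- c(AGE = 0.33773, AGESQ = -0.00565, RELIGION = -0.23318,
              MEDIA = -0.11664, PLACE = 0.40775, NGO = 0.51291,
              PRIMARY = 0.61883, SECONDAR = 1.45686, HIGHER = 3.99143,
              POOR = 0.26368, RICH = 0.12798)
se.hat <- c(0.11104, 0.00163, 0.48098, 0.31613, 0.31807, 0.28782,
            0.32604, 0.41865, 1.13256, 0.36947, 0.38874)

z.crit <- qnorm(0.975)
time.ratio <- cbind(ratio = exp(coef.hat),
                    lower = exp(coef.hat - z.crit * se.hat),
                    upper = exp(coef.hat + z.crit * se.hat))
round(time.ratio, 4)
```

A ratio whose interval excludes 1 corresponds to a coefficient that is significant at the 5% level.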

Hypothesis:

𝐻0 = All covariates have no effect 𝑜𝑛 𝑡ℎ𝑒 𝑠𝑢𝑟𝑣𝑖𝑣𝑎𝑙 𝑡𝑖𝑚𝑒

𝑣𝑠 𝐻1 = At least one covariate has effect on the survival time

Since p-value of the overall Weibull survival regression model is 2.4 × 10−10 < 𝛼 = 0.05, hence we
may reject the null hypothesis at 5% level of significance. That is, at least one covariate has effect on
the survival time.

From the p-values we can identify the potential factors associated with infant mortality. Age (both Age
and Age Square) and Education (Secondary and Higher) are potential risk factors associated with infant
mortality at the 5% level of significance, whereas NGO membership and Primary education are potential
risk factors only at the 10% level of significance.

Interpretation: Interpretation has been provided only for covariates which are significant.

• Regression Coefficient for Age

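Since the covariate Age has a quadratic effect, it has a slightly different interpretation.

$$\ln T = \hat\mu + 0.337\,\text{Age} - 0.006\,\text{Age}^2$$

For the level of age at maximum mean survival time:

$$\frac{\partial \ln T}{\partial\,\text{Age}} = 0 \;\Rightarrow\; 0.337 - 2 \times 0.006\,\text{Age} = 0 \;\Rightarrow\; \text{Age} \approx 28 \text{ years}$$

Since $\text{Age}^2$ has a negative sign, the graph is an inverted U:

[Figure: inverted-U sketch of lnT, with the maximum marked at 28 years]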

At first, log of Time increases with increasing Age and reaches its maximum at about 28 years;
thereafter, log of Time decreases with increasing Age, keeping all other covariates at a
fixed level.
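
The turning point is quite sensitive to how the Age Square coefficient is rounded. The sketch below repeats the calculation with the rounded coefficients used in the text and with the coefficients as printed by summary(); the helper peak.age() is hypothetical.

```r
# Age at which ln T is maximised: solve b1 + 2 * b2 * Age = 0
peak.age <- function(b1, b2) -b1 / (2 * b2)

peak.rounded <- peak.age(0.337, -0.006)       # coefficients rounded as in the text
peak.full    <- peak.age(0.33773, -0.00565)   # coefficients as printed by summary()
c(rounded = peak.rounded, full = peak.full)
```

With the rounded coefficients the maximum is at about 28 years, while the coefficients as printed put it closer to 30 years, so the reported age is best read as approximate.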

• Regression Coefficient for NGO

Exponentiated regression coefficient for NGO= 1.67, therefore for mothers who are members
of NGO compared to mothers with no NGO membership, the mean survival time of child
increases by (1.67-1) * 100% = 67 %, keeping all other covariates at a fixed level.
• Exponentiated Regression Coefficient for Primary=1.8568
Therefore, for mothers who have primary level education compared to mothers having no
education, the mean survival time of child increases by (1.8568-1) * 100% = 85.68%, keeping
all other covariates at a fixed level.
• Exponentiated Regression Coefficient for Secondary=4.2924
Therefore, for mothers who have secondary level education compared to mothers having no
education, the mean survival time of child increases by (4.2924-1) * 100% = 329.24%, keeping
all other covariates at a fixed level.
• Exponentiated Regression Coefficient for Higher=54.1322
Therefore, for mothers who have higher level education compared to mothers having no
education, the mean survival time of child increases by (54.1322-1) * 100% = 5313.22%,
keeping all other covariates at a fixed level.

b. Find the overall survival probabilities and curve.

# Scale parameter of the baseline Weibull distribution

lambda<-exp(-aft.weibull$coef[1])
cat("Scale:", lambda, "\n")
#Shape parameter of baseline Weibull distribution

alpha<-1/aft.weibull$scale
cat("Shape parameter:", alpha, "\n")
# Overall survival probability (At mean values of covariates)

cov<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),

mean(PRIMARY), mean(SECONDAR), mean(HIGHER), mean(POOR), mean(RICH))

acf.wei<-exp(sum(cov*aft.weibull$coef[2:12]))
#Defining the time variable
time1<-TIME[CHSURV==1]
time2<-sort(time1)
time3<-unique(time2)
time<-c(0,time3)
sur.over<-exp(-(lambda*time/acf.wei)^alpha) #Overall survival probability
cbind(time, sur.over)

time sur.over
[1,] 0.0 1.0000000
[2,] 0.5 0.9922538
[3,] 1.0 0.9896450
[4,] 2.0 0.9861639
[5,] 3.0 0.9836120
[6,] 4.0 0.9815234
[7,] 5.0 0.9797236
[8,] 6.0 0.9781251
[9,] 7.0 0.9766769
[10,] 8.0 0.9753461
[11,] 9.0 0.9741101
[12,] 10.0 0.9729529
[13,] 11.0 0.9718622
[14,] 12.0 0.9708287

##Survival Curve
plot(time, sur.over, xlab="Survival Time", ylab="Survival Probabilities", ylim=c(0.96,1.0), type='s',
cex.lab=0.8)
title("Figure 1: Overall Survival Curve for Weibull Regression Model", cex.main=0.8)

The overall survival probabilities are quite high, lying between 0.97 and 1. Over the span of one year
the survival probabilities decline gradually, and the decline is not steep.

c. Find the survival probabilities and curves for the variable ‘Wealth Index’.

#Survival probability: Wealth Index

cov.poor<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 1,0)
cov.mid<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 0,0)
cov.rich<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 0,1)

acf.wei.poor<-exp(sum(cov.poor*aft.weibull$coef[2:12]))
acf.wei.mid<-exp(sum(cov.mid*aft.weibull$coef[2:12]))
acf.wei.rich<-exp(sum(cov.rich*aft.weibull$coef[2:12]))

sur.poor<-exp(-(lambda*time/acf.wei.poor)**alpha)
sur.mid<-exp(-(lambda*time/acf.wei.mid)**alpha)
sur.rich<-exp(-(lambda*time/acf.wei.rich)**alpha)
cbind(time, sur.poor, sur.mid, sur.rich)

time sur.poor sur.mid sur.rich

[1,] 0.0 1.0000000 1.0000000 1.0000000
[2,] 0.5 0.9926100 0.9917467 0.9921776
[3,] 1.0 0.9901207 0.9889682 0.9895433
[4,] 2.0 0.9867983 0.9852612 0.9860282
[5,] 3.0 0.9843625 0.9825442 0.9834515

[6,] 4.0 0.9823687 0.9803209 0.9813427
[7,] 5.0 0.9806504 0.9784053 0.9795255
[8,] 6.0 0.9791242 0.9767042 0.9779116
[9,] 7.0 0.9777414 0.9751630 0.9764493
[10,] 8.0 0.9764706 0.9737470 0.9751057
[11,] 9.0 0.9752903 0.9724321 0.9738579
[12,] 10.0 0.9741851 0.9712009 0.9726895
[13,] 11.0 0.9731434 0.9700407 0.9715884
[14,] 12.0 0.9721563 0.9689413 0.9705449

#Survival Curves
plot(time, sur.mid, xlab="Survival Time", ylab="Survival Probabilities", col="black", ylim=c(0.95 ,1.0),
type='s', cex.lab=0.8)
lines(time, sur.poor, type='s', col="red")
lines(time, sur.rich, type='s', col="blue")
legend(0.75,.965, c("Middle", "Poor", "Rich"), lty=c(1,1,1), col=c("black", "red", "blue"), cex=0.8)
title("Figure 2: Survival Curves for Wealth Index: Weibull Regression Model", cex.main=0.8)

Comment: As observed from the survival curves, children born to poor mothers have the highest
probability of survival during infancy and those born to middle-class mothers have the lowest. The
differences are small, however, and neither wealth coefficient is statistically significant
(p = 0.4754 for Poor and p = 0.7420 for Rich).

Problem 6: Log-Normal AFT Model with Real Life Data
The SPSS file “infant_sur.dat” is the data set on the infant mortality of Bangladesh with some selected
variables. The variable ‘TIME’ measures the time to death in months and the variable ‘CHSURV’
indicates whether the observation is censored or not. Dataset link: Biostat Data
Using the data set,

a. Fit a Log-Normal AFT regression model. Identify the potential risk factors associated with
infant mortality and interpret the results.
b. Find the overall survival probabilities and curve.
c. Find the survival probabilities and curves for the variables ‘Place of Residence’, ‘Education’,
‘Wealth Index’.

Solution:

The relevant R codes are provided. Try yourself the outputs and interpretation!

# AFT Model: Log-Normal

#a.

# Read the data

library(survival) # needed for Surv() and survreg()
inf.data<-read.table("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_AFT/infant_sur.dat", header=T)
attach(inf.data)
# Children who did not survive to at least 1 month (TIME=0) are recoded to TIME=0.5,
# because the model works with log(TIME) and log(0) = -Inf
TIME<-ifelse(TIME==0, 0.5, TIME)
aft.lognorm<-survreg(Surv(TIME, CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="lognormal")
summary(aft.lognorm)
exp(aft.lognorm$coef)

#b.
#Overall Survival Probability
#Mean of the baseline Log-Normal distribution
mu<-aft.lognorm$coef[1]
cat("Mean:", "\n")
mu

#Scale parameter of baseline Log-Normal distribution
alpha<-aft.lognorm$scale
cat("Scale:", "\n")
alpha
#Overall survival probability (At mean values of covariates)
cov<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), mean(POOR), mean(RICH))
acf.lognorm<-sum(cov*aft.lognorm$coef[2:12])
time1<-TIME[CHSURV==1]
time2<-sort(time1)
time3<-unique(time2)
time<-c(0,time3)
sur.over.ln<-1-pnorm((log(time)-acf.lognorm-mu)/alpha)
cbind(time,sur.over.ln)
#Survival Curve
plot(time, sur.over.ln, xlab="Survival Time", ylab="Survival Probabilities", ylim=c(0.96,1.0), type='s',
cex.lab=0.8)
title("Figure 1: Overall Survival Curve for Log-Normal Regression Model", cex.main=0.8)

#c.
# Survival probability: Place of Residence
cov.u<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), 1, mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), mean(POOR), mean(RICH))
cov.r<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), 0, mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), mean(POOR), mean(RICH))
acf.ln.u<-sum(cov.u*aft.lognorm$coef[2:12])
acf.ln.r<-sum(cov.r*aft.lognorm$coef[2:12])
sur.ur<-1-pnorm((log(time)-acf.ln.u-mu)/alpha)
sur.ru<-1-pnorm((log(time)-acf.ln.r-mu)/alpha)
#Survival curves: Urban versus Rural
plot(time, sur.ur, xlab="Survival Time", ylab="Survival Probabilities", col="black", ylim=c(0.95 ,1.0),
type='s', cex.lab=0.8)
lines(time, sur.ru, type='s', col="red")
legend(6,.99, c("Urban", "Rural"), lty=c(1,1), col=c("black", "red"), cex=0.8)

title("Figure 2: Survival Curves for Place of Residence: Log-Normal Regression Model", cex.main=0.8)
#Survival probability: Education
cov.n<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
0,0,0, mean(POOR), mean(RICH))
cov.p<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
1,0,0, mean(POOR), mean(RICH))
cov.s<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
0,1,0, mean(POOR), mean(RICH))
cov.h<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
0,0,1, mean(POOR), mean(RICH))
acf.ln.n<-sum(cov.n*aft.lognorm$coef[2:12])
acf.ln.p<-sum(cov.p*aft.lognorm$coef[2:12])
acf.ln.s<-sum(cov.s*aft.lognorm$coef[2:12])
acf.ln.h<-sum(cov.h*aft.lognorm$coef[2:12])
sur.ne<-1-pnorm((log(time)-acf.ln.n-mu)/alpha)
sur.pe<-1-pnorm((log(time)-acf.ln.p-mu)/alpha)
sur.se<-1-pnorm((log(time)-acf.ln.s-mu)/alpha)
sur.he<-1-pnorm((log(time)-acf.ln.h-mu)/alpha)
#Survival Curves
plot(time, sur.ne, xlab="Survival Time", ylab="Survival Probabilities", col="black", ylim=c(0.95 ,1.0),
type='s', cex.lab=0.8)
lines(time, sur.pe, type='s', col="red")
lines(time, sur.se, type='s', col="blue")
lines(time, sur.he, type='s', col="green3")
legend(0.75,.965, c("No Education", "Primary", "Secondary", "Higher"), lty=c(1,1,1,1), col=c("black",
"red", "blue", "green3"), cex=0.8)
title("Figure 3: Survival Curves for Education: Log-Normal Regression Model", cex.main=0.8)
#Survival probability: Wealth Index
cov.poor<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 1,0)
cov.mid<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 0,0)
cov.rich<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE), mean(NGO),
mean(PRIMARY), mean(SECONDAR), mean(HIGHER), 0,1)

acf.ln.poor<-sum(cov.poor*aft.lognorm$coef[2:12])
acf.ln.mid<-sum(cov.mid*aft.lognorm$coef[2:12])
acf.ln.rich<-sum(cov.rich*aft.lognorm$coef[2:12])
sur.poor<-1-pnorm((log(time)-acf.ln.poor-mu)/alpha)
sur.mid<-1-pnorm((log(time)-acf.ln.mid-mu)/alpha)
sur.rich<-1-pnorm((log(time)-acf.ln.rich-mu)/alpha)
# Survival Curves
plot(time, sur.mid, xlab="Survival Time", ylab="Survival Probabilities", col="black", ylim=c(0.95 ,1.0),
type='s', cex.lab=0.8)
lines(time, sur.poor, type='s', col="red")
lines(time, sur.rich, type='s', col="blue")
legend(0.75,.965, c("Middle", "Poor", "Rich"), lty=c(1,1,1), col=c("black", "red", "blue"), cex=0.8)
title("Figure 4: Survival Curves for Wealth Index: Log-Normal Regression Model", cex.main=0.8)
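
Every covariate profile above reuses the same log-normal survival formula, so it can be wrapped in a small helper. The sketch below is an illustration only: the function name lognormal_aft_surv and all numeric inputs are hypothetical and are not taken from the fitted model.

```r
# Survival function of a log-normal AFT model:
# S(t | x) = 1 - pnorm((log(t) - mu - x'beta) / sigma)
lognormal_aft_surv <- function(time, mu, sigma, lp = 0) {
  1 - pnorm((log(time) - mu - lp) / sigma)
}

# Hypothetical parameter values, for illustration only
t_grid <- c(0, 0.5, 1, 6, 12)
s_demo <- lognormal_aft_surv(t_grid, mu = 8, sigma = 3, lp = 0.5)
cbind(t_grid, s_demo)
```

With a fitted model, mu would be the intercept, sigma the scale, and lp the sum of coefficients times covariate values, as in the code above.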

Problem 7: Log-Logistic AFT Model with Real Life Data

a. Fit a Log-Logistic AFT regression model. Identify the potential risk factors associated with
infant mortality and interpret the results.
b. Estimate the odds ratio of infant mortality prior to a specified time point for NGO members and
interpret the results.
c. Find the survival probabilities and curves for the variable ‘Wealth Index’.

Solution:

# Load the library file

library(survival)
# Read the data
inf.data<-read.table("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_AFT/infant_sur.dat", header=T)
attach(inf.data)
# Children who did not survive to at least 1 month (TIME=0) are recoded to TIME=0.5,
# because log(0) = -Inf and the AFT model is fitted on log(TIME)
TIME<-ifelse(TIME==0, 0.5, TIME)
# AFT Model: Log-Logistic
aft.loglogist<-survreg(Surv(TIME,
CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="loglogistic")
summary(aft.loglogist)

Call:
survreg(formula = Surv(TIME, CHSURV) ~ AGE + AGESQ + RELIGION +
MEDIA + PLACE + NGO + PRIMARY + SECONDAR + HIGHER + POOR +
RICH, dist = "loglogistic")
Value Std. Error z p
(Intercept) 4.93053 1.88492 2.62 0.00890
AGE 0.34469 0.11194 3.08 0.00208
AGESQ -0.00577 0.00165 -3.50 0.00047
RELIGION -0.24180 0.48371 -0.50 0.61716
MEDIA -0.11245 0.31840 -0.35 0.72397
PLACE 0.40632 0.32005 1.27 0.20424
NGO 0.51006 0.28947 1.76 0.07805
PRIMARY 0.62718 0.32834 1.91 0.05611
SECONDAR 1.46941 0.41997 3.50 0.00047
HIGHER 3.98728 1.12373 3.55 0.00039
POOR 0.26597 0.37275 0.71 0.47551
RICH 0.12751 0.39157 0.33 0.74470
Log(scale) 0.85313 0.05556 15.36 < 2e-16

Scale= 2.35

Log logistic distribution

Loglik(model)= -1937.2 Loglik(intercept only)= -1971.6
Chisq= 68.8 on 11 degrees of freedom, p= 2.1e-10
Number of Newton-Raphson Iterations: 8
n= 9845

exp(aft.loglogist$coef)

(Intercept) AGE AGESQ RELIGION MEDIA PLACE

138.4533473 1.4115567 0.9942504 0.7852166 0.8936461 1.5012870
NGO PRIMARY SECONDAR HIGHER POOR RICH
1.6653974 1.8723168 4.3466686 53.9082094 1.3047007 1.1359944

Interpretation and output try yourself!

For a specific covariate $x_j$, keeping all other covariates at a fixed level, the OR becomes:
$OR = \left[\exp\{-\beta_j (x_{1j} - x_{2j})\}\right]^{\gamma}$

For the log-logistic model, the parameters are:

location, $\alpha = e^{\mu} = e^{\text{coefficient}}$ (the first coefficient of the fitted model)

scale, $\gamma = \frac{1}{\sigma} = \frac{1}{\text{scale}}$

alpha<-exp(aft.loglogist$coefficient[1])
gamma<-1/aft.loglogist$scale
exp(-aft.loglogist$coef["NGO"])^gamma

NGO
0.8046661

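Interpretation: The odds of infant mortality prior to a specified time point are 19.5% lower for mothers who are NGO members than for non-member mothers, keeping all other covariates at a fixed level. Note that the NGO coefficient has p = 0.078, so this effect is not significant at the 5% level.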
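
As a hand check, the same odds ratio can be recomputed from the printed summary alone. The sketch below uses the rounded values shown in the output (coefficient 0.51006, Scale 2.35), so the result should be close to, but not exactly, the value returned by the fitted object; the variable names are illustrative.

```r
# Values copied from the printed summary of aft.loglogist (rounded)
beta_ngo <- 0.51006   # AFT coefficient for NGO
sigma    <- 2.35      # "Scale" as printed
gamma    <- 1 / sigma

or_ngo     <- exp(-beta_ngo * gamma)   # same as exp(-beta_ngo)^gamma
pct_change <- 100 * (or_ngo - 1)       # negative value = lower odds
round(c(OR = or_ngo, percent_change = pct_change), 3)
```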

Try yourself!

Model Based on Hazard Function

One can assess the influence of a set of covariates on survival time by constructing a regression model for the hazard function. Two types of such survival regression models are available in the literature:

i. Multiplicative hazard model

ii. Additive hazard model

Multiplicative hazard models are the most popular because of their computational simplicity and ease of interpretation. The practical work here is based on the multiplicative hazard model, which is also known as the proportional hazards (PH) model.

In the presence of covariates x, the multiplicative hazard (PH) model is given by:

$h(t) = h_0(t) \exp(x'\beta)$

where $h(t)$ and $h_0(t)$ are the hazard functions in the presence and absence of covariates, respectively.

• When the baseline hazard function is defined parametrically, the PH model is known as
parametric PH model.
• When the baseline hazard function is left as an arbitrary function, the PH model is known as
semi parametric PH model.
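
The defining property of a PH model, a hazard ratio that does not change with time, can be checked numerically. The sketch below assumes a Weibull baseline written as H0(t) = (t/scale)^shape; the function names and every parameter value are hypothetical and chosen only for illustration.

```r
# Weibull baseline, assuming cumulative hazard H0(t) = (t / scale)^shape
weibull_ph_surv <- function(time, scale, shape, lp = 0) {
  exp(-(time / scale)^shape * exp(lp))
}
weibull_ph_haz <- function(time, scale, shape, lp = 0) {
  (shape / scale) * (time / scale)^(shape - 1) * exp(lp)
}

# Hypothetical parameter values, for illustration only
t_grid <- c(1, 3, 6, 12)
lp_a <- 0      # reference covariate profile
lp_b <- -0.6   # profile with a lower linear predictor x'beta

# Hazard ratio: the same at every time point and equal to exp(lp_b - lp_a)
hr <- weibull_ph_haz(t_grid, 150, 0.5, lp_b) / weibull_ph_haz(t_grid, 150, 0.5, lp_a)
hr

# Equivalent statement on the survival scale: S(t | x) = S0(t)^exp(x'beta)
s_a <- weibull_ph_surv(t_grid, 150, 0.5, lp_a)
s_b <- weibull_ph_surv(t_grid, 150, 0.5, lp_b)
all.equal(s_b, s_a^exp(lp_b - lp_a))
```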

Problem 8: Weibull Parametric PH Model

Use the data set “infant_sur.dat”. Dataset link: Biostat Data

a. Find the potential risk factors of infant mortality by fitting Weibull parametric PH model.
b. Find the overall survival probabilities and curve.
c. Find the survival probabilities and curves for variables ‘Type of place of residence’,
‘Education’, and ‘Wealth Index’.

Solution:

# Read the data

inf.data<-read.table("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_AFT/infant_sur.dat", header=T)
attach(inf.data)
TIME<-ifelse(TIME==0, 0.5, TIME)
# Load the library file
library(survival)
library(eha)
#PH Model: Weibull
ph.weibull<-phreg(Surv(TIME,
CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="weibull")
summary(ph.weibull)

Covariate Mean Coef Rel.Risk S.E. LR p

AGE 32.077 -0.142 0.868 0.046 0.0026
AGESQ 1107.987 0.002 1.002 0.001 0.0006
RELIGION 0.904 0.098 1.103 0.202 0.6231
MEDIA 0.635 0.049 1.050 0.133 0.7120
PLACE 0.378 -0.172 0.842 0.133 0.1956
NGO 0.392 -0.216 0.806 0.121 0.0708
PRIMARY 0.305 -0.260 0.771 0.136 0.0546
SECONDAR 0.281 -0.613 0.542 0.173 0.0003
HIGHER 0.071 -1.679 0.187 0.467 0.0000

POOR 0.347 -0.111 0.895 0.155 0.4772
RICH 0.463 -0.054 0.948 0.163 0.7424

Events 312
Total time at risk 108168
Max. log. likelihood -1938.3
LR test statistic 68.47
Degrees of freedom 11
Overall p-value 2.37972e-10

Since the p-value of the overall Weibull PH model is 2.37972e-10 < 0.05, we reject the null hypothesis at the 5% level of significance. That is, at least one covariate has an effect on the survival time.
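
The overall p-value is the upper tail of a chi-square distribution evaluated at the likelihood ratio statistic. The sketch below recomputes it from the rounded figures printed above, and also shows how the statistic is obtained from two log-likelihoods, using the log-logistic AFT output of Problem 7 as the example. Because the inputs are rounded, the results should agree with the printed output only approximately.

```r
# Weibull PH model: LR statistic and degrees of freedom as printed
lr_stat <- 68.47
df      <- 11
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
p_value

# LR statistic from log-likelihoods (log-logistic AFT output, Problem 7)
loglik_model <- -1937.2
loglik_null  <- -1971.6
lr_aft <- 2 * (loglik_model - loglik_null)
lr_aft
pchisq(lr_aft, df = 11, lower.tail = FALSE)
```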

Interpretation:

• Coefficient for covariate Age is -0.142 and for $Age^2$ is 0.002.

Since the covariate age has a quadratic effect, it has a slightly different interpretation.
$\ln h(T) = \hat{\mu} - 0.142\,Age + 0.002\,Age^2$
The age at which the hazard is at its minimum solves $\frac{\partial \ln h(T)}{\partial Age} = 0$:
$-0.142 + 2 \times 0.002\,Age = 0$
or, $Age \approx 35.5$ years
Since $Age^2$ has a positive sign, the curve is U-shaped:
[Figure: U-shaped curve of ln h(T) against maternal age, with its minimum at 35.5 years]

The log hazard of infant mortality is highest at both early and late maternal ages, while it is lowest at
middle maternal ages. Specifically, the log hazard reaches its minimum at approximately 35.5 years of
age, keeping all other covariates fixed.

• Hazard ratio for covariate Secondary Education is 0.542

Therefore, for mothers who have secondary level education compared to mothers having no education,
hazard rate of infant mortality decreases by (1- 0.542) * 100% = 45.8%, keeping all other covariates at
a fixed level.

• Hazard ratio for covariate Higher Education is 0.187

Therefore, for mothers who have higher level education compared to mothers having no education,
hazard rate of infant mortality decreases by (1- 0.187) * 100% = 81.3%, keeping all other covariates at
a fixed level.

b. Find overall survival probabilities and curve.

# Overall survival probability (At mean values of covariates)

# Scale parameter of the baseline Weibull distribution
lambda<-exp(ph.weibull$coef[12])
cat("Scale:", "\n")
lambda

# Shape parameter of baseline Weibull distribution
alpha<-exp(ph.weibull$coef[13])
cat("Shape parameter:", "\n")
alpha

cov<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), mean(PLACE),mean(NGO),
mean(PRIMARY), mean(SECONDAR),mean(HIGHER), mean(POOR), mean(RICH))
mf.wei<-exp(sum(cov*ph.weibull$coef[1:11]))
time1<-TIME[CHSURV==1]
time2<-sort(time1)
time3<-unique(time2)
time<-c(0,time3)
# phreg() parameterizes the Weibull baseline as H0(t) = (t/scale)^shape,
# so time is divided by lambda, not multiplied by it
sur.over<-(exp(-(time/lambda)**alpha))**mf.wei
# With 312 deaths among 9845 infants, these probabilities stay close to 1
cbind(time,sur.over)

##Survival Curve
plot(time, sur.over, xlab="Survival Time", ylab="Survival Probabilities", ylim=c(0.95,1.0), type='s', cex.lab=0.8)
title("Figure 1: Overall Survival Curve for Weibull PH Regression Model", cex.main=0.8)

The overall survival probability declines throughout the period of infancy, with the sharpest drop in the earliest months.

c. Find Survival probabilities and curves for place of residence.

#Survival probability: Place of residence

cov.u<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), 1,
mean(NGO),mean(PRIMARY), mean(SECONDAR),mean(HIGHER), mean(POOR), mean(RICH))
cov.r<-c(mean(AGE),mean(AGESQ), mean(RELIGION), mean(MEDIA), 0,
mean(NGO),mean(PRIMARY), mean(SECONDAR),mean(HIGHER), mean(POOR), mean(RICH))
mf.wei.u<-exp(sum(cov.u*ph.weibull$coef[1:11]))
mf.wei.r<-exp(sum(cov.r*ph.weibull$coef[1:11]))
# phreg() uses H0(t) = (t/scale)^shape, so time is divided by lambda
sur.ur<-(exp(-(time/lambda)**alpha))**mf.wei.u
sur.ru<-(exp(-(time/lambda)**alpha))**mf.wei.r

##Survival curves: Urban versus rural

plot(time, sur.ur, xlab="Survival Time", ylab="Survival Probabilities", col="black", ylim=c(0.95,1.0),
type='s', cex.lab=0.8)
lines(time, sur.ru, type='s', col="red")
legend(6,.99, c("Urban", "Rural"), lty=c(1,1), col=c("black", "red"), cex=0.8)
title("Figure 2: Survival Curves for Place of Residence: Weibull PH Regression Model",cex.main=0.8)

Comment: Infants born to mothers who reside in urban areas have higher probability of survival
compared to infants born to mothers who reside in rural areas as observed from the survival curves.

Similarly try for Education and Wealth Index!

Problem 9: Inter-relationship between Weibull PH and Weibull AFT Model

Check the relationship $\hat{\beta}_{PH} = -\alpha \times \hat{\beta}_{AFT}$ between the Weibull PH and Weibull AFT models by comparing the results of Problem 5 and Problem 8.

Solution:

aft.weibull<-survreg(Surv(TIME,
CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="weibull")
aft.rcoef<-aft.weibull$coef[2]       # AGE (element 1 is the intercept)
aft.scale<-exp(aft.weibull$coef[1])
aft.shape<-1/aft.weibull$scale
aft.res<-cbind(aft.rcoef,aft.scale,aft.shape)

ph.weibull<-phreg(Surv(TIME,
CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+PRIMARY+SECONDAR+HIGHER+POOR+RICH,
dist="weibull")
ph.rcoef<-ph.weibull$coef[1]         # AGE (phreg has no intercept term)
# The baseline parameters are elements 12 and 13 (log scale and log shape), as in Problem 8;
# elements 2 and 3 are the AGESQ and RELIGION coefficients
ph.scale<-exp(ph.weibull$coef[12])
ph.shape<-exp(ph.weibull$coef[13])
ph.res<-cbind(ph.rcoef)

cbind(aft.res,ph.res)

    aft.rcoef aft.scale aft.shape   ph.rcoef
AGE 0.3377305  177.9746 0.4206546 -0.1420679

#Here
alpha<-0.4206546
beta_aft<-0.3377305
beta_ph<--alpha*beta_aft
beta_ph

[1] -0.1420679

Here, our calculated beta_ph equals $\hat{\beta}_{PH} = -0.142$ from the Weibull PH fit; therefore, the inter-relationship is justified.
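
The same conversion can be applied to all covariates at once. The sketch below is a stand-in example: it uses the lung data set shipped with the survival package rather than infant_sur.dat, and it compares the converted Weibull coefficients with a Cox fit, which estimates the same hazard-scale effects without assuming a Weibull baseline. The two sets of numbers are therefore expected to be close, not identical.

```r
library(survival)

# Built-in lung data, used here only as a stand-in for infant_sur.dat
aft_fit <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

shape   <- 1 / aft_fit$scale
beta_ph <- -shape * coef(aft_fit)[-1]   # drop the intercept, then convert

# Cox PH fit of the same covariates, for comparison
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
cbind(beta_ph, cox = coef(cox_fit))
```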

Problem 10: Semi-Parametric PH Model (Cox-PH Model)

Use the data set “infant_sur.dat”. Dataset link: Biostat Data

a. Find the potential risk factors of infant mortality by fitting semi parametric PH model.
b. Find the baseline hazard and plot them.
c. Find overall survival probabilities and curve.
d. Find the survival probabilities and curves for variables ‘Type of place of residence’,
‘Education’, and ‘Wealth Index’.

Solution:

# Read the data

inf.data<-read.table("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_AFT/infant_sur.dat", header=T)
attach(inf.data)
TIME<-ifelse(TIME==0, 0.5, TIME)
#Start of Cox PH Model
coxph.model<-coxph(Surv(TIME,CHSURV)~AGE+AGESQ+RELIGION+MEDIA+PLACE+NGO+
PRIMARY+SECONDAR+HIGHER+RICH+POOR)
summary(coxph.model)

Call:
coxph(formula = Surv(TIME, CHSURV) ~ AGE + AGESQ + RELIGION +
MEDIA + PLACE + NGO + PRIMARY + SECONDAR + HIGHER + RICH +
POOR)

n= 9845, number of events= 312

coef exp(coef) se(coef) z Pr(>|z|)

AGE -0.132870 0.875579 0.046061 -2.885 0.003918 **
AGESQ 0.002267 1.002270 0.000675 3.359 0.000782 ***
RELIGION 0.103947 1.109542 0.202245 0.514 0.607274
MEDIA 0.052445 1.053845 0.132946 0.394 0.693223
PLACE -0.170816 0.842977 0.133452 -1.280 0.200553
NGO -0.209118 0.811300 0.120553 -1.735 0.082801 .
PRIMARY -0.259219 0.771654 0.136393 -1.901 0.057364 .
SECONDAR -0.612225 0.542143 0.172873 -3.541 0.000398 ***
HIGHER -1.684126 0.185607 0.467115 -3.605 0.000312 ***
RICH -0.053956 0.947473 0.163549 -0.330 0.741468
POOR -0.112970 0.893177 0.155277 -0.728 0.466894
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

exp(coef) exp(-coef) lower .95 upper .95

AGE 0.8756 1.1421 0.8000 0.9583
AGESQ 1.0023 0.9977 1.0009 1.0036
RELIGION 1.1095 0.9013 0.7464 1.6493
MEDIA 1.0538 0.9489 0.8121 1.3675
PLACE 0.8430 1.1863 0.6490 1.0950
NGO 0.8113 1.2326 0.6406 1.0275
PRIMARY 0.7717 1.2959 0.5906 1.0081
SECONDAR 0.5421 1.8445 0.3863 0.7608
HIGHER 0.1856 5.3877 0.0743 0.4637
RICH 0.9475 1.0554 0.6876 1.3055
POOR 0.8932 1.1196 0.6588 1.2109

Concordance= 0.637 (se = 0.015 )

Likelihood ratio test= 69.63 on 11 df, p=1e-10
Wald test = 64.11 on 11 df, p=2e-09
Score (logrank) test = 69.46 on 11 df, p=2e-10

• Coefficient for covariate Age is -0.1329 and for $Age^2$ is 0.0023.

Since the covariate age has a quadratic effect, it has a slightly different interpretation.
$\ln h(T) = \hat{\mu} - 0.1329\,Age + 0.0023\,Age^2$
The age at which the hazard is at its minimum solves $\frac{\partial \ln h(T)}{\partial Age} = 0$:
$-0.1329 + 2 \times 0.0023\,Age = 0$
or, $Age \approx 29$ years
Since $Age^2$ has a positive sign, the curve is U-shaped:

[Figure: U-shaped curve of ln h(T) against maternal age, with its minimum at 29 years]

The log hazard of infant mortality is highest at both early and late maternal ages, while it is lowest at middle maternal ages. Specifically, the log hazard reaches its minimum at approximately 29 years of age, keeping all other covariates fixed. That is, infants born to mothers aged around 29 years have the lowest estimated hazard of death.
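
The turning point of a quadratic effect is -b1/(2*b2), and it is sensitive to how the coefficients are rounded. The sketch below applies that formula to the coefficients quoted in this document; the helper name turning_point is illustrative.

```r
# Age at which b_age * Age + b_agesq * Age^2 reaches its minimum (b_agesq > 0)
turning_point <- function(b_age, b_agesq) -b_age / (2 * b_agesq)

tp_weibull <- turning_point(-0.142, 0.002)         # rounded Weibull PH coefficients
tp_cox_rnd <- turning_point(-0.1329, 0.0023)       # rounded Cox coefficients
tp_cox     <- turning_point(-0.132870, 0.002267)   # Cox coefficients as printed
round(c(tp_weibull, tp_cox_rnd, tp_cox), 1)
```

Carrying more decimal places for the squared term changes the answer noticeably, so it is safer to compute the turning point from the stored coefficients than from the rounded table.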

b. Find the baseline hazard function and plot it.

# Estimated baseline hazard
# basehaz() returns the cumulative hazard, by default evaluated at the covariate means

base.haz<-basehaz(coxph.model)
base.haz
plot(base.haz$time, base.haz$hazard, xlab="Time", ylab="Cumulative Hazard", type="h")
title("Cumulative Baseline Hazard Function for COX PH Model")
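
The hazard column returned by basehaz() in the survival package is a cumulative hazard, so the hazard contribution at each time is the jump between successive rows. The sketch below shows one way to recover those jumps; it uses the built-in lung data as a stand-in, and centered = FALSE is chosen so that the baseline refers to covariate values of zero.

```r
library(survival)

# Stand-in model on the built-in lung data
cox_fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Cumulative baseline hazard at covariates equal to zero
cum_haz <- basehaz(cox_fit, centered = FALSE)

# Jumps of the cumulative hazard between successive times
haz_jump <- diff(c(0, cum_haz$hazard))

plot(cum_haz$time, haz_jump, type = "h", xlab = "Time", ylab = "Hazard increment")
```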

c. Find overall survival function and curve.

# Overall survival function (At mean values of covariates)

cox.sur<-survfit(coxph.model)
summary(cox.sur)
plot(cox.sur, xlab="Time", ylab="Survival Probability", ylim=c(0.92,1))
title("Overall Survival Curve for COX PH Model")

d. Find the survival probabilities and curves for variables ‘Type of place of residence’,
‘Education’, and ‘Wealth Index’.

## Survival curves for the type of place of residence

## Two types 0: Rural and 1: Urban

data1<-data.frame(AGE=rep(mean(AGE),2),AGESQ=rep(mean(AGESQ),2),
RELIGION=rep(mean(RELIGION),2), MEDIA=rep(mean(MEDIA),2), PLACE=c(0,1),
NGO=rep(mean(NGO),2), PRIMARY=rep(mean(PRIMARY),2),
SECONDAR=rep(mean(SECONDAR),2), HIGHER=rep(mean(HIGHER),2), POOR=rep(mean(POOR),2),
RICH=rep(mean(RICH),2))

sur.place<-survfit(coxph.model, newdata=data1)
summary(sur.place)
plot(sur.place, xlab="Time", ylab="Survival Probability", ylim=c(0.96,1), lty=1:2, col=c("black", "red"))
legend(6, 0.995, c("Rural", "Urban"), lty=1:2, col=c("black", "red"))
title("Survival curves for place of residence")

## Survival curves for Education

## No education, primary, secondary, and higher

data2<-data.frame(AGE=rep(mean(AGE),4), AGESQ=rep(mean(AGESQ),4),
RELIGION=rep(mean(RELIGION),4), MEDIA=rep(mean(MEDIA),4), PLACE=rep(mean(PLACE),4),
NGO=rep(mean(NGO),4), PRIMARY=c(0,1,0,0), SECONDAR=c(0,0,1,0), HIGHER=c(0,0,0,1),
POOR=rep(mean(POOR),4),RICH=rep(mean(RICH),4))

sur.educa<-survfit(coxph.model, newdata=data2)
summary(sur.educa)
plot(sur.educa, xlab="Time", ylab="Survival Probability", ylim=c(0.92,1), lty=1:4, col=c("black", "red",
"blue", "green3"))
legend(6, 0.955, c("No Edu", "Primary", "Secondary", "Higher"), lty=1:4, col=c("black", "red", "blue",
"green3"))
title("Survival curves for education")

##Survival curves for wealth index

## Poor, Middle, and Rich
data3<-data.frame(AGE=rep(mean(AGE),3),AGESQ=rep(mean(AGESQ),3),
RELIGION=rep(mean(RELIGION),3), MEDIA=rep(mean(MEDIA),3), PLACE=rep(mean(PLACE),3),
NGO=rep(mean(NGO),3), PRIMARY=rep(mean(PRIMARY),3),SECONDAR=rep(mean(SECONDAR),3),
HIGHER=rep(mean(HIGHER),3), POOR=c(1,0,0), RICH=c(0,0,1))

sur.wealth<-survfit(coxph.model, newdata=data3)
summary(sur.wealth)

plot(sur.wealth, xlab="Time", ylab="Survival Probability", ylim=c(0.95,1), lty=1:3, col=c("black",
"red", "blue"))
legend(6, 0.97, c("Poor", "Middle", "Rich"), lty=1:3, col=c("black", "red", "blue"))
title("Survival curves for wealth index")
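
The three newdata frames above follow one pattern: every covariate at its mean except the variable of interest. A small helper can build such frames; the sketch below is only a suggestion, the name make_profiles is hypothetical, and the toy data are invented to show the shape of the result.

```r
# Build covariate profiles: every column at its mean except those listed in `vary`
make_profiles <- function(data, covariates, vary) {
  n <- length(vary[[1]])
  means <- colMeans(data[, covariates, drop = FALSE])
  profiles <- as.data.frame(matrix(rep(means, each = n), nrow = n,
                                   dimnames = list(NULL, covariates)))
  for (v in names(vary)) profiles[[v]] <- vary[[v]]
  profiles
}

# Toy data with invented values, only to show the output format
toy <- data.frame(AGE = c(20, 30, 40), POOR = c(1, 0, 0), RICH = c(0, 0, 1))
wealth_profiles <- make_profiles(toy, c("AGE", "POOR", "RICH"),
                                 vary = list(POOR = c(1, 0, 0), RICH = c(0, 0, 1)))
wealth_profiles
```

With the real data, a frame built this way could be passed as newdata to survfit() in place of data1, data2, or data3.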

Multiple Modes of Failure
Check Master’s Theory and Practical Lectures of Advanced Biostatistics.

Generalized Linear Models

Check 4th Year Theory and Practical Lectures of Generalized Linear Models.

Generalized Linear Mixed Models

Problem 11: Generalized Linear Mixed Model
You are given a data set on postnatal care of children (“postnatal.sav”) in Bangladesh where the variable
“pnc” is defined as a binary variable (1: received postnatal care within six weeks of delivery; 0:
otherwise). Description of control variables is provided in the data set. Dataset link: Biostat Data

a. Fit a GLMM taking into account the random effect of clusters and interpret the results.

b. Compute 95% confidence intervals for the odds ratios. Hence identify the potential
factors of receiving postnatal care for children.

c. Find the intra-cluster correlation and interpret the result.

Solution:

data_pnc<-read.csv("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_GLMM/postnatal.csv",header=T)

names(data_pnc)

attach(data_pnc)

library(glmmTMB)

# (1|cluster): random intercept for each cluster; zi=~0: no zero-inflation component
model_rem1<-glmmTMB(pnc~factor(edu)+factor(media)+factor(w_index)+
factor(anc)+factor(ngo)+factor(resi)+factor(p_deli)+
factor(sex)+(1|cluster),zi=~0,family=binomial,data=data_pnc)

summary(model_rem1)

## Family: binomial ( logit )
## Formula:
## pnc ~ factor(edu) + factor(media) + factor(w_index) + factor(anc) +
## factor(ngo) + factor(resi) + factor(p_deli) + factor(sex) +
## (1 | cluster)
## Data: data_pnc
##
## AIC BIC logLik -2*log(L) df.resid
## 3790.1 3873.4 -1882.1 3764.1 4454
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## cluster (Intercept) 2.706 1.645
## Number of obs: 4467, groups: cluster, 593
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.30734 0.24750 -1.242 0.21432
## factor(edu)1 0.18566 0.14946 1.242 0.21415
## factor(edu)2 0.24569 0.15340 1.602 0.10924
## factor(edu)3 0.71341 0.25275 2.823 0.00476 **
## factor(media)1 0.37824 0.11955 3.164 0.00156 **
## factor(w_index)1 -0.14011 0.13796 -1.016 0.30982
## factor(w_index)2 0.11920 0.14974 0.796 0.42599
## factor(anc)1 0.58682 0.12282 4.778 1.77e-06 ***
## factor(ngo)1 -0.03300 0.11920 -0.277 0.78192
## factor(resi)1 -0.47236 0.19901 -2.374 0.01762 *
## factor(p_deli)1 3.48441 0.15670 22.236 < 2e-16 ***
## factor(sex)2 0.09783 0.09539 1.026 0.30509
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coef<-coef(summary(model_rem1))$cond[,1]
se<-coef(summary(model_rem1))$cond[,2]

or<-exp(coef[2:12]) #Odds ratio

se.or<-or*se[2:12] #Standard error of odds ratio

cbind(or,se.or)

## or se.or
## factor(edu)1 1.2040181 0.1799544
## factor(edu)2 1.2785031 0.1961210
## factor(edu)3 2.0409342 0.5158523
## factor(media)1 1.4597085 0.1745142
## factor(w_index)1 0.8692644 0.1199197
## factor(w_index)2 1.1265970 0.1686947
## factor(anc)1 1.7982533 0.2208639
## factor(ngo)1 0.9675413 0.1153317
## factor(resi)1 0.6235307 0.1240889

## factor(p_deli)1 32.6031107 5.1089391
## factor(sex)2 1.1027794 0.1051972

Interpretation of Education:

Mothers with primary education have about 20% higher odds (OR = 1.20) of receiving postnatal care within six weeks of delivery compared to the no-education group, but the difference is not statistically significant (p = 0.214). Mothers with higher education have more than double the odds (OR = 2.04, i.e. +104%) of receiving postnatal care within six weeks of delivery compared to the no-education group, and the p-value 0.005 < 0.05 indicates that this is statistically significant at the 5% level of significance.

In a similar manner, attempt to interpret the remaining variables yourself!

lower.ci<-or-1.96*se.or
upper.ci<-or+1.96*se.or
cbind(lower.ci,upper.ci)

## lower.ci upper.ci
## factor(edu)1 0.8513075 1.5567287
## factor(edu)2 0.8941059 1.6629002
## factor(edu)3 1.0298636 3.0520048
## factor(media)1 1.1176607 1.8017563
## factor(w_index)1 0.6342218 1.1043070
## factor(w_index)2 0.7959555 1.4572385
## factor(anc)1 1.3653601 2.2311465
## factor(ngo)1 0.7414913 1.1935914
## factor(resi)1 0.3803165 0.8667449
## factor(p_deli)1 22.5895900 42.6166313
## factor(sex)2 0.8965928 1.3089659

The relevant hypothesis:

$H_0: OR = 1$
If the confidence interval of an odds ratio includes the value 1, then it is not statistically significant.
Therefore, from the above table, we can say that Education (especially higher education), media
exposure, antenatal care, residence (urban/rural), and place of delivery are important factors for
postnatal care.
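
The intervals above are built on the odds-ratio scale with a delta-method standard error. A common alternative, shown here only as a sketch, builds the Wald interval on the log-odds scale and exponentiates the limits, which keeps the interval positive and makes it asymmetric around the OR. The estimate and standard error below are copied from the factor(edu)3 row of the model summary; the variable names are illustrative.

```r
# Estimate and standard error for factor(edu)3, copied from the model summary
est    <- 0.71341
se_est <- 0.25275

z      <- qnorm(0.975)
ci_log <- est + c(-1, 1) * z * se_est   # interval for the log odds ratio
or_ci  <- exp(ci_log)                   # back-transformed to the OR scale
round(c(OR = exp(est), lower = or_ci[1], upper = or_ci[2]), 3)
```

Both approaches exclude 1 for this coefficient, so the conclusion about higher education does not change.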

Intra cluster correlation is defined as:

$ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$
where $\sigma_u^2$ is the variance of the cluster random effect, which is 2.706 in our output, and $\sigma_e^2$ is the residual variance of the model, which is $\frac{\pi^2}{3}$ for a binary response.

sig_clu<-2.706
sig_resi<-pi^2/3
icc<-sig_clu/(sig_clu+sig_resi)
icc

## [1] 0.4513108

About 45% of the total variation in whether children received postnatal care can be explained by
differences between clusters.

This indicates that cluster-level factors (e.g., local healthcare services, community characteristics) play
a strong role in postnatal care coverage.

Problem 12: Generalized Linear Mixed Model

You are given a data set on birth weight of newborns (“birth_weight.sav”) in a country. A description of
the variables is provided in the data set. Identify potential determinants of birth weight by fitting a
GLMM taking into account the random effect of clusters and interpret the results. Also, find the intra-
cluster correlation and interpret the result. Dataset link: Biostat Data

Solution:

data<-read.csv("F:/Mostakim/5th Masters/Stat MS-508; Data Analysis 3 Biostat and Econometrics/Practical_GLMM/birth_weight.csv",header=T)
names(data)

attach(data)

## The following objects are masked from data_pnc:

##
## anc, cluster, sex, w_index

library(glmmTMB)
model_rem2<-glmmTMB(birth_weight~factor(area)+factor(location)+factor(w_edu)+
factor(w_media)+factor(w_index)+factor(violence)+factor(b_order)+factor(anc)+
Age_year+Age_year_sqr+(1|cluster), zi=~0, family=gaussian, data=data)

summary(model_rem2)

## Family: gaussian ( identity )
## Formula:
## birth_weight ~ factor(area) + factor(location) + factor(w_edu) +
## factor(w_media) + factor(w_index) + factor(violence) + factor(b_order) +
## factor(anc) + Age_year + Age_year_sqr + (1 | cluster)
## Data: data
##
## AIC BIC logLik -2*log(L) df.resid
## 7601.6 7715.2 -3782.8 7565.6 4054
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## cluster (Intercept) 0.007425 0.08617
## Residual 0.368021 0.60665
## Number of obs: 4072, groups: cluster, 2185
##
## Dispersion estimate for gaussian family (sigma^2): 0.368
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.2755416 0.2202667 10.331 < 2e-16 ***
## factor(area)2 0.0717613 0.0237573 3.021 0.002523 **
## factor(location)1 0.1109202 0.0283502 3.912 9.13e-05 ***
## factor(location)2 0.1049401 0.0271760 3.861 0.000113 ***
## factor(location)3 -0.0483205 0.0276711 -1.746 0.080769 .
## factor(w_edu)1 0.0426592 0.0626226 0.681 0.495737
## factor(w_edu)2 0.0680711 0.0592783 1.148 0.250832
## factor(w_edu)3 0.0897342 0.0616742 1.455 0.145677
## factor(w_media)1 -0.0187051 0.0248048 -0.754 0.450793
## factor(w_index)1 0.0360164 0.0302632 1.190 0.234006
## factor(w_index)2 0.0611729 0.0282450 2.166 0.030327 *
## factor(violence)1 0.0241818 0.0249576 0.969 0.332587
## factor(b_order)1 -0.0655412 0.0263496 -2.487 0.012869 *
## factor(anc)1 -0.0249231 0.0203968 -1.222 0.221741
## Age_year 0.0372948 0.0157723 2.365 0.018051 *
## Age_year_sqr -0.0005638 0.0002746 -2.053 0.040064 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Intra cluster correlation is defined as:

$ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$
From the output, $\sigma_u^2$ is the variance of the cluster random effect, 0.007425,
and $\sigma_e^2$ is the residual variance of the model, 0.368021.

Interpretation and calculation try yourself!
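
As a starting point for the calculation, the two variance components printed in the output can be plugged into the ICC formula directly. The sketch below uses only those printed values; the variable names are illustrative.

```r
# Variance components copied from the "Random effects" block of the output
var_cluster <- 0.007425   # cluster (Intercept) variance
var_resid   <- 0.368021   # residual variance

icc <- var_cluster / (var_cluster + var_resid)
icc
```

The result is roughly 2%, far smaller than the value found for postnatal care in Problem 11.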

References
1. Practical and Theory Lectures of Fatima Tuz Zahura Mam

2. Survival Analysis: Techniques for Censored and Truncated Data by John P. Klein and
Melvin L. Moeschberger.

3. Statistical Models and Methods for Lifetime Data by Jerald F. Lawless.

9 pages
Cox Proportional Hazard Model
No ratings yet
Cox Proportional Hazard Model
34 pages
Dissertation 22
No ratings yet
Dissertation 22
72 pages