Statistics 262: Intermediate Biostatistics: Kaplan-Meier Methods and Parametric Regression Methods
Intermediate Biostatistics
Kaplan-Meier methods and Parametric Regression methods
KM (product-limit) estimator, formally
k distinct event times $t_1 < \dots < t_j < \dots < t_k$
at each event time $t_j$, there are $n_j$ individuals at risk
$d_j$ is the number who have the event at time $t_j$
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left[1 - \frac{d_j}{n_j}\right]$$
The product runs over the observed event times up to and including t.
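As a minimal sketch (not part of the original slides), the product-limit formula can be coded directly. The function name `kaplan_meier` and the variable names are illustrative assumptions. The data are the 38 pregnancy-study observations as read off the SAS listing in these notes.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_j, n_j, d_j, S(t_j))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    table = []
    for t in sorted(deaths):
        # Risk set: everyone whose follow-up time is >= t, so a subject
        # censored exactly at t still counts as at risk at t.
        n_at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        surv *= 1.0 - d / n_at_risk
        table.append((t, n_at_risk, d, surv))
    return table

# Pregnancy data in months (hypothetical variable names, values from the listing).
event_times = [1] * 6 + [2] * 5 + [3] * 3 + [4] * 3 + [6] * 2 + [9] * 3 + [10, 13, 16]
censored_times = [2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24, 24]
times = event_times + censored_times
events = [1] * len(event_times) + [0] * len(censored_times)

km_table = kaplan_meier(times, events)
for t, n, d, s in km_table:
    print(f"t={t:>2}  at risk={n:>2}  events={d}  S(t)={s:.4f}")
```

Counting a subject censored at t as still at risk at t matches the "2+ months" convention used for censored observations in these notes.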
[Figure: pregnancy data; censored observations at 2, 3, 4, 7, 7, 8, 8, 9, 9, 9, 11, 24 and 24 months]
Data from: Luthra P, Bland JM, Stanton SL. Incidence of pregnancy after laparoscopy
and hydrotubation. BMJ 1982; 284: 1013-1014
A woman censored at 2 months is known to have survived her 2nd cycle pregnancy-free. Think of her censoring time as 2+ months, e.g., 2.1 months, so she still belongs to the risk set at month 2.
S(t=2) = (84.2%)*(84.4%) = 71.1%
Can get an estimate of the hazard rate here: h(t=2) = 5/32 = 15.6%. Given that you didn't get pregnant in month 1, you have an estimated 5/32 chance of conceiving in the 2nd month.
And an estimate of the density (marginal probability of conceiving in month 2): f(t) = h(t)*S(t) = (.711)*(.156) = 11%
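The arithmetic above can be checked in a few lines. This is only a sketch: the variable names are made up, and the month-1 counts (38 women, 6 conceptions) are taken from the SAS listing for this data set.

```python
# Counts from the pregnancy data: 38 women start, 6 conceive in month 1,
# leaving 32 at risk in month 2, of whom 5 conceive.
n1, d1 = 38, 6
n2, d2 = 32, 5

s2 = (1 - d1 / n1) * (1 - d2 / n2)  # KM survival through month 2
h2 = d2 / n2                         # estimated hazard in month 2
f2 = h2 * s2                         # density estimate, computed as on the slide

print(f"S(2) = {s2:.3f}, h(2) = {h2:.3f}, f(2) = {f2:.3f}")
```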
Risk set at 3 months includes 26 women.
S(t=3) = (84.2%)*(84.4%)*(88.5%) = 62.8%
Risk set at 4 months includes 22 women.
S(t=4) = (84.2%)*(84.4%)*(88.5%)*(86.4%) = 54.2%
Risk set at 6 months includes 18 women.
S(t=6) = (54.2%)*(88.9%) = 48.2%
S(t=13) ≈ 22% (eyeball approximation)
2 remaining at 16 months (9th event time)
S(t=16) = (22%)*(2/3) = 15%
Product-limit survival estimates, pregnancy data (* = censored)

   time      Survival   Failure   Survival Standard Error   Number Failed   Number Left
  0.0000      1.0000     0         0                          0              38
  1.0000      .          .         .                          1              37
  1.0000      .          .         .                          2              36
  1.0000      .          .         .                          3              35
  1.0000      .          .         .                          4              34
  1.0000      .          .         .                          5              33
  1.0000      0.8421     0.1579    0.0592                     6              32
  2.0000      .          .         .                          7              31
  2.0000      .          .         .                          8              30
  2.0000      .          .         .                          9              29
  2.0000      .          .         .                         10              28
  2.0000      0.7105     0.2895    0.0736                    11              27
  2.0000*     .          .         .                         11              26
  3.0000      .          .         .                         12              25
  3.0000      .          .         .                         13              24
  3.0000      0.6285     0.3715    0.0789                    14              23
  3.0000*     .          .         .                         14              22
  4.0000      .          .         .                         15              21
  4.0000      .          .         .                         16              20
  4.0000      0.5428     0.4572    0.0822                    17              19
  4.0000*     .          .         .                         17              18
  6.0000      .          .         .                         18              17
  6.0000      0.4825     0.5175    0.0834                    19              16
  7.0000*     .          .         .                         19              15
  7.0000*     .          .         .                         19              14
  8.0000*     .          .         .                         19              13
  8.0000*     .          .         .                         19              12
  9.0000      .          .         .                         20              11
  9.0000      .          .         .                         21              10
  9.0000      0.3619     0.6381    0.0869                    22               9
  9.0000*     .          .         .                         22               8
  9.0000*     .          .         .                         22               7
  9.0000*     .          .         .                         22               6
 10.0000      0.3016     0.6984    0.0910                    23               5
 11.0000*     .          .         .                         23               4
 13.0000      0.2262     0.7738    0.0944                    24               3
 16.0000      0.1508     0.8492    0.0880                    25               2
 24.0000*     .          .         .                         25               1
 24.0000*     .          .         .                         25               0
$$S(t) = e^{-\int_0^t h(u)\,du}$$
A linear cumulative hazard function indicates a constant hazard.
Cumulative Hazard Function
If the hazard function is constant, e.g. h(t)=k, then the cumulative hazard function will be linear (and higher hazards will have steeper slopes):
$$\int_0^t k\,du = kt$$
If the hazard function is increasing with time, then the cumulative hazard function will be curved up; for example, h(t)=kt gives a quadratic:
$$\int_0^t ku\,du = \frac{kt^2}{2}$$
If the hazard function is decreasing over time, e.g. h(t)=k/t, then the cumulative hazard function will be curved down, for example (taking the lower limit as 1, because the integral from 0 diverges):
$$\int_1^t \frac{k}{u}\,du = k\log(t)$$
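The three shapes can be checked numerically. The sketch below integrates each hazard with a simple midpoint rule; the helper name `cumulative_hazard` and the value k = 0.1 are assumptions chosen only for illustration.

```python
import math

def cumulative_hazard(h, t, lower=0.0, steps=10_000):
    """Approximate the integral of h(u) du from `lower` to t (midpoint rule)."""
    width = (t - lower) / steps
    return sum(h(lower + (i + 0.5) * width) for i in range(steps)) * width

k = 0.1  # hypothetical hazard constant, for illustration only

for t in (5.0, 10.0, 20.0):
    constant = cumulative_hazard(lambda u: k, t)        # compare with k*t
    increasing = cumulative_hazard(lambda u: k * u, t)  # compare with k*t^2/2
    # k/u cannot be integrated from 0, so start at 1 to compare with k*log(t)
    decreasing = cumulative_hazard(lambda u: k / u, t, lower=1.0)
    print(f"t={t:>4}: constant={constant:.3f}  "
          f"increasing={increasing:.3f}  decreasing={decreasing:.3f}")
```

Doubling t doubles the first column, quadruples the second, and adds only k*log(2) to the third, which is the linear, curved-up, and curved-down behaviour described above.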
Kaplan-Meier: example 2
Researchers randomized 44 patients with chronic active hepatitis to receive prednisolone or no treatment (control), then compared survival curves.
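Survival times in months (*=censored)
Prednisolone (n=22): 2, 6, 12, 54, 56*, 68, 89, 96, 96, 125*, 128*, 131*, 140*, 141*, 143, 145*, 146, 148*, 162*, 168, 173*, 181*
Control (n=22): 2, 3, 4, 7, 10, 22, 28, 29, 32, 37, 40, 41, 54, 61, 63, 71, 127*, 140*, 146*, 158*, 167*, 182*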
Are these two curves different?
Control group (* = censored):

    time      Survival   Failure   Survival Standard Error   Number Failed   Number Left
   0.000      1.0000     0         0                          0              22
   2.000      0.9545     0.0455    0.0444                     1              21
   3.000      0.9091     0.0909    0.0613                     2              20
   4.000      0.8636     0.1364    0.0732                     3              19
   7.000      0.8182     0.1818    0.0822                     4              18
  10.000      0.7727     0.2273    0.0893                     5              17
  22.000      0.7273     0.2727    0.0950                     6              16
  28.000      0.6818     0.3182    0.0993                     7              15
  29.000      0.6364     0.3636    0.1026                     8              14
  32.000      0.5909     0.4091    0.1048                     9              13
  37.000      0.5455     0.4545    0.1062                    10              12
  40.000      0.5000     0.5000    0.1066                    11              11
  41.000      0.4545     0.5455    0.1062                    12              10
  54.000      0.4091     0.5909    0.1048                    13               9
  61.000      0.3636     0.6364    0.1026                    14               8
  63.000      0.3182     0.6818    0.0993                    15               7
  71.000      0.2727     0.7273    0.0950                    16               6
 127.000*     .          .         .                         16               5
 140.000*     .          .         .                         16               4
 146.000*     .          .         .                         16               3
 158.000*     .          .         .                         16               2
 167.000*     .          .         .                         16               1
 182.000*     .          .         .                         16               0

6 controls made it past 100 months.
Treated group (* = censored):

    time      Survival   Failure   Survival Standard Error   Number Failed   Number Left
   0.000      1.0000     0         0                          0              22
   2.000      0.9545     0.0455    0.0444                     1              21
   6.000      0.9091     0.0909    0.0613                     2              20
  12.000      0.8636     0.1364    0.0732                     3              19
  54.000      0.8182     0.1818    0.0822                     4              18
  56.000*     .          .         .                          4              17
  68.000      0.7701     0.2299    0.0904                     5              16
  89.000      0.7219     0.2781    0.0967                     6              15
  96.000      .          .         .                          7              14
  96.000      0.6257     0.3743    0.1051                     8              13
 125.000*     .          .         .                          8              12
 128.000*     .          .         .                          8              11
 131.000*     .          .         .                          8              10
 140.000*     .          .         .                          8               9
 141.000*     .          .         .                          8               8
 143.000      0.5475     0.4525    0.1175                     9               7
 145.000*     .          .         .                          9               6
 146.000      0.4562     0.5438    0.1285                    10               5
 148.000*     .          .         .                         10               4
 162.000*     .          .         .                         10               3
 168.000      0.3041     0.6959    0.1509                    11               2
 173.000*     .          .         .                         11               1
 181.000*     .          .         .                         11               0

At 146 months: 5/6 of 54% rapidly drops the curve to 45%.
At 168 months: 2/3 of 45% rapidly drops the curve to 30%.
Point-wise confidence intervals
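The listings in these notes report a standard error beside each survival estimate. The sketch below assumes those standard errors come from Greenwood's formula (it reproduces 0.0592 and 0.0736 for the pregnancy data) and forms plain S ± 1.96·SE limits. That is only one of several ways to build a pointwise interval, and the slides do not say which one they plot.

```python
import math

# (n_j at risk, d_j events) at the first four event times of the
# pregnancy data (months 1, 2, 3, 4), taken from the worked example.
risk_sets = [(38, 6), (32, 5), (26, 3), (22, 3)]

surv = 1.0
greenwood_sum = 0.0
results = []
for n, d in risk_sets:
    surv *= 1 - d / n
    greenwood_sum += d / (n * (n - d))
    se = surv * math.sqrt(greenwood_sum)   # Greenwood standard error
    # Plain (untransformed) 95% limits, clipped to the interval [0, 1].
    lower = max(0.0, surv - 1.96 * se)
    upper = min(1.0, surv + 1.96 * se)
    results.append((surv, se, lower, upper))
    print(f"S={surv:.4f}  SE={se:.4f}  95% CI=({lower:.3f}, {upper:.3f})")
```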
Log-rank test
Test of Equality over Strata

Test         Chi-Square   DF   Pr > Chi-Square
Log-Rank     4.6599       1    0.0309
Wilcoxon     6.5435       1    0.0105
-2Log(LR)    5.4096       1    0.0200
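Log-rank test
At each event time k, form a 2x2 table with $N_k$ subjects at risk in total:

            Event   No Event
Group 1     a_k     b_k
Group 2     c_k     d_k

$$\chi^2_1 = \frac{\left[\sum_{k}\left(a_k - E(a_k)\right)\right]^2}{\sum_{k} Var(a_k)}$$
with both sums taken over the event times, where
$$E(a_k) = \frac{(a_k + b_k)(a_k + c_k)}{N_k} = \frac{row1_k \cdot col1_k}{N_k}$$
$$Var(a_k) = \frac{row1_k \cdot row2_k \cdot col1_k \cdot col2_k}{N_k^2 (N_k - 1)}$$
Why is this a chi-square with 1 df? The statistic is the square of
$$Z = \frac{\sum_{k}\left(a_k - E(a_k)\right)}{\sqrt{\sum_{k} Var(a_k)}}$$
that is, observed minus expected events divided by its standard deviation, and $Z^2 \sim \chi^2_1$.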
1st event at month 2. At risk: 22 treated, 22 control.
            Event   No Event
treated     1       21
control     1       21
N_1 = 44

$a_1 = 1$
$$E(a_1) = \frac{(22)(2)}{44} = 1$$
$$Var(a_1) = \frac{(22)(22)(2)(42)}{44^2 (43)} = .488$$
Next event at month 3: no events in the treated group at 3 months. At risk: 21 treated, 21 control.
At 3 months, 1 died in the control group.

            Event   No Event
treated     0       21
control     1       20
N_2 = 42

$a_2 = 0$
$$E(a_2) = \frac{(1)(21)}{42} = .5$$
$$Var(a_2) = \frac{(21)(21)(1)(41)}{42^2 (41)} = .25$$
1 event at month 4 (in the control group). At risk: 21 treated, 20 control.
            Event   No Event
treated     0       21
control     1       19
N_3 = 41

$a_3 = 0$
$$E(a_3) = \frac{(1)(21)}{41} = .51$$
$$Var(a_3) = \frac{(21)(20)(1)(40)}{41^2 (40)} = .25$$
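Etc. Summing over all event times:
$$\chi^2_1 = \frac{\left[\sum_{k}\left(a_k - E(a_k)\right)\right]^2}{\sum_{k} Var(a_k)} = \frac{\left[\sum_{k}\left(a_k - E(a_k)\right)\right]^2}{.488 + .25 + .25 + \dots} = 4.66$$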
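The table-by-table calculation can be automated. The following is a minimal sketch, not the SAS implementation: the function name `logrank` and the list names are assumptions, and the data are the hepatitis-trial times listed earlier in these notes. A subject censored at an event time is counted as still at risk at that time.

```python
def logrank(times1, events1, times2, events2):
    """Two-group log-rank chi-square (1 df) from one 2x2 table per event time."""
    pooled = list(zip(times1, events1)) + list(zip(times2, events2))
    event_times = sorted({t for t, e in pooled if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u in times1 if u >= t)  # group 1 at risk (row 1 total)
        n2 = sum(1 for u in times2 if u >= t)  # group 2 at risk (row 2 total)
        a = sum(1 for u, e in zip(times1, events1) if u == t and e == 1)
        c = sum(1 for u, e in zip(times2, events2) if u == t and e == 1)
        n, d = n1 + n2, a + c                  # N_k and total events (col 1)
        observed += a
        expected += n1 * d / n                 # E(a_k) = row1 * col1 / N_k
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))
    chi_square = (observed - expected) ** 2 / variance
    return chi_square, observed, expected, variance

treated_events = [2, 6, 12, 54, 68, 89, 96, 96, 143, 146, 168]
treated_censored = [56, 125, 128, 131, 140, 141, 145, 148, 162, 173, 181]
control_events = [2, 3, 4, 7, 10, 22, 28, 29, 32, 37, 40, 41, 54, 61, 63, 71]
control_censored = [127, 140, 146, 158, 167, 182]

chi_square, observed, expected, variance = logrank(
    treated_events + treated_censored,
    [1] * len(treated_events) + [0] * len(treated_censored),
    control_events + control_censored,
    [1] * len(control_events) + [0] * len(control_censored),
)
print(f"observed={observed:.0f}  expected={expected:.2f}  "
      f"variance={variance:.3f}  chi-square={chi_square:.2f}")
```

The treated group has fewer deaths than expected under the null hypothesis, which is what drives the chi-square statistic.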
Wilcoxon test (chi-square 6.5435, 1 DF, p = 0.0105): more sensitive to differences at earlier time points.
Estimated log(S(t))
Maybe the hazard function decreases a little, then increases a little? Hard to say exactly.
Approximated h(t)
Uses of Kaplan-Meier
Limitations of Kaplan-Meier
Mainly descriptive
Doesn't control for covariates
Requires categorical predictors
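Parameters of the Weibull
Exponential Model:
$$HR = e^{-\beta}$$
Weibull Model:
$$HR = e^{-\beta / scale}$$
Here $\beta$ is the LIFEREG coefficient for the covariate and scale is the estimated Weibull scale parameter.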
Example 1
Using data from pregnancy study
Recall: roughly, hazard rates were similar over time (implies exponential model should be a good fit).
Parameter       Estimate   Standard Error   95% Confidence Limits   Pr > ChiSq
Intercept       2.2636     0.2049           1.8621    2.6651        <.0001
Scale           1.0217     0.1638           0.7462    1.3987
Weibull Shape   0.9788     0.1569           0.7149    1.3401

Compare to KM:
Example 2: 2 groups
Using data from hepatitis trial, I fit exponential and Weibull models in SAS using LIFEREG (Weibull is default in LIFEREG)
Dependent Variable: Log(time)
Name of Distribution: Exponential
Log Likelihood: -68.03461345

Parameter       Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept       4.4886     0.2500           3.9986    4.9786        322.37       <.0001
group           0.9008     0.3917           0.1332    1.6685        5.29         0.0214
Scale           1.0000     0.0000           1.0000    1.0000
Weibull Shape   1.0000     0.0000           1.0000    1.0000

Scale parameter is set to 1, because it's exponential.
Dependent Variable: Log(time)
Name of Distribution: Weibull
Log Likelihood: -66.94904552

Parameter       Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept       4.4811     0.3169           3.8601    5.1022        200.00       <.0001
group           1.0544     0.5096           0.0556    2.0533        4.28         0.0385
Scale           1.2673     0.2139           0.9103    1.7643
Weibull Shape   0.7891     0.1332           0.5668    1.0985

-2LogLikelihood(simpler model) - (-2LogLikelihood(more complex model)) = chi-square with 1 df (1 extra parameter estimated for Weibull model)
= 136 - 134 = 2
No evidence that Weibull model is much better than exponential.

Compare to KM:
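The likelihood ratio comparison can be carried out without rounding. This is a small sketch using only the two log likelihoods printed in the LIFEREG output; the variable names are illustrative, and the p-value uses the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable with 1 df.

```python
import math

loglik_exponential = -68.03461345  # simpler model (scale fixed at 1)
loglik_weibull = -66.94904552      # more complex model (scale estimated)

# Difference in -2 log likelihood between the nested models.
lr_stat = -2 * loglik_exponential - (-2 * loglik_weibull)

# Upper-tail probability of a chi-square with 1 df.
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR chi-square = {lr_stat:.2f}, p = {p_value:.3f}")
```

The unrounded statistic is a little above 2 and the p-value is well above 0.05, consistent with the conclusion that the Weibull model is not clearly better than the exponential.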
Compare to Cox regression:

Variable   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio   Upper 95% Limit (HR)
group      -0.83230             0.39739          4.3865       0.0362       0.435          0.948
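To put the parametric fits on the same scale as the Cox hazard ratio, the LIFEREG coefficients can be converted with the usual relation between the accelerated failure time and proportional hazards forms of the Weibull model: HR = exp(-beta/scale), with scale = 1 for the exponential. The sketch below assumes that relation; the variable names are illustrative and the estimates are copied from the outputs above.

```python
import math

# LIFEREG estimates for `group`, copied from the outputs in these notes.
beta_exponential = 0.9008
beta_weibull, scale_weibull = 1.0544, 1.2673
cox_log_hr = -0.83230  # Cox regression parameter estimate for `group`

hr_exponential = math.exp(-beta_exponential)           # scale fixed at 1
hr_weibull = math.exp(-beta_weibull / scale_weibull)
hr_cox = math.exp(cox_log_hr)

print(f"exponential HR = {hr_exponential:.3f}")
print(f"Weibull HR     = {hr_weibull:.3f}")
print(f"Cox HR         = {hr_cox:.3f}")
```

The Weibull and Cox hazard ratios agree to three decimals, while the exponential model gives a slightly stronger treatment effect.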