A note on promotion time cure models
with a new biological consideration

Zhi Zhao Department of Biostatistics, University of Oslo
P.O.Box 1122 Blindern, 0317 Oslo, Norway
Fatih Kızılaslan Department of Biostatistics, University of Oslo
P.O.Box 1122 Blindern, 0317 Oslo, Norway
Department for Method Development and Analytics,
Norwegian Institute of Public Health, Oslo, Norway
Abstract

We introduce a generalized promotion time cure model motivated by a new biological consideration. The new approach is flexible to model heterogeneous survival data, in particular for addressing intra-sample heterogeneity. We also indicate that the new approach is suited to model a series or parallel system consisting of multiple subsystems in reliability analysis.

Keywords: Multiscale data integration; cell composition analysis; survival modeling; Weibull mixture models; multinomial-Poisson transformation; reliability analysis

1 Introduction

The promotion time cure model (PTCM) is one of the most important models in survival analysis, but it has not yet been studied much in the literature (Amico and Keilegom,, 2018). The PTCM is constructed by motivating biological considerations, which assumes that after initial treatment the time to recurrence of cancer is the result of a latent process of the residual tumor cells (i.e. clonogenic cells) propagating into a newly detectable tumor (Yakovlev,, 1996). As shown in Yakovlev, (1996) and Chen et al., (1999), the construction of the PTCM dependents on a latent variable N𝑁Nitalic_N that is the number of clonogenic cells left active in a patient after initial treatment. Assume that N𝑁Nitalic_N is Poisson distributed with mean θ>0𝜃0\theta>0italic_θ > 0, i.e. (N=k)=θkeθ/k!𝑁𝑘superscript𝜃𝑘superscript𝑒𝜃𝑘\mathbb{P}(N=k)=\theta^{k}e^{-\theta}/k!blackboard_P ( italic_N = italic_k ) = italic_θ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT / italic_k !, k𝑘k\in\mathbb{N}italic_k ∈ blackboard_N. Let another latent variable Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT (j=1,,N𝑗1𝑁j=1,...,Nitalic_j = 1 , … , italic_N) be the random time for the j𝑗jitalic_j-th clonogenic cell to produce a detectable tumor mass. Given N𝑁Nitalic_N, the variables Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are independently and identically distributed with cumulative distribution function (cdf) F(t)=1S(t)𝐹𝑡1𝑆𝑡F(t)=1-S(t)italic_F ( italic_t ) = 1 - italic_S ( italic_t ). Here F(t)𝐹𝑡F(t)italic_F ( italic_t ) is the promotion time distribution of any clonogenic cell and S(t)𝑆𝑡S(t)italic_S ( italic_t ) is its corresponding survival function. The time to tumor recurrence can be defined by the random variable T=min0jN{Zj}𝑇subscript0𝑗𝑁subscript𝑍𝑗T=\min_{0\leq j\leq N}\{Z_{j}\}italic_T = roman_min start_POSTSUBSCRIPT 0 ≤ italic_j ≤ italic_N end_POSTSUBSCRIPT { italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }, i.e. tumor recurrence when the one of the clonogenic cells becomes activated, where (Z0=)=1subscript𝑍01\mathbb{P}(Z_{0}=\infty)=1blackboard_P ( italic_Z start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = ∞ ) = 1 (i.e. no tumor recurrence in a finite time). Note that the time to tumor recurrence T𝑇Titalic_T of a patient is observable, but N𝑁Nitalic_N and Zjsubscript𝑍𝑗Z_{j}italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT are unobservable latent variables. The survival function of the population is the probability of no newly detectable tumor by time t𝑡titalic_t given by

Spop(t)subscript𝑆𝑝𝑜𝑝𝑡\displaystyle S_{pop}(t)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =(N=0)+(Z1>t,.,ZN>t,N>0)\displaystyle=\mathbb{P}(N=0)+\mathbb{P}(Z_{1}>t,....,Z_{N}>t,N>0)= blackboard_P ( italic_N = 0 ) + blackboard_P ( italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > italic_t , … . , italic_Z start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT > italic_t , italic_N > 0 )
=eθ+k=1(Z1>t,.,ZN=k>t)(N=k)\displaystyle=e^{-\theta}+\sum_{k=1}^{\infty}\mathbb{P}(Z_{1}>t,....,Z_{N=k}>t% )\mathbb{P}(N=k)= italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( italic_Z start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > italic_t , … . , italic_Z start_POSTSUBSCRIPT italic_N = italic_k end_POSTSUBSCRIPT > italic_t ) blackboard_P ( italic_N = italic_k )
=eθ+k=1S(t)kθkeθk!=eθ(1S(t))=eθF(t).absentsuperscript𝑒𝜃superscriptsubscript𝑘1𝑆superscript𝑡𝑘superscript𝜃𝑘superscript𝑒𝜃𝑘superscript𝑒𝜃1𝑆𝑡superscript𝑒𝜃𝐹𝑡\displaystyle=e^{-\theta}+\sum_{k=1}^{\infty}S(t)^{k}\frac{\theta^{k}e^{-% \theta}}{k!}=e^{-\theta\left(1-S(t)\right)}=e^{-\theta F(t)}.= italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT + ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT italic_S ( italic_t ) start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT divide start_ARG italic_θ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT end_ARG start_ARG italic_k ! end_ARG = italic_e start_POSTSUPERSCRIPT - italic_θ ( 1 - italic_S ( italic_t ) ) end_POSTSUPERSCRIPT = italic_e start_POSTSUPERSCRIPT - italic_θ italic_F ( italic_t ) end_POSTSUPERSCRIPT . (1)

Covariates can be introduced in the parameter θ𝜃\thetaitalic_θ and may also be introduced in the proper baseline distribution function F(t)𝐹𝑡F(t)italic_F ( italic_t ). However, it is not clear how to include clonogenic cell data information in the PTCM (1) for better progression-free survival prediction or potentially for the identification of subclonal driver genes predictive of survival. Motivated by this, we propose the following generalized promotion time cure model (GPTCM) to integrate multiscale data, i.e. cancer patient data on multiple biological scales: individual-level survival data, cellular-level cell type proportions data, and subcellular-level cell-type-specific genetic variables.

2 The generalized promotion time cure model (GPTCM)

2.1 Formulation

In cancer cell biology, the tumor might contain a mixture of cell subtypes, for example, invasive tumor cells, non-invasive tumor cells and stromal cells (Trapnell,, 2015). Motivated by the classical PTCM (1), we assume that all tumor cells are composed of multiple clonogenic cell groups (i.e. tumor cell subtypes or subclones). Suppose a patient after an initial treatment has the total number of tumor cells N=l=1LNl𝑁superscriptsubscript𝑙1𝐿subscript𝑁𝑙N=\sum_{l=1}^{L}N_{l}italic_N = ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, L2𝐿2L\geq 2italic_L ≥ 2, where Nlsubscript𝑁𝑙N_{l}italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT is the number of l𝑙litalic_l-th cluster of cells (e.g. tumor cell subtype). Similar to the PTCM, let the l𝑙litalic_l-th cluster have multivariate random times for Nlsubscript𝑁𝑙N_{l}italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT clonogenic cells propagating into a newly detectable tumor:

𝑾l=(Zj=1l1Nj1+1,,Zj=1l1Nj1+Nl),subscript𝑾𝑙subscript𝑍superscriptsubscript𝑗1𝑙1subscript𝑁𝑗11subscript𝑍superscriptsubscript𝑗1𝑙1subscript𝑁𝑗1subscript𝑁𝑙\bm{W}_{l}=\left(Z_{\sum_{j=1}^{l-1}N_{j-1}+1},...,Z_{\sum_{j=1}^{l-1}N_{j-1}+% N_{l}}\right),bold_italic_W start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = ( italic_Z start_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT + 1 end_POSTSUBSCRIPT , … , italic_Z start_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_l - 1 end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_j - 1 end_POSTSUBSCRIPT + italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) ,

where l{1,,L}𝑙1𝐿l\in\{1,...,L\}italic_l ∈ { 1 , … , italic_L } and N0=0subscript𝑁00N_{0}=0italic_N start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = 0. For the Nlsubscript𝑁𝑙N_{l}italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT homogeneous cells in the l𝑙litalic_l-th cluster, we assume cluster-specific promotion time distribution Fl(t)=1Sl(t)subscript𝐹𝑙𝑡1subscript𝑆𝑙𝑡F_{l}(t)=1-S_{l}(t)italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = 1 - italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and then we have

(min{𝑾l}>t)=Sl(t)Nl.subscript𝑾𝑙𝑡subscript𝑆𝑙superscript𝑡subscript𝑁𝑙\mathbb{P}(\min\{\bm{W}_{l}\}>t)=S_{l}(t)^{N_{l}}.blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT } > italic_t ) = italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT .

The time to tumor recurrence can be defined as T=min{min{𝑾1},,min{𝑾L}}𝑇subscript𝑾1subscript𝑾𝐿T=\min\{\min\{\bm{W}_{1}\},...,\min\{\bm{W}_{L}\}\}italic_T = roman_min { roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } }. Then the survival function for the population is given by

Spop(t)subscript𝑆𝑝𝑜𝑝𝑡\displaystyle S_{pop}(t)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =(no cancer by time t)absentno cancer by time 𝑡\displaystyle=\mathbb{P}(\text{no cancer by time }t)= blackboard_P ( no cancer by time italic_t )
=(N=0)+(min{min{𝑾1},,min{𝑾L}}>t,N>0)absent𝑁0formulae-sequencesubscript𝑾1subscript𝑾𝐿𝑡𝑁0\displaystyle=\mathbb{P}(N=0)+\mathbb{P}\left(\min\{\min\{\bm{W}_{1}\},...,% \min\{\bm{W}_{L}\}\}>t,N>0\right)= blackboard_P ( italic_N = 0 ) + blackboard_P ( roman_min { roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } > italic_t , italic_N > 0 )
=(N=0)+(min{𝑾1}>t,,min{𝑾L}>t,N>0)absent𝑁0formulae-sequencesubscript𝑾1𝑡formulae-sequencesubscript𝑾𝐿𝑡𝑁0\displaystyle=\mathbb{P}(N=0)+\mathbb{P}(\min\{\bm{W}_{1}\}>t,...,\min\{\bm{W}% _{L}\}>t,N>0)= blackboard_P ( italic_N = 0 ) + blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t , italic_N > 0 )
=(N=0)+k=1(min{𝑾1}>t,,min{𝑾L}>t|N=k)(N=k).absent𝑁0superscriptsubscript𝑘1formulae-sequencesubscript𝑾1𝑡subscript𝑾𝐿conditional𝑡𝑁𝑘𝑁𝑘\displaystyle=\mathbb{P}(N=0)+\sum_{k=1}^{\infty}\mathbb{P}\left(\min\{\bm{W}_% {1}\}>t,...,\min\{\bm{W}_{L}\}>t|N=k\right)\cdot\mathbb{P}(N=k).= blackboard_P ( italic_N = 0 ) + ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t | italic_N = italic_k ) ⋅ blackboard_P ( italic_N = italic_k ) . (2)

To compute the second term in (2), we use the multinomial theorem and the multinomial-Poisson transformation (Brookmeyer and Damiano,, 1989). With the multinomial theorem, we can consider all configurations of (N1,,NL)subscript𝑁1subscript𝑁𝐿(N_{1},...,N_{L})( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) such that their sum is k𝑘kitalic_k. If Nlsubscript𝑁𝑙N_{l}italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT’s are independent Poisson random variables with mean θlsubscript𝜃𝑙\theta_{l}italic_θ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT denoted as 𝒫ois(Nl;θl)𝒫𝑜𝑖𝑠subscript𝑁𝑙subscript𝜃𝑙\mathcal{P}ois(N_{l};\theta_{l})caligraphic_P italic_o italic_i italic_s ( italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ; italic_θ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ), (l=1,,L𝑙1𝐿l=1,...,Litalic_l = 1 , … , italic_L), by the multinomial-Poisson transformation, the unconditional joint distribution of (N1,,NL)subscript𝑁1subscript𝑁𝐿(N_{1},...,N_{L})( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) can be factorized into the product of a Poisson distribution and a multinomial distribution. The multinomial distribution is (N1,,NL|N=k)=:ult(k;𝒑)\mathbb{P}(N_{1},...,N_{L}|N=k)=:\mathcal{M}ult(k;\bm{p})blackboard_P ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT | italic_N = italic_k ) = : caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p ), where 𝒑=(p1,,pL)𝒑subscript𝑝1subscript𝑝𝐿\bm{p}=(p_{1},...,p_{L})bold_italic_p = ( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ), pl=θl/θ,θ=lθlformulae-sequencesubscript𝑝𝑙subscript𝜃𝑙𝜃𝜃subscript𝑙subscript𝜃𝑙p_{l}=\theta_{l}/\theta,\theta=\sum_{l}\theta_{l}italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = italic_θ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT / italic_θ , italic_θ = ∑ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT. Then we obtain

S>0(t):=k=1(min{𝑾1}>t,,min{𝑾L}>t|N=k)(N=k)assignsubscript𝑆absent0𝑡superscriptsubscript𝑘1formulae-sequencesubscript𝑾1𝑡subscript𝑾𝐿conditional𝑡𝑁𝑘𝑁𝑘\displaystyle S_{>0}(t):=\sum_{k=1}^{\infty}\mathbb{P}\left(\min\{\bm{W}_{1}\}% >t,...,\min\{\bm{W}_{L}\}>t|N=k\right)\cdot\mathbb{P}(N=k)italic_S start_POSTSUBSCRIPT > 0 end_POSTSUBSCRIPT ( italic_t ) := ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t | italic_N = italic_k ) ⋅ blackboard_P ( italic_N = italic_k )
=k=1All config.(N1,,NL) with sum k(min{𝑾1}>t,,min{𝑾L}>t|N=k,N1,,NL)(N=k,N1,,NL)\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\mathbb{P}\left(\min\{\bm{W}_{1}\}>t,...,\min% \{\bm{W}_{L}\}>t|N=k,N_{1},...,N_{L}\right)\cdot\mathbb{P}(N=k,N_{1},...,N_{L})= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t | italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) ⋅ blackboard_P ( italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT )
=k=1All config.(N1,,NL) with sum k(min{𝑾1}>t,,min{𝑾L}>t|N=k,N1,,NL)(N=k)(N1,,NL|N=k)\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\mathbb{P}\left(\min\{\bm{W}_{1}\}>t,...,\min% \{\bm{W}_{L}\}>t|N=k,N_{1},...,N_{L}\right)\cdot\mathbb{P}(N=k)\mathbb{P}(N_{1% },...,N_{L}|N=k)= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t | italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) ⋅ blackboard_P ( italic_N = italic_k ) blackboard_P ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT | italic_N = italic_k )
=k=1All config.(N1,,NL) with sum k(min{𝑾1}>t,,min{𝑾L}>t|N=k,N1,,NL)𝒫ois(k;θ)ult(k;𝒑)\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\mathbb{P}\left(\min\{\bm{W}_{1}\}>t,...,\min% \{\bm{W}_{L}\}>t|N=k,N_{1},...,N_{L}\right)\cdot\mathcal{P}ois(k;\theta)% \mathcal{M}ult(k;\bm{p})= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } > italic_t , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } > italic_t | italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) ⋅ caligraphic_P italic_o italic_i italic_s ( italic_k ; italic_θ ) caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p )
=k=1All config.(N1,,NL) with sum kl=1L(min{𝑾l}>t|Nl)𝒫ois(k;θ)ult(k;𝒑)absentsuperscriptsubscript𝑘1subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘superscriptsubscriptproduct𝑙1𝐿subscript𝑾𝑙conditional𝑡subscript𝑁𝑙𝒫𝑜𝑖𝑠𝑘𝜃𝑢𝑙𝑡𝑘𝒑\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\prod_{l=1}^{L}\mathbb{P}(\min\{\bm{W}_{l}\}>% t|N_{l})\cdot\mathcal{P}ois(k;\theta)\mathcal{M}ult(k;\bm{p})= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT blackboard_P ( roman_min { bold_italic_W start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT } > italic_t | italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) ⋅ caligraphic_P italic_o italic_i italic_s ( italic_k ; italic_θ ) caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p )
=k=1All config.(N1,,NL) with sum kl=1LSl(t)Nl𝒫ois(k;θ)ult(k;𝒑)absentsuperscriptsubscript𝑘1subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘superscriptsubscriptproduct𝑙1𝐿subscript𝑆𝑙superscript𝑡subscript𝑁𝑙𝒫𝑜𝑖𝑠𝑘𝜃𝑢𝑙𝑡𝑘𝒑\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\prod_{l=1}^{L}S_{l}(t)^{N_{l}}\cdot\mathcal{% P}ois(k;\theta)\mathcal{M}ult(k;\bm{p})= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋅ caligraphic_P italic_o italic_i italic_s ( italic_k ; italic_θ ) caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p )
=k=1All config.(N1,,NL) with sum kl=1LSl(t)Nlθkeθk!k!N1!NL!p1N1pLNLabsentsuperscriptsubscript𝑘1subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘superscriptsubscriptproduct𝑙1𝐿subscript𝑆𝑙superscript𝑡subscript𝑁𝑙superscript𝜃𝑘superscript𝑒𝜃𝑘𝑘subscript𝑁1subscript𝑁𝐿superscriptsubscript𝑝1subscript𝑁1superscriptsubscript𝑝𝐿subscript𝑁𝐿\displaystyle=\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\prod_{l=1}^{L}S_{l}(t)^{N_{l}}\cdot\frac{% \theta^{k}e^{-\theta}}{k!}\cdot\frac{k!}{N_{1}!...N_{L}!}p_{1}^{N_{1}}...p_{L}% ^{N_{L}}= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋅ divide start_ARG italic_θ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT end_ARG start_ARG italic_k ! end_ARG ⋅ divide start_ARG italic_k ! end_ARG start_ARG italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ! … italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ! end_ARG italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT … italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
=k=1θkeθk!All config.(N1,,NL) with sum kk!N1!NL!{p1S1(t)}N1{pLSL(t)}NLabsentsuperscriptsubscript𝑘1superscript𝜃𝑘superscript𝑒𝜃𝑘subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘𝑘subscript𝑁1subscript𝑁𝐿superscriptsubscript𝑝1subscript𝑆1𝑡subscript𝑁1superscriptsubscript𝑝𝐿subscript𝑆𝐿𝑡subscript𝑁𝐿\displaystyle=\sum_{k=1}^{\infty}\frac{\theta^{k}e^{-\theta}}{k!}\sum_{\begin{% subarray}{c}\text{All config.}\\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\frac{k!}{N_{1}!...N_{L}!}\{p_{1}S_{1}(t)\}^{% N_{1}}...\{p_{L}S_{L}(t)\}^{N_{L}}= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT divide start_ARG italic_θ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT end_ARG start_ARG italic_k ! end_ARG ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT divide start_ARG italic_k ! end_ARG start_ARG italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ! … italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ! end_ARG { italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) } start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT … { italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_t ) } start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
=k=1θkeθk!{p1S1(t)++pLSL(t)}kabsentsuperscriptsubscript𝑘1superscript𝜃𝑘superscript𝑒𝜃𝑘superscriptsubscript𝑝1subscript𝑆1𝑡subscript𝑝𝐿subscript𝑆𝐿𝑡𝑘\displaystyle=\sum_{k=1}^{\infty}\frac{\theta^{k}e^{-\theta}}{k!}\{p_{1}S_{1}(% t)+...+p_{L}S_{L}(t)\}^{k}= ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT divide start_ARG italic_θ start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT end_ARG start_ARG italic_k ! end_ARG { italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) + … + italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_t ) } start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT
={eθl=1LplSl(t)1}eθ.absentsuperscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡1superscript𝑒𝜃\displaystyle=\left\{e^{\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}-1\right\}e^{-\theta}.= { italic_e start_POSTSUPERSCRIPT italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - 1 } italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT .

Finally, the population survival function is

Spop(t)subscript𝑆𝑝𝑜𝑝𝑡\displaystyle S_{pop}(t)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =(N=0)+S>0(t)absent𝑁0subscript𝑆absent0𝑡\displaystyle=\mathbb{P}(N=0)+S_{>0}(t)= blackboard_P ( italic_N = 0 ) + italic_S start_POSTSUBSCRIPT > 0 end_POSTSUBSCRIPT ( italic_t )
=eθ+{eθl=1LplSl(t)1}eθabsentsuperscript𝑒𝜃superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡1superscript𝑒𝜃\displaystyle=e^{-\theta}+\left\{e^{\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}-1\right% \}e^{-\theta}= italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT + { italic_e start_POSTSUPERSCRIPT italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - 1 } italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT
=eθ{1l=1LplSl(t)}.absentsuperscript𝑒𝜃1superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡\displaystyle=e^{-\theta\left\{1-\sum_{l=1}^{L}p_{l}S_{l}(t)\right\}}.= italic_e start_POSTSUPERSCRIPT - italic_θ { 1 - ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) } end_POSTSUPERSCRIPT . (3)

Let F(t)=1l=1LplSl(t)𝐹𝑡1superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡F(t)=1-\sum_{l=1}^{L}p_{l}S_{l}(t)italic_F ( italic_t ) = 1 - ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and then Spop(t)=eθF(t)subscript𝑆𝑝𝑜𝑝𝑡superscript𝑒𝜃𝐹𝑡S_{pop}(t)=e^{-\theta F(t)}italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = italic_e start_POSTSUPERSCRIPT - italic_θ italic_F ( italic_t ) end_POSTSUPERSCRIPT. This means that if S(t)=Sl(t)𝑆𝑡subscript𝑆𝑙𝑡S(t)=S_{l}(t)italic_S ( italic_t ) = italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), l{1,,L}for-all𝑙1𝐿\forall l\in\{1,...,L\}∀ italic_l ∈ { 1 , … , italic_L } (i.e. no different types of cells), the population survival function (3) is degenerated into PTCM (1), so the new model is named as generalized promotion time cure model (GPTCM).
Remark 1. Note that the proportions (p1,,pL)subscript𝑝1subscript𝑝𝐿(p_{1},...,p_{L})( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) in the GPTCM (3) are patients’ cancer cell proportions data (i.e. n𝑛nitalic_n-by-L𝐿Litalic_L data matrix collected from n𝑛nitalic_n patients) not simple weight parameters. Therefore, the GPTCM can integrate multiscale data [i.e. individual-level survival data, cellular-level cell type proportions data, and subcellular-level cell-type-specific molecular and genomic data (see Remark 4 below)] for joint modeling.
Remark 2. The GPTCM is similar to the general class of PTCM with Equation 2 in Gómez et al., (2023) whose population survival function was derived based on a compound Poisson distribution for the total number of clonogenic cells. But they assumed a common promotion time distribution for all cells, i.e. without distinguishing heterogeneous cells. Gómez et al., (2023) also fixed the number of cells to be 1, 2 or \infty regardless of the exact tumor cells in patients.
Remark 3. The GPTCM is also similar to a mixtures-of-experts model for survival analysis (Rosen and Tanner,, 1999) or a generalized Weibull mixture model for reliability analysis (Jiang and Murthy,, 1995). But the (generalized) mixture models are to model inter-sample heterogeneity, assuming every sample is from one of the mixture clusters. In contrast, our GPTCM is to model intra-sample heterogeneity, since every sample/patient has data for multiple clusters of plSl(t),l{1,,L}subscript𝑝𝑙subscript𝑆𝑙𝑡𝑙1𝐿p_{l}S_{l}(t),l\in\{1,...,L\}italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) , italic_l ∈ { 1 , … , italic_L }. In fact, the formulation of a mixture model like Spop(t)=(N=0)+l=1LplSl(t)subscript𝑆𝑝𝑜𝑝𝑡𝑁0superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡S_{pop}(t)=\mathbb{P}(N=0)+\sum_{l=1}^{L}p_{l}S_{l}(t)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = blackboard_P ( italic_N = 0 ) + ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) is not biologically meaningful in the situation that the time to recurrence is the result of a latent process for cancer recurrence, see Appendix A.
Remark 4. Similar to the PTCM, the GPTCM can introduce covariates X𝑋Xitalic_X through the Poisson rate parameter θ𝜃\thetaitalic_θ, e.g. θ=exp(ξ0+Xξ)𝜃subscript𝜉0𝑋𝜉\theta=\exp(\xi_{0}+X\xi)italic_θ = roman_exp ( italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_X italic_ξ ). Benefiting from the mixture part in the GPTCM, cluster-specific covariates (e.g. genetic variables from each tumor cell subtype) Xlsubscript𝑋𝑙X_{l}italic_X start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT can be introduced in Sl(t)subscript𝑆𝑙𝑡S_{l}(t)italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ). For example, using a log-linear model to capture the mean survival time, i.e. logμl=Xlβl,l{1,,L}formulae-sequencesubscript𝜇𝑙subscript𝑋𝑙subscript𝛽𝑙𝑙1𝐿\log\mu_{l}=X_{l}\beta_{l},l\in\{1,...,L\}roman_log italic_μ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = italic_X start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_β start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_l ∈ { 1 , … , italic_L }, where μlsubscript𝜇𝑙\mu_{l}italic_μ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT is the mean of the Weibull distribution Sl(t)=exp{(t/λl)κ},λl=μl/Γ(1+1/κ),κ+,formulae-sequencesubscript𝑆𝑙𝑡superscript𝑡subscript𝜆𝑙𝜅formulae-sequencesubscript𝜆𝑙subscript𝜇𝑙Γ11𝜅𝜅subscriptS_{l}(t)=\exp\{-(t/\lambda_{l})^{\kappa}\},\lambda_{l}=\mu_{l}/\Gamma(1+1/% \kappa),\kappa\in\mathbb{R}_{+},italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = roman_exp { - ( italic_t / italic_λ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_κ end_POSTSUPERSCRIPT } , italic_λ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = italic_μ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT / roman_Γ ( 1 + 1 / italic_κ ) , italic_κ ∈ blackboard_R start_POSTSUBSCRIPT + end_POSTSUBSCRIPT , and Γ()Γ\Gamma(\cdot)roman_Γ ( ⋅ ) is the gamma function. The modeling of tumor cell-type-specific genes has the potential to identify cell-type-specific drivers for cancer prognosis, and ultimately improve individualized cancer diagnosis and personalized cancer therapies. Furthermore, if we assume randomness in the proportions data (e.g. following Dirichlet distribution), any covariate may also be introduced to model the compositional data of cell proportions (Greenacre,, 2021; Mangiola et al.,, 2023).
Remark 5. Identifiability is an important issue in the estimation of cure models. The GPTCM is identifiable when θ=exp(ξ0+Xξ)𝜃subscript𝜉0𝑋𝜉\theta=\exp(\xi_{0}+X\xi)italic_θ = roman_exp ( italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_X italic_ξ ) and limtSl(t)=0subscript𝑡subscript𝑆𝑙𝑡0\lim_{t\to\infty}S_{l}(t)=0roman_lim start_POSTSUBSCRIPT italic_t → ∞ end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = 0 (l{1,,L}for-all𝑙1𝐿\forall l\in\{1,...,L\}∀ italic_l ∈ { 1 , … , italic_L }) according to Proposition 7 in Hanin and Huang, (2014). In a finite mixture model l=1LplSl(t)superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡\sum_{l=1}^{L}p_{l}S_{l}(t)∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), the label switching problem is a common identifiability issue, since there is no prior information to distinguish between the clusters of the mixture. However, in the applications of single-cell data, cell types can be predefined based on cell biology, and single-cell sequencing data usually result in well estimated cell type proportions. As mentioned in Remark 1, the proportions (p1,,pL)subscript𝑝1subscript𝑝𝐿(p_{1},...,p_{L})( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_p start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) in the GPTCM are cancer cell proportions data collected from patients rather than weight parameters, so the label switching is irrelevant.

2.2 Connection to last-activation scheme and reliability analysis

The PTCM is also referred to as the first-activation scheme (Cooner et al.,, 2007). When all clonogenic cells are homogeneous (i.e. no different types of cells) and the time to tumor recurrence is when the last clonogenic cell becomes activated (i.e. T=max0jN{Zj}𝑇subscript0𝑗𝑁subscript𝑍𝑗T=\max_{0\leq j\leq N}\{Z_{j}\}italic_T = roman_max start_POSTSUBSCRIPT 0 ≤ italic_j ≤ italic_N end_POSTSUBSCRIPT { italic_Z start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }), Cooner et al., (2007) referred this as the last-activation scheme and its corresponding population survival function is 1+eθ(1eθF(t))1superscript𝑒𝜃1superscript𝑒𝜃𝐹𝑡1+e^{-\theta}(1-e^{\theta F(t)})1 + italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT ( 1 - italic_e start_POSTSUPERSCRIPT italic_θ italic_F ( italic_t ) end_POSTSUPERSCRIPT ). Similar to the last-activation scheme, the GPTCM can be extended for the recurrence to be observed when the last class of clonogenic cells becomes activated. Then the time to tumor recurrence can be defined as T=max{max{𝑾1},,max{𝑾L}}𝑇subscript𝑾1subscript𝑾𝐿T=\max\{\max\{\bm{W}_{1}\},...,\max\{\bm{W}_{L}\}\}italic_T = roman_max { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } and the population survival function is given by (see Appendix B for details)

S~pop(t)=1+eθeθl=1LplSl(t).subscript~𝑆𝑝𝑜𝑝𝑡1superscript𝑒𝜃superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡\tilde{S}_{pop}(t)=1+e^{-\theta}-e^{-\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}.over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = 1 + italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT - italic_e start_POSTSUPERSCRIPT - italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT . (4)

Here S~pop(0)subscript~𝑆𝑝𝑜𝑝0\tilde{S}_{pop}(0)over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( 0 ) is improper with S~pop(0)=1subscript~𝑆𝑝𝑜𝑝01\tilde{S}_{pop}(0)=1over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( 0 ) = 1 and cure rate S~pop()=eθsubscript~𝑆𝑝𝑜𝑝superscript𝑒𝜃\tilde{S}_{pop}(\infty)=e^{-\theta}over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( ∞ ) = italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT.

From the perspective of system reliability, the PTCM can be interpreted as analogous to a series system with a random number of units under random shock (Cha and Finkelstein,, 2018). In such a system, failure occurs as soon as one unit fails, making the PTCM conceptually similar to a reliability structure where the weakest link dictates the overall system failure. Our proposed GPTCM can be suited to model a system consisting of multiple heterogeneous subsystems (Fig. 1A), as discussed in Wei and Liu, (2023), which investigates the reliability of the time until one critical subsystem fails.

In reliability engineering, a natural extension to the series system is the latent parallel system model, in which failure occurs only after all latent factors have been activated, known as the last-activation scheme defined in Cooner et al., (2007). It represents a contrasting mechanism where the survival time depends on the simultaneous activation of multiple latent processes, rather than being dictated by the earliest activation. Therefore, the last-activation scheme model (4) can be used for a parallel system with multiple subsystems (Fig. 1B). The cure rate S~pop()=eθsubscript~𝑆𝑝𝑜𝑝superscript𝑒𝜃\tilde{S}_{pop}(\infty)=e^{-\theta}over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( ∞ ) = italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT means that a harmful event does not result in an ultimate system failure. Further extensions can be for a parallel-series system (Fig. 1C) with the failure time T=max{min{𝑾1},,min{𝑾L}}𝑇subscript𝑾1subscript𝑾𝐿T=\max\{\min\{\bm{W}_{1}\},...,\min\{\bm{W}_{L}\}\}italic_T = roman_max { roman_min { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_min { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } }, i.e. the failure occurs when all of the parallel subsystems fail, and can also be for a series-parallel system (Fig. 1D) with the failure time T=min{max{𝑾1},,max{𝑾L}}𝑇subscript𝑾1subscript𝑾𝐿T=\min\{\max\{\bm{W}_{1}\},...,\max\{\bm{W}_{L}\}\}italic_T = roman_min { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } }, i.e. the failure occurs when one of the parallel subsystems fails.

Refer to caption
Figure 1: Illustration of series system, parallel system, parallel-series system and series-parallel system.

2.3 Statistical characteristics of the GPTCM

The GPTCM (for the first-activation scheme) and the classical PTCM have similar statistical properties. For example, both the PTCM and GPTCM do not have proper survival functions, since their cure fraction is Spop()=eθ>0subscript𝑆𝑝𝑜𝑝superscript𝑒𝜃0S_{pop}(\infty)=e^{-\theta}>0italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( ∞ ) = italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT > 0. The survival function of the noncured population of the GPTCM is a proper survival function, i.e. S(t)=S>0(t)/(1eθ)superscript𝑆𝑡subscript𝑆absent0𝑡1superscript𝑒𝜃S^{*}(t)=S_{>0}(t)/(1-e^{-\theta})italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t ) = italic_S start_POSTSUBSCRIPT > 0 end_POSTSUBSCRIPT ( italic_t ) / ( 1 - italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT ), S(0)=1superscript𝑆01S^{*}(0)=1italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( 0 ) = 1 and S()=0superscript𝑆0S^{*}(\infty)=0italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( ∞ ) = 0. Assuming all covariates X𝑋Xitalic_X are time-independent, the population probability density function (pdf) of the GPTCM is given by

fpop(t|X)=dSpop(t|X)dt=θf(t)eθF(t),subscript𝑓𝑝𝑜𝑝conditional𝑡𝑋dsubscript𝑆𝑝𝑜𝑝conditional𝑡𝑋d𝑡𝜃𝑓𝑡superscript𝑒𝜃𝐹𝑡\displaystyle f_{pop}(t|X)=-\frac{\operatorname{d}\!{S}_{pop}(t|X)}{% \operatorname{d}\!{t}}=\theta f(t)e^{-\theta F(t)},italic_f start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) = - divide start_ARG roman_d italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) end_ARG start_ARG roman_d italic_t end_ARG = italic_θ italic_f ( italic_t ) italic_e start_POSTSUPERSCRIPT - italic_θ italic_F ( italic_t ) end_POSTSUPERSCRIPT ,

where F(t)=1l=1LplSl(t)=l=1LplFl(t)𝐹𝑡1superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝐹𝑙𝑡F(t)=1-\sum_{l=1}^{L}p_{l}S_{l}(t)=\sum_{l=1}^{L}p_{l}F_{l}(t)italic_F ( italic_t ) = 1 - ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), f(t)=(d/dt)F(t)=l=1Lplfl(t)𝑓𝑡dd𝑡𝐹𝑡superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑓𝑙𝑡f(t)=(\operatorname{d}\!{/}\operatorname{d}\!{t})F(t)=\sum_{l=1}^{L}p_{l}f_{l}% (t)italic_f ( italic_t ) = ( roman_d / roman_d italic_t ) italic_F ( italic_t ) = ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and fl(t)subscript𝑓𝑙𝑡f_{l}(t)italic_f start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) and Fl(t)subscript𝐹𝑙𝑡F_{l}(t)italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) are the cluster-specific promotion time pdf and cdf, respectively. Note that here fpop(t|X)subscript𝑓𝑝𝑜𝑝conditional𝑡𝑋f_{pop}(t|X)italic_f start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) is not a proper pdf, since Spop(t|X)subscript𝑆𝑝𝑜𝑝conditional𝑡𝑋S_{pop}(t|X)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) is not a proper survival function.

The hazard functions of the entire population and the noncured population of the GPTCM are

hpop(t|X)subscript𝑝𝑜𝑝conditional𝑡𝑋\displaystyle h_{pop}(t|X)italic_h start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) =dSpop(t|X)/dtSpop(t|X)=θf(t)=θl=1Lplfl(t),absentdsubscript𝑆𝑝𝑜𝑝conditional𝑡𝑋d𝑡subscript𝑆𝑝𝑜𝑝conditional𝑡𝑋𝜃𝑓𝑡𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑓𝑙𝑡\displaystyle=\frac{-\operatorname{d}\!{S}_{pop}(t|X)/\operatorname{d}\!{t}}{S% _{pop}(t|X)}=\theta f(t)=\theta\sum_{l=1}^{L}p_{l}f_{l}(t),= divide start_ARG - roman_d italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) / roman_d italic_t end_ARG start_ARG italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) end_ARG = italic_θ italic_f ( italic_t ) = italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) ,
h(t|X)superscriptconditional𝑡𝑋\displaystyle h^{*}(t|X)italic_h start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t | italic_X ) =dS(t|X)/dtS(t|X)={1eθl=1LplSl(t)}1θl=1Lplfl(t).absentdsuperscript𝑆conditional𝑡𝑋d𝑡superscript𝑆conditional𝑡𝑋superscript1superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡1𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑓𝑙𝑡\displaystyle=\frac{-\operatorname{d}\!{S}^{*}(t|X)/\operatorname{d}\!{t}}{S^{% *}(t|X)}=\left\{1-e^{-\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}\right\}^{-1}\theta% \sum_{l=1}^{L}p_{l}f_{l}(t).= divide start_ARG - roman_d italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t | italic_X ) / roman_d italic_t end_ARG start_ARG italic_S start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t | italic_X ) end_ARG = { 1 - italic_e start_POSTSUPERSCRIPT - italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT } start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) .
Refer to caption
Figure 2: Survival and hazard plots of the GPTCM with two clusters. Weibull distributions with cluster-specific scale parameters and common shape parameter are as an example: Sl(t)=e(t/λl)κsubscript𝑆𝑙𝑡superscript𝑒superscript𝑡subscript𝜆𝑙𝜅S_{l}(t)=e^{-(t/\lambda_{l})^{\kappa}}italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = italic_e start_POSTSUPERSCRIPT - ( italic_t / italic_λ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT italic_κ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, λl=μlΓ(1+1/κ)subscript𝜆𝑙subscript𝜇𝑙Γ11𝜅\lambda_{l}=\frac{\mu_{l}}{\Gamma(1+1/\kappa)}italic_λ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = divide start_ARG italic_μ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_ARG start_ARG roman_Γ ( 1 + 1 / italic_κ ) end_ARG, l{1,2}𝑙12l\in\{1,2\}italic_l ∈ { 1 , 2 }, (logμ1,logμ2)=(0.1,1)subscript𝜇1subscript𝜇20.11(\log\mu_{1},\log\mu_{2})=(-0.1,1)( roman_log italic_μ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , roman_log italic_μ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ( - 0.1 , 1 ), (p1,p2)=(0.3,0.7)subscript𝑝1subscript𝑝20.30.7(p_{1},p_{2})=(0.3,0.7)( italic_p start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_p start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) = ( 0.3 , 0.7 ), θ=2𝜃2\theta=2italic_θ = 2, and the Weibull’s shape parameter κ=0.5𝜅0.5\kappa=0.5italic_κ = 0.5 (solid line), κ=1𝜅1\kappa=1italic_κ = 1 (dashed line) or κ=3𝜅3\kappa=3italic_κ = 3 (dotted line).

Similar to Chen et al., (1999), we also have

hpop(t|X)=h(t|X){1eθl=1LplSl(t)}h(t|X),subscript𝑝𝑜𝑝conditional𝑡𝑋superscriptconditional𝑡𝑋1superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡superscriptconditional𝑡𝑋h_{pop}(t|X)=h^{*}(t|X)\left\{1-e^{-\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}\right\}% \leq h^{*}(t|X),italic_h start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) = italic_h start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t | italic_X ) { 1 - italic_e start_POSTSUPERSCRIPT - italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT } ≤ italic_h start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT ( italic_t | italic_X ) ,

i.e. the hazard function of the noncured samples is greater than a sample selected from the entire population. We can also obtain the population cumulative hazard function Hpop(t)=0thpop(s)ds=θl=1LplFl(t)subscript𝐻𝑝𝑜𝑝𝑡superscriptsubscript0𝑡subscript𝑝𝑜𝑝𝑠d𝑠𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝐹𝑙𝑡H_{pop}(t)=\int_{0}^{t}h_{pop}(s)\operatorname{d}\!{s}=\theta\sum_{l=1}^{L}p_{% l}F_{l}(t)italic_H start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT italic_h start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_s ) roman_d italic_s = italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ), and Spop(t)=eHpop(t)subscript𝑆𝑝𝑜𝑝𝑡superscript𝑒subscript𝐻𝑝𝑜𝑝𝑡S_{pop}(t)=e^{-H_{pop}(t)}italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = italic_e start_POSTSUPERSCRIPT - italic_H start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT. Fig. 2 shows the population survival curves and population hazards of the GPTCM when assuming two clusters and Weibull distributed survival. It is interesting that the population hazard function of the GPTCM can be multimodal, because hpop(t|X)subscript𝑝𝑜𝑝conditional𝑡𝑋h_{pop}(t|X)italic_h start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t | italic_X ) is a mixture of L𝐿Litalic_L density functions, which is beyond the basic shapes of the hazard function [e.g. constant, decreasing, increasing, unimodal (up-then-down), or bathtub (down-then-up) shape] (Christen and Rubio,, 2025).

The importance measure of cluster-specific survival can provide valuable insight for developing effective strategies to improve or intervene the entire system, applicable to both biomedical applications and reliability engineering. Such measures help to identify which clusters should receive attention in survival improvement efforts. The population survival function at a given time t𝑡titalic_t is expressed as a function of the L𝐿Litalic_L clusters’ survival at that time, i.e.

Spop(t)=f(S1(t),S2(t),,SL(t)).subscript𝑆𝑝𝑜𝑝𝑡𝑓subscript𝑆1𝑡subscript𝑆2𝑡subscript𝑆𝐿𝑡S_{pop}(t)=f(S_{1}(t),S_{2}(t),...,S_{L}(t)).italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) = italic_f ( italic_S start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) , italic_S start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , … , italic_S start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ( italic_t ) ) .

The Birnbaum measure (Birnbaum,, 1969) can be used to evaluate the survival importance of different clusters given by

Spop(t)Sl(t)=Spop(t)θpl,l=1,,L.formulae-sequencesubscript𝑆𝑝𝑜𝑝𝑡subscript𝑆𝑙𝑡subscript𝑆𝑝𝑜𝑝𝑡𝜃subscript𝑝𝑙𝑙1𝐿\frac{\partial S_{pop}(t)}{\partial S_{l}(t)}=S_{pop}(t)\theta p_{l},\ l=1,...% ,L.divide start_ARG ∂ italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) end_ARG start_ARG ∂ italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_ARG = italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) italic_θ italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT , italic_l = 1 , … , italic_L .

Note that the Birnbaum measure does not account for any time-dependent covariate. A systematic overview of different importance measures can be found in Wu and Coolen, (2022).

3 Simulation study

We provide insights about the parameter estimation of the proposed GPTCM in Section 2.1 by using Monte Carlo simulations. We consider sample sizes of n=200, 500𝑛200500n=200,\;500italic_n = 200 , 500 and 1000100010001000. Each sample/patient has two clinical covariates (i.e. one row of the clinical data matrix 𝐗0n×2subscript𝐗0superscript𝑛2\mathbf{X}_{0}\in\mathbb{R}^{n\times 2}bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × 2 end_POSTSUPERSCRIPT), and has cells belonging to L=3𝐿3L=3italic_L = 3 tumor cell subtypes with each subtype consisting of two cell-type-specific covariates (i.e. one row of data matrix 𝐗ln×2subscript𝐗𝑙superscript𝑛2\mathbf{X}_{l}\in\mathbb{R}^{n\times 2}bold_X start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n × 2 end_POSTSUPERSCRIPT, l{1,,L}𝑙1𝐿l\in\{1,...,L\}italic_l ∈ { 1 , … , italic_L }). Each sample also has tumor cell subtype proportions data (i.e. one row of the proportions data matrix 𝐩[0,1]n×2𝐩superscript01𝑛2\mathbf{p}\in[0,1]^{n\times 2}bold_p ∈ [ 0 , 1 ] start_POSTSUPERSCRIPT italic_n × 2 end_POSTSUPERSCRIPT). Every covariate is generated independently from the standard normal distribution except the first clinical variable generated from the Bernoulli distribution. The tumor cell subtype proportions of each sample is generated independent from the Dirichlet distribution. The survival times are generated based on the population survival function (3) using rate parameter θ=exp(ξ0+𝐗0𝝃)𝜃subscript𝜉0subscript𝐗0𝝃\theta=\exp(\xi_{0}+\mathbf{X}_{0}\bm{\xi})italic_θ = roman_exp ( italic_ξ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT bold_italic_ξ ), and using the Weibull distributed survival functions with mean parameters log𝝁l=𝐗l𝜷lsubscript𝝁𝑙subscript𝐗𝑙subscript𝜷𝑙\log\bm{\mu}_{l}=\mathbf{X}_{l}\bm{\beta}_{l}roman_log bold_italic_μ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = bold_X start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT bold_italic_β start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, l{1,,L}𝑙1𝐿l\in\{1,...,L\}italic_l ∈ { 1 , … , italic_L }. Censoring is generated through an exponential distribution with approximately 50%percent5050\%50 % censoring rate. The true values of all parameters are shown in Table 1.

The maximum likelihood (ML) estimation is to maximize the log-likelihood function for right-censored survival data, i.e.

(ϑ|𝒟)=i=1nfpop(ti|𝐗0,𝐗1,𝐗2,𝐗3,𝐩)δiSpop(ti|𝐗0,𝐗1,𝐗2,𝐗3,𝐩)1δi,conditionalbold-italic-ϑ𝒟superscriptsubscriptproduct𝑖1𝑛subscript𝑓𝑝𝑜𝑝superscriptconditionalsubscript𝑡𝑖subscript𝐗0subscript𝐗1subscript𝐗2subscript𝐗3𝐩subscript𝛿𝑖subscript𝑆𝑝𝑜𝑝superscriptconditionalsubscript𝑡𝑖subscript𝐗0subscript𝐗1subscript𝐗2subscript𝐗3𝐩1subscript𝛿𝑖\mathcal{L}(\bm{\vartheta}|\mathcal{D})=\prod_{i=1}^{n}f_{pop}(t_{i}|\mathbf{X% }_{0},\mathbf{X}_{1},\mathbf{X}_{2},\mathbf{X}_{3},\mathbf{p})^{\delta_{i}}S_{% pop}(t_{i}|\mathbf{X}_{0},\mathbf{X}_{1},\mathbf{X}_{2},\mathbf{X}_{3},\mathbf% {p})^{{1-\delta_{i}}},caligraphic_L ( bold_italic_ϑ | caligraphic_D ) = ∏ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT italic_f start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , bold_p ) start_POSTSUPERSCRIPT italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , bold_p ) start_POSTSUPERSCRIPT 1 - italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ,

where ϑbold-italic-ϑ\bm{\vartheta}bold_italic_ϑ consists of all unknown parameters and 𝒟={ti,δi,𝐗0,𝐗1,𝐗2,𝐗3,𝐩}i=1n𝒟superscriptsubscriptsubscript𝑡𝑖subscript𝛿𝑖subscript𝐗0subscript𝐗1subscript𝐗2subscript𝐗3𝐩𝑖1𝑛\mathcal{D}=\{t_{i},\delta_{i},\mathbf{X}_{0},\mathbf{X}_{1},\mathbf{X}_{2},% \mathbf{X}_{3},\mathbf{p}\}_{i=1}^{n}caligraphic_D = { italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT , bold_p } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT consists of all data information including each sample’s observed survival time tisubscript𝑡𝑖t_{i}italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, censoring indicator δisubscript𝛿𝑖\delta_{i}italic_δ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, covariates, and cell subtype proportions. The R function nlminb using the adaptive nonlinear least-squares algorithm is to perform the optimization. We repeat the scenario of each sample size 1000100010001000 times to obtain the ML estimates with mean and standard error.

Table 1 shows that the mean squared error (MSE) of each estimate decreases with the increase of sample size as we expected. The performance of the ML estimates of all parameters, except for the Weibull’s shape parameter log(κ)𝜅\log(\kappa)roman_log ( italic_κ ), are close to their true values. However, future work for further investigations with more simulation scenarios (e.g. more covariates and model misspecification) and applications to real data is needed to better understand the implications of the proposed model.

Table 1: Simulation results with maximum likelihood estimates (standard errors in parentheses) and the mean squared errors for different sample sizes
Parameter Truth Estimate MSE Estimate MSE Estimate MSE
n=200𝑛200n=200italic_n = 200 n=500𝑛500n=500italic_n = 500 n=1000𝑛1000n=1000italic_n = 1000
log(κ)𝜅\log(\kappa)roman_log ( italic_κ ) 1.10 1.69 (0.098) 0.361 1.64 (0.060) 0.293 1.62 (0.041) 0.271
ξ1subscript𝜉1\xi_{1}italic_ξ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT -0.80 -0.81 (0.193) 0.037 -0.80 (0.120) 0.014 -0.79 (0.082) 0.007
ξ2subscript𝜉2\xi_{2}italic_ξ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT 0.90 1.05 (0.242) 0.080 1.05 (0.150) 0.044 1.03 (0.100) 0.028
ξ3subscript𝜉3\xi_{3}italic_ξ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT 0.60 0.71 (0.134) 0.031 0.71 (0.080) 0.020 0.71 (0.054) 0.015
β11subscript𝛽11\beta_{11}italic_β start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT 0.40 0.29 (0.086) 0.018 0.29 (0.049) 0.014 0.29 (0.034) 0.013
β12subscript𝛽12\beta_{12}italic_β start_POSTSUBSCRIPT 12 end_POSTSUBSCRIPT -0.30 -0.22 (0.084) 0.013 -0.22 (0.049) 0.009 -0.22 (0.034) 0.008
β21subscript𝛽21\beta_{21}italic_β start_POSTSUBSCRIPT 21 end_POSTSUBSCRIPT 0.25 0.18 (0.076) 0.010 0.18 (0.046) 0.007 0.18 (0.032) 0.005
β22subscript𝛽22\beta_{22}italic_β start_POSTSUBSCRIPT 22 end_POSTSUBSCRIPT -0.45 -0.33 (0.084) 0.021 -0.33 (0.051) 0.017 -0.33 (0.036) 0.016
β31subscript𝛽31\beta_{31}italic_β start_POSTSUBSCRIPT 31 end_POSTSUBSCRIPT -0.20 -0.15 (0.075) 0.008 -0.15 (0.048) 0.005 -0.15 (0.033) 0.004
β32subscript𝛽32\beta_{32}italic_β start_POSTSUBSCRIPT 32 end_POSTSUBSCRIPT 0.30 0.22 (0.080) 0.012 0.22 (0.050) 0.008 0.22 (0.035) 0.007

4 Conclusion

We have presented a new promotion time cure model GPTCM that is a generalized version of the classical PTCM. The new formulation consists of the part of a mixture of survival distributions that is strongly motivated by biological intra-tumor heterogeneity of patients rather than mathematical construction of a mixture model. The new modeling framework is flexible to model survival data with intra-sample heterogeneity in biomedicine or intra-system heterogeneity in reliability engineering. Note that both the PTCM and GPTCM assume the latent variable N𝑁Nitalic_N (i.e. number of clonogenic cells) independent of time t𝑡titalic_t. Cha and Finkelstein, (2018) presented various shock models, including the PTCM as a special case, from the counting process of view. Therefore, a future direction for an extension of the GPTCM is to model the dynamics of the number of clonogenic cells by treating N(t)𝑁𝑡N(t)italic_N ( italic_t ) as a counting process, which can better mimic tumor evolution.

Acknowledgments

This work was supported by the University of Oslo innovation funds, ERA PerMed under the ERA-NET Cofund scheme of the European Union’s Horizon 2020 research and innovation framework program (grant ‘SYMMETRY’ ERAPERMED2021-330). The authors would like to thank Manuela Zucknick for discussions.

Appendix A Mixture survival model

When we assume only one tumor cell left active after an initial treatment, this tumor cell belongs to one of the L𝐿Litalic_L tumor cell subtypes, with probability (Nl=1)=plsubscript𝑁𝑙1subscript𝑝𝑙\mathbb{P}(N_{l}=1)=p_{l}blackboard_P ( italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 1 ) = italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT, l{1,2,,L}𝑙12𝐿l\in\{1,2,...,L\}italic_l ∈ { 1 , 2 , … , italic_L }, l=1Lpl=1superscriptsubscript𝑙1𝐿subscript𝑝𝑙1\sum_{l=1}^{L}p_{l}=1∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 1. Using the same notations as Section 2.1, now the population survival function is

Spop(t)subscript𝑆𝑝𝑜𝑝𝑡\displaystyle S_{pop}(t)italic_S start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =(N=0)+(𝑾1>t,𝑾L>t,N=l=1LNl=1)absent𝑁0formulae-sequencesubscript𝑾1𝑡formulae-sequencesubscript𝑾𝐿𝑡𝑁superscriptsubscript𝑙1𝐿subscript𝑁𝑙1\displaystyle=\mathbb{P}(N=0)+\mathbb{P}(\bm{W}_{1}>t,...\bm{W}_{L}>t,N=\sum_{% l=1}^{L}N_{l}=1)= blackboard_P ( italic_N = 0 ) + blackboard_P ( bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT > italic_t , … bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT > italic_t , italic_N = ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 1 )
=(N=0)+l=1L(𝑾l>t|Nl=1)(Nl=1)absent𝑁0superscriptsubscript𝑙1𝐿subscript𝑾𝑙conditional𝑡subscript𝑁𝑙1subscript𝑁𝑙1\displaystyle=\mathbb{P}(N=0)+\sum_{l=1}^{L}\mathbb{P}(\bm{W}_{l}>t|N_{l}=1)% \cdot\mathbb{P}(N_{l}=1)= blackboard_P ( italic_N = 0 ) + ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT blackboard_P ( bold_italic_W start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT > italic_t | italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 1 ) ⋅ blackboard_P ( italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT = 1 )
=(N=0)+l=1LplSl(t).absent𝑁0superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡\displaystyle=\mathbb{P}(N=0)+\sum_{l=1}^{L}p_{l}S_{l}(t).= blackboard_P ( italic_N = 0 ) + ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) .

This is a classical mixture model. However, the assumption with only one tumor cell left active after an initial treatment is not biologically meaningful in cancer research.

Appendix B Last-activation scheme

Our approach in Section 2.1 can be adapted straightforwardly for the last-activation scheme. The population survival function is now

S~pop(t)subscript~𝑆𝑝𝑜𝑝𝑡\displaystyle\tilde{S}_{pop}(t)over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =(N=0)+(max{max{𝑾1},,max{𝑾L}}>t,N>0)absent𝑁0formulae-sequencesubscript𝑾1subscript𝑾𝐿𝑡𝑁0\displaystyle=\mathbb{P}(N=0)+\mathbb{P}\left(\max\{\max\{\bm{W}_{1}\},...,% \max\{\bm{W}_{L}\}\}>t,N>0\right)= blackboard_P ( italic_N = 0 ) + blackboard_P ( roman_max { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } > italic_t , italic_N > 0 )
=(N=0)+(max{max{𝑾1},,max{𝑾L}}>t|N>0)(N>0)absent𝑁0subscript𝑾1subscript𝑾𝐿𝑡ket𝑁0𝑁0\displaystyle=\mathbb{P}(N=0)+\mathbb{P}\left(\max\{\max\{\bm{W}_{1}\},...,% \max\{\bm{W}_{L}\}\}>t|N>0\right)\mathbb{P}(N>0)= blackboard_P ( italic_N = 0 ) + blackboard_P ( roman_max { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } > italic_t | italic_N > 0 ) blackboard_P ( italic_N > 0 )
=(N=0)+[1(max{max{𝑾1},,max{𝑾L}}t|N>0)](N>0)absent𝑁0delimited-[]1subscript𝑾1subscript𝑾𝐿𝑡ket𝑁0𝑁0\displaystyle=\mathbb{P}(N=0)+\left[1-\mathbb{P}\left(\max\{\max\{\bm{W}_{1}\}% ,...,\max\{\bm{W}_{L}\}\}\leq t|N>0\right)\right]\mathbb{P}(N>0)= blackboard_P ( italic_N = 0 ) + [ 1 - blackboard_P ( roman_max { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } ≤ italic_t | italic_N > 0 ) ] blackboard_P ( italic_N > 0 )
=(N=0)+(N>0)(max{max{𝑾1},,max{𝑾L}}t|N>0)(N>0)absent𝑁0𝑁0subscript𝑾1subscript𝑾𝐿𝑡ket𝑁0𝑁0\displaystyle=\mathbb{P}(N=0)+\mathbb{P}(N>0)-\mathbb{P}\left(\max\{\max\{\bm{% W}_{1}\},...,\max\{\bm{W}_{L}\}\}\leq t|N>0\right)\mathbb{P}(N>0)= blackboard_P ( italic_N = 0 ) + blackboard_P ( italic_N > 0 ) - blackboard_P ( roman_max { roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } } ≤ italic_t | italic_N > 0 ) blackboard_P ( italic_N > 0 )
=1(max{𝑾1}t,,max{𝑾L}t,N>0)absent1formulae-sequencesubscript𝑾1𝑡formulae-sequencesubscript𝑾𝐿𝑡𝑁0\displaystyle=1-\mathbb{P}(\max\{\bm{W}_{1}\}\leq t,...,\max\{\bm{W}_{L}\}\leq t% ,N>0)= 1 - blackboard_P ( roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } ≤ italic_t , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } ≤ italic_t , italic_N > 0 )
=1k=1(max{𝑾1}t,,max{𝑾L}t|N=k)(N=k)absent1superscriptsubscript𝑘1formulae-sequencesubscript𝑾1𝑡subscript𝑾𝐿conditional𝑡𝑁𝑘𝑁𝑘\displaystyle=1-\sum_{k=1}^{\infty}\mathbb{P}\left(\max\{\bm{W}_{1}\}\leq t,..% .,\max\{\bm{W}_{L}\}\leq t|N=k\right)\cdot\mathbb{P}(N=k)= 1 - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT blackboard_P ( roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } ≤ italic_t , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } ≤ italic_t | italic_N = italic_k ) ⋅ blackboard_P ( italic_N = italic_k )
=1k=1All config.(N1,,NL) with sum k(max{𝑾1}t,,max{𝑾L}t|N=k,N1,,NL)(N=k,N1,,NL).\displaystyle=1-\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}% \\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\mathbb{P}\left(\max\{\bm{W}_{1}\}\leq t,...,% \max\{\bm{W}_{L}\}\leq t|N=k,N_{1},...,N_{L}\right)\cdot\mathbb{P}(N=k,N_{1},.% ..,N_{L}).= 1 - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT blackboard_P ( roman_max { bold_italic_W start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT } ≤ italic_t , … , roman_max { bold_italic_W start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT } ≤ italic_t | italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) ⋅ blackboard_P ( italic_N = italic_k , italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) .

Denote the component-specific cdf as Fl(t)=1Sl(t)subscript𝐹𝑙𝑡1subscript𝑆𝑙𝑡F_{l}(t)=1-S_{l}(t)italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) = 1 - italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ). By using the multinomial-Poisson transformation, the multinomial theorem and the power series of the exponential function, we obtain

S~pop(t)subscript~𝑆𝑝𝑜𝑝𝑡\displaystyle\tilde{S}_{pop}(t)over~ start_ARG italic_S end_ARG start_POSTSUBSCRIPT italic_p italic_o italic_p end_POSTSUBSCRIPT ( italic_t ) =1k=1All config.(N1,,NL) with sum kl=1L(max{𝑾l}t|Nl)𝒫ois(k;θ)ult(k;𝒑)absent1superscriptsubscript𝑘1subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘superscriptsubscriptproduct𝑙1𝐿subscript𝑾𝑙conditional𝑡subscript𝑁𝑙𝒫𝑜𝑖𝑠𝑘𝜃𝑢𝑙𝑡𝑘𝒑\displaystyle=1-\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}% \\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\prod_{l=1}^{L}\mathbb{P}(\max\{\bm{W}_{l}\}% \leq t|N_{l})\cdot\mathcal{P}ois(k;\theta)\mathcal{M}ult(k;\bm{p})= 1 - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT blackboard_P ( roman_max { bold_italic_W start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT } ≤ italic_t | italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ) ⋅ caligraphic_P italic_o italic_i italic_s ( italic_k ; italic_θ ) caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p )
=1k=1All config.(N1,,NL) with sum kl=1LFl(t)Nl𝒫ois(k;θ)ult(k;𝒑)absent1superscriptsubscript𝑘1subscriptAll config.subscript𝑁1subscript𝑁𝐿 with sum 𝑘superscriptsubscriptproduct𝑙1𝐿subscript𝐹𝑙superscript𝑡subscript𝑁𝑙𝒫𝑜𝑖𝑠𝑘𝜃𝑢𝑙𝑡𝑘𝒑\displaystyle=1-\sum_{k=1}^{\infty}\sum_{\begin{subarray}{c}\text{All config.}% \\ (N_{1},...,N_{L})\\ \text{ with sum }k\end{subarray}}\prod_{l=1}^{L}F_{l}(t)^{N_{l}}\cdot\mathcal{% P}ois(k;\theta)\mathcal{M}ult(k;\bm{p})= 1 - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT start_ARG start_ROW start_CELL All config. end_CELL end_ROW start_ROW start_CELL ( italic_N start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_N start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ) end_CELL end_ROW start_ROW start_CELL with sum italic_k end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ⋅ caligraphic_P italic_o italic_i italic_s ( italic_k ; italic_θ ) caligraphic_M italic_u italic_l italic_t ( italic_k ; bold_italic_p )
=1{eθl=1LplFl(t)1}eθabsent1superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝐹𝑙𝑡1superscript𝑒𝜃\displaystyle=1-\left\{e^{\theta\sum_{l=1}^{L}p_{l}F_{l}(t)}-1\right\}e^{-\theta}= 1 - { italic_e start_POSTSUPERSCRIPT italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT - 1 } italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT
=1+eθeθ{1l=1LplFl(t)}absent1superscript𝑒𝜃superscript𝑒𝜃1superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝐹𝑙𝑡\displaystyle=1+e^{-\theta}-e^{-\theta\{1-\sum_{l=1}^{L}p_{l}F_{l}(t)\}}= 1 + italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT - italic_e start_POSTSUPERSCRIPT - italic_θ { 1 - ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_F start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) } end_POSTSUPERSCRIPT
=1+eθeθl=1LplSl(t).absent1superscript𝑒𝜃superscript𝑒𝜃superscriptsubscript𝑙1𝐿subscript𝑝𝑙subscript𝑆𝑙𝑡\displaystyle=1+e^{-\theta}-e^{-\theta\sum_{l=1}^{L}p_{l}S_{l}(t)}.= 1 + italic_e start_POSTSUPERSCRIPT - italic_θ end_POSTSUPERSCRIPT - italic_e start_POSTSUPERSCRIPT - italic_θ ∑ start_POSTSUBSCRIPT italic_l = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_L end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_S start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT ( italic_t ) end_POSTSUPERSCRIPT .

References

  • Amico and Keilegom, (2018) Amico, M. and Keilegom, I. V. (2018). Cure models in survival analysis. Annual Review of Statistics and Its Application, 5(1):311–342.
  • Birnbaum, (1969) Birnbaum, Z. W. (1969). On the importance of different components in a multicom- ponent system. In Krishnaiah, P., editor, Multivariate Analysis - II, page 581–592, New York, USA. Academic Press.
  • Brookmeyer and Damiano, (1989) Brookmeyer, R. and Damiano, A. (1989). Statistical methods for short‐term projections of aids incidence. Statistics in Medicine, 8(1):23–34.
  • Cha and Finkelstein, (2018) Cha, J. H. and Finkelstein, M. (2018). Point Processes for Reliability Analysis. Springer.
  • Chen et al., (1999) Chen, M.-H., Ibrahim, J. G., and Sinha, D. (1999). A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association, 94(447):909–919.
  • Christen and Rubio, (2025) Christen, J. and Rubio, F. (2025). On harmonic oscillator hazard functions. Statistics & Probability Letters, 217:110304.
  • Cooner et al., (2007) Cooner, F., Banerjee, S., Carlin, B. P., and Sinha, D. (2007). Flexible cure rate modeling under latent activation schemes. Journal of the American Statistical Association, 102(478):560–572.
  • Greenacre, (2021) Greenacre, M. (2021). Compositional data analysis. Annual Review of Statistics and Its Application, 8(1):271–299.
  • Gómez et al., (2023) Gómez, Y. M., Gallardo, D. I., Bourguignon, M., Bertolli, E., and Calsavara, V. F. (2023). A general class of promotion time cure rate models with a new biological interpretation. Lifetime Data Analysis, 29(1):66–86.
  • Hanin and Huang, (2014) Hanin, L. and Huang, L.-S. (2014). Identifiability of cure models revisited. Journal of Multivariate Analysis, 130:261–274.
  • Jiang and Murthy, (1995) Jiang, R. and Murthy, D. (1995). Modeling failure-data by mixture of 2 weibull distributions: a graphical approach. IEEE Transactions on Reliability, 44(3):477–488.
  • Mangiola et al., (2023) Mangiola, S., Roth-Schulze, A. J., Trussart, M., Zozaya-Valdés, E., Ma, M., Gao, Z., Rubin, A. F., Speed, T. P., Shim, H., and Papenfuss, A. T. (2023). sccomp: Robust differential composition and variability analysis for single-cell data. Proceedings of the National Academy of Sciences, 120(33).
  • Rosen and Tanner, (1999) Rosen, O. and Tanner, M. (1999). Mixtures of proportional hazards regression models. Statistics in Medicine, 18(9):1119–1131.
  • Trapnell, (2015) Trapnell, C. (2015). Defining cell types and states with single-cell genomics. Genome Research, 25(10):1491–1498.
  • Wei and Liu, (2023) Wei, Y. and Liu, S. (2023). Reliability analysis of series and parallel systems with heterogeneous components under random shock environment. Computers & Industrial Engineering, 179:109214.
  • Wu and Coolen, (2022) Wu, S. and Coolen, F. (2022). Importance measures in reliability engineering: An introductory overview. In Salhi, S. and Boylan, J., editors, The Palgrave Handbook of Operations Research, page 659–674, Cham, Switzerland. Springer.
  • Yakovlev, (1996) Yakovlev, A. (1996). Threshold models of tumor recurrence. Mathematical and Computer Modelling, 23(6):153–164.