
Bayesian Statistics

Course notes by Robert Piché, Tampere University of Technology
based on notes by Antti Penttinen, University of Jyväskylä
version: February 27, 2009

5.3 Poisson model for count data


Let $\#I$ denote the number of occurrences of some phenomenon that are observed
in an interval $I$ (of time, usually). For example, $\#I$ could be the number of traffic
accidents on a given stretch of highway, the number of particles emitted in the
radioactive decay of an isotope sample, the number of outbreaks of a given disease
in a given city, etc. The number $y$ of occurrences per unit time is often modelled
as $y \mid \theta \sim \mathrm{Poisson}(\theta)$, which has the pmf
$$P(\#(t_0, t_0+1] = y \mid \theta) = \frac{\theta^y}{y!} e^{-\theta} \qquad (y \in \{0, 1, 2, \ldots\}).$$
The Poisson model can be derived as follows. Assume that the events are
relatively rare and occur at a constant rate $\theta$, that is,
$$P(\#(t, t+h] = 1 \mid \theta) = \theta h + o(h), \qquad P(\#(t, t+h] \ge 2 \mid \theta) = o(h),$$
where $o(h)$ denotes a term such that $\lim_{h \to 0} \frac{o(h)}{h} = 0$. Assume also that the numbers of occurrences in
distinct intervals are independent given $\theta$. Letting $P_k(t) := P(\#(0,t] = k \mid \theta)$, we
have

$$\begin{aligned}
P_0(t+h) &= P(\#(0,t] = 0 \;\&\; \#(t,t+h] = 0 \mid \theta) \\
&= P(\#(t,t+h] = 0 \mid \#(0,t] = 0, \theta)\, P(\#(0,t] = 0 \mid \theta) \\
&= (1 - \theta h + o(h))\, P_0(t).
\end{aligned}$$
Letting $h \to 0$ gives the differential equation $P_0'(t) = -\theta P_0(t)$, which with the
initial condition $P_0(0) = 1$ has the solution $P_0(t) = e^{-\theta t}$. Similarly, for $k > 0$ we
have

$$\begin{aligned}
P_k(t+h) &= P(\#(0,t] = k \;\&\; \#(t,t+h] = 0 \mid \theta) \\
&\quad + P(\#(0,t] = k-1 \;\&\; \#(t,t+h] = 1 \mid \theta) \\
&= (1 - \theta h + o(h))\, P_k(t) + (\theta h + o(h))\, P_{k-1}(t),
\end{aligned}$$
which in the limit $h \to 0$ gives the differential equations
$$P_k'(t) = -\theta P_k(t) + \theta P_{k-1}(t).$$
Solving these with the initial conditions $P_k(0) = 0$ ($k > 0$) gives
$$P_k(t) = \frac{(\theta t)^k}{k!} e^{-\theta t} \qquad (k \in \{0, 1, 2, \ldots\}),$$
which for $t = 1$ is the Poisson pmf.
Thus, a Poisson-distributed random variable $y \mid \theta \sim \mathrm{Poisson}(\theta)$ has the pmf
$$P(y = k \mid \theta) = \frac{\theta^k}{k!} e^{-\theta} \qquad (k \in \{0, 1, 2, \ldots\})$$
and the summary statistics
$$E(y \mid \theta) = \theta, \qquad V(y \mid \theta) = \theta.$$
The likelihood pmf of a sequence $y_1, \ldots, y_n$ of Poisson-distributed counts on
unit-length intervals, assumed to be mutually independent conditional on $\theta$, is
$$p(y_{1:n} \mid \theta) = \prod_{i=1}^n \frac{\theta^{y_i}}{y_i!} e^{-\theta} \propto \theta^s e^{-n\theta},$$
where $s = \sum_{i=1}^n y_i$.
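As a quick numerical check (not part of the original notes), the following Python sketch with made-up counts confirms that two data sets with the same $n$ and $s$ give likelihoods differing only by a constant factor, so the data enter the inference only through $s$:

import numpy as np
from scipy.stats import poisson

y1 = np.array([2, 0, 3, 1])  # hypothetical counts: s = 6, n = 4
y2 = np.array([1, 1, 1, 3])  # different counts, same s and n

for theta in [0.5, 1.0, 2.0, 5.0]:
    L1 = poisson.pmf(y1, theta).prod()
    L2 = poisson.pmf(y2, theta).prod()
    print(theta, L1 / L2)  # ratio is 0.5 for every theta: same theta-dependence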
The conjugate prior for the Poisson distribution is the $\mathrm{Gamma}(\alpha, \beta)$ distribution, which has the pdf
$$p(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta\theta} \qquad (\theta > 0).$$
The distribution gets its name from the normalisation factor of its pdf. The mean,
variance and mode of $\mathrm{Gamma}(\alpha, \beta)$ are
$$E(\theta) = \frac{\alpha}{\beta}, \qquad V(\theta) = \frac{\alpha}{\beta^2}, \qquad \mathrm{mode}(\theta) = \frac{\alpha - 1}{\beta}.$$
The formula for the mean can be derived as follows:
$$\begin{aligned}
E(\theta) &= \int_0^\infty \theta \, \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta\theta} \, d\theta \\
&= \frac{\Gamma(\alpha+1)}{\beta\,\Gamma(\alpha)} \underbrace{\int_0^\infty \frac{\beta^{\alpha+1}}{\Gamma(\alpha+1)} \theta^{(\alpha+1)-1} e^{-\beta\theta} \, d\theta}_{=1} \\
&= \frac{\alpha\,\Gamma(\alpha)}{\beta\,\Gamma(\alpha)} \cdot 1 = \frac{\alpha}{\beta}.
\end{aligned}$$
The parameter $\beta > 0$ is a scaling factor (note that some tables and software use
$1/\beta$ instead of $\beta$ to specify the gamma distribution); the parameter $\alpha > 0$ determines the shape:
[Figure: pdfs $p(\theta)$ of $\mathrm{Gamma}(\alpha, 1)$ for $\alpha = 1, 2, 5$, plotted for $0 \le \theta \le 10$.]
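A small SciPy check of these summary formulas (my addition; the parameter values are made up). Note that SciPy follows the $1/\beta$ scale convention mentioned above:

from scipy.stats import gamma

alpha, beta = 5.0, 2.0            # hypothetical parameter values
d = gamma(a=alpha, scale=1/beta)  # SciPy uses scale = 1/beta
print(d.mean(), alpha / beta)     # both 2.5
print(d.var(), alpha / beta**2)   # both 1.25
print((alpha - 1) / beta)         # mode: 2.0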

With the likelihood pdf $p(y_{1:n} \mid \theta) \propto \theta^s e^{-n\theta}$ and the prior pdf $p(\theta) \propto \theta^{\alpha-1} e^{-\beta\theta}$,
Bayes's formula gives the posterior pdf
$$p(\theta \mid y_{1:n}) \propto \theta^{\alpha+s-1} e^{-(\beta+n)\theta},$$
that is, $\theta \mid y_{1:n} \sim \mathrm{Gamma}(\alpha + s, \beta + n)$. The $\alpha$ and $\beta$ parameters in the prior's
gamma distribution are thus updated to $\alpha + s$ and $\beta + n$ in the posterior's gamma
distribution. The summary statistics are updated similarly; in particular, the posterior mean and posterior mode (MAP estimate) are
$$E(\theta \mid y_{1:n}) = \frac{\alpha + s}{\beta + n}, \qquad \mathrm{mode}(\theta \mid y_{1:n}) = \frac{\alpha + s - 1}{\beta + n}.$$
As $n \to \infty$, both the posterior mean and posterior mode tend to $\bar{y} = s/n$.
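The conjugate update is a one-liner in code. A minimal sketch (function name, variable names and data are mine, not from the notes):

import numpy as np

def poisson_gamma_update(y, alpha, beta):
    # Conjugate update: Gamma(alpha, beta) prior, Poisson counts y.
    y = np.asarray(y)
    return alpha + y.sum(), beta + len(y)

a_post, b_post = poisson_gamma_update([3, 1, 4, 2], alpha=2.0, beta=1.0)
print(a_post, b_post)                          # 12.0 5.0
print(a_post / b_post, (a_post - 1) / b_post)  # posterior mean 2.4, mode 2.2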

The prior predictive distribution (marginal distribution of data) has the pmf
$$\begin{aligned}
P(y = k) &= \int_0^\infty P(y = k \mid \theta)\, p(\theta)\, d\theta = \int_0^\infty \frac{\theta^k}{k!} e^{-\theta} \frac{\beta^\alpha}{\Gamma(\alpha)} \theta^{\alpha-1} e^{-\beta\theta} \, d\theta \\
&= \frac{\beta^\alpha\, \Gamma(\alpha+k)}{k!\,(\beta+1)^{\alpha+k}\,\Gamma(\alpha)} \underbrace{\int_0^\infty \frac{(\beta+1)^{\alpha+k}}{\Gamma(\alpha+k)} \theta^{\alpha+k-1} e^{-(\beta+1)\theta} \, d\theta}_{=1} \\
&= \frac{(\alpha+k-1)(\alpha+k-2)\cdots(\alpha)\,\Gamma(\alpha)}{\Gamma(\alpha)\,k!} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{k} \\
&= \binom{\alpha+k-1}{\alpha-1} \left(\frac{\beta}{\beta+1}\right)^{\alpha} \left(\frac{1}{\beta+1}\right)^{k}.
\end{aligned}$$

This is the pmf of the negative binomial distribution. The summary statistics of
$y \sim \mathrm{NegBin}(\alpha, \beta)$ are
$$E(y) = \frac{\alpha}{\beta}, \qquad V(y) = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^2}.$$
The negative binomial distribution also happens to model the number of Bernoulli
failures occurring before the $\alpha$th success when the probability of success is $p = \frac{\beta}{\beta+1}$.
For this reason, many software packages (including Matlab, R and WinBUGS) use $p$ instead of $\beta$ as the second parameter to specify the negative binomial distribution.
The posterior predictive distribution can be derived similarly to the prior predictive, and is
$$\tilde{y} \mid y_{1:n} \sim \mathrm{NegBin}(\alpha + s, \beta + n).$$
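A numerical check (my addition) that the Gamma-Poisson mixture integral really produces this negative binomial pmf, using SciPy's nbinom with the $p = \beta/(\beta+1)$ convention described above; the prior values are those of the moose example below:

import numpy as np
from scipy.stats import nbinom, gamma, poisson
from scipy.integrate import quad

alpha, beta = 4.0, 0.5
p = beta / (beta + 1)

for k in range(5):
    mix, _ = quad(lambda th: poisson.pmf(k, th) * gamma.pdf(th, a=alpha, scale=1/beta),
                  0, np.inf)
    print(k, mix, nbinom.pmf(k, alpha, p))  # the two columns agree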

Example: moose counts. A region is divided into equal-area (100 km²) squares
and the moose in each square are counted. The prior distribution is $\theta \sim \mathrm{Gamma}(4, 0.5)$,
which corresponds to the prior predictive pmf $y \sim \mathrm{NegBin}(4, 0.5)$.

[Figure: prior pdf $p(\theta)$ and prior predictive pmf $P(y = k)$, for $0 \le \theta, k \le 20$.]

On a certain day the following moose counts are collected from an aerial survey
of 15 squares:

  5  7  7 12  2
 14  7  8  5  6
 18  6  4  1  4

($n = 15$, mean $7.07$, sd $4.51$). [Figure: histogram of the counts.]
The posterior distribution for the rate $\theta$ (i.e. number of moose per 100 km²) is
$\theta \mid y_{1:15} \sim \mathrm{Gamma}(110, 15.5)$, for which
$$E(\theta \mid y_{1:15}) = 7.0968, \qquad V(\theta \mid y_{1:15}) = 0.6767^2, \qquad \mathrm{mode}(\theta \mid y_{1:15}) = 7.0323,$$
and the 95% credibility interval is $[5.83, 8.48]$. (The normal approximation
gives the interval $[5.77, 8.42]$.) The posterior predictive distribution is
$\tilde{y} \mid y_{1:n} \sim \mathrm{NegBin}(110, 15.5)$.
[Figure: posterior pdf $p(\theta \mid y)$ and posterior predictive pmf $P(\tilde{y} = k \mid y)$, for $0 \le \theta, k \le 20$.]
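The example's numbers can be reproduced without MCMC; here is a SciPy sketch (my addition):

import numpy as np
from scipy.stats import gamma

y = np.array([5, 7, 7, 12, 2, 14, 7, 8, 5, 6, 18, 6, 4, 1, 4])
alpha, beta = 4.0, 0.5
a_post, b_post = alpha + y.sum(), beta + len(y)  # 110.0, 15.5

post = gamma(a=a_post, scale=1/b_post)
print(post.mean(), post.std(), (a_post - 1) / b_post)  # 7.0968 0.6767 7.0323
print(post.ppf([0.025, 0.975]))                        # about [5.83, 8.48]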
A WinBUGS model for this problem is

model {
  for (i in 1:n) { y[i] ~ dpois(theta) }
  theta ~ dgamma(4, 0.5)
  ypred ~ dpois(theta)
}

The data is entered as

list(y=c(5,7,7,12,2,14,7,8,5,6,18,6,4,1,4), n=15)

The results are

node   mean   sd      2.5%  median  97.5%
theta  7.107  0.6608  5.85  7.101   8.482
ypred  7.098  2.838   2.0   7.0     13.0

A more general Poisson model can be used for counts of occurrences in intervals of
different sizes. The model is
$$y_i \mid \theta \sim \mathrm{Poisson}(\theta t_i),$$
where the $t_i$ are known positive values, sometimes called exposures. Assuming as usual
that the counts are mutually independent given $\theta$, the likelihood is
$$p(y_{1:n} \mid \theta) \propto \theta^s e^{-\theta T},$$
where $s = \sum_{i=1}^n y_i$ and $T = \sum_{i=1}^n t_i$. With the conjugate prior $\theta \sim \mathrm{Gamma}(\alpha, \beta)$, the posterior is
$$\theta \mid y_{1:n} \sim \mathrm{Gamma}(\alpha + s, \beta + T),$$
with
$$E(\theta \mid y_{1:n}) = \frac{\alpha + s}{\beta + T}, \qquad \mathrm{mode}(\theta \mid y_{1:n}) = \frac{\alpha + s - 1}{\beta + T}.$$
As $n \to \infty$, both the posterior mean and posterior mode tend towards $s/T$.
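A minimal sketch of the exposure version of the update (function name and data are mine, for illustration):

import numpy as np

def poisson_exposure_update(y, t, alpha, beta):
    # Gamma(alpha, beta) prior, counts y over exposures t.
    y, t = np.asarray(y), np.asarray(t, dtype=float)
    return alpha + y.sum(), beta + t.sum()

# hypothetical counts over intervals of different lengths
a_post, b_post = poisson_exposure_update(y=[3, 10, 7], t=[0.5, 2.0, 1.5],
                                         alpha=2.0, beta=1.0)
print(a_post, b_post, a_post / b_post)  # 22.0 5.0 4.4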

5.4 Exponential model for lifetime data

Consider a non-negative random variable $y$ used to model intervals such as the time-to-failure of machine components or a patient's survival time. In such applications, it is
typical to specify the probability distribution using a hazard function, from which the cdf
and pdf can be deduced (and vice versa).
The hazard function is defined by
$$h(t)\,dt = P(\underbrace{t < y \le t + dt}_{\text{fail in } (t, t+dt]} \mid \underbrace{t < y}_{\text{OK at } t}) = \frac{p(t)\,dt}{P(t < y)} = \frac{p(t)\,dt}{S(t)},$$
where $p$ is the pdf of $y$ and $S(t) := P(t < y)$ is called the reliability function. Now, because
$p(t) = -S'(t)$, we have the differential equation $h(t) = -\frac{S'(t)}{S(t)}$ with initial condition $S(0) = 1$, which can be solved to give
$$S(t) = e^{-\int_0^t h(\tau)\,d\tau}.$$
In particular, for constant hazard $h(t) = \theta$ the reliability is $S(t) = e^{-\theta t}$ and the density is
the exponential distribution pdf
$$p(t) = \theta e^{-\theta t}.$$

Suppose a component has worked without failure for $s$ time units. Then according to the
constant-hazard model, the probability that it will survive at least $t$ time units more is
$$P(y > s + t \mid y > s) = \frac{P(y > s \;\&\; y > s + t)}{P(y > s)} = \frac{P(y > s + t)}{P(y > s)} = \frac{e^{-\theta(t+s)}}{e^{-\theta s}} = e^{-\theta t},$$
which is the same probability as for a new component! This is the lack-of-memory or
no-aging property of the constant-hazard model.
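A quick numerical illustration of the lack-of-memory property (my addition; rate and times are made up), using SciPy's exponential distribution, which is parameterised by scale $= 1/\theta$:

from scipy.stats import expon

theta = 0.8  # hypothetical rate
d = expon(scale=1/theta)
s, t = 1.3, 2.0
print(d.sf(s + t) / d.sf(s))  # P(y > s+t | y > s)
print(d.sf(t))                # P(y > t): the same value, exp(-theta*t)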
For an exponentially distributed random variable $y \mid \theta \sim \mathrm{Exp}(\theta)$ the mean and variance are
$$E(y \mid \theta) = \frac{1}{\theta}, \qquad V(y \mid \theta) = \frac{1}{\theta^2}.$$
The exponential distribution also models the durations (waiting times) between consecutive Poisson-distributed occurrences.
For exponentially distributed samples $y_i \mid \theta \sim \mathrm{Exp}(\theta)$ that are mutually independent
given $\theta$, the likelihood is
$$p(y_{1:n} \mid \theta) = \prod_{i=1}^n \theta e^{-\theta y_i} = \theta^n e^{-\theta s},$$
where $s = \sum_{i=1}^n y_i$. Using the conjugate prior $\theta \sim \mathrm{Gamma}(\alpha, \beta)$, the posterior pdf is
$$p(\theta \mid y_{1:n}) \propto \theta^{\alpha-1} e^{-\beta\theta} \cdot \theta^n e^{-\theta s} = \theta^{\alpha+n-1} e^{-(\beta+s)\theta},$$
that is, $\theta \mid y_{1:n} \sim \mathrm{Gamma}(\alpha + n, \beta + s)$, for which
$$E(\theta \mid y_{1:n}) = \frac{\alpha + n}{\beta + s}, \qquad \mathrm{mode}(\theta \mid y_{1:n}) = \frac{\alpha + n - 1}{\beta + s}, \qquad V(\theta \mid y_{1:n}) = \frac{\alpha + n}{(\beta + s)^2}.$$
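In code, this update takes the same one-line form as in the Poisson case; a sketch with made-up data (names are mine):

import numpy as np

def exp_gamma_update(y, alpha, beta):
    # Conjugate update for exponential lifetimes y.
    y = np.asarray(y)
    return alpha + len(y), beta + y.sum()

a_post, b_post = exp_gamma_update([0.5, 1.2, 0.3, 2.0], alpha=2.0, beta=1.0)
print(a_post, b_post)  # 6.0 5.0
print(a_post / b_post, (a_post - 1) / b_post, a_post / b_post**2)  # mean, mode, variance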

It often happens that lifetime or survival studies are ended before all the samples
have failed or died. Then, in addition to $k$ observations $y_1, \ldots, y_k \in [0, L]$, we have $n - k$
samples whose lifetimes are known to be $y_j > L$, but are otherwise unknown. This is
called a censored data set. The censored observations can be modelled as Bernoulli trials
with success ($z_j = 1$), corresponding to $y_j > L$, having the probability
$$P(y_j > L \mid \theta) = e^{-\theta L}.$$
The likelihood of the censored data is
$$p(y_{1:k}, z_{1:n-k} \mid \theta) = \prod_{i=1}^k \theta e^{-\theta y_i} \prod_{j=1}^{n-k} e^{-\theta L} = \theta^k e^{-\theta(s_k + (n-k)L)},$$
where $s_k = \sum_{i=1}^k y_i$. With the conjugate prior $\theta \sim \mathrm{Gamma}(\alpha, \beta)$, the posterior pdf is
$$p(\theta \mid y_{1:k}, z_{1:n-k}) \propto \theta^{\alpha-1} e^{-\beta\theta} \cdot \theta^k e^{-\theta(s_k + (n-k)L)} = \theta^{\alpha+k-1} e^{-(\beta + s_k + (n-k)L)\theta},$$
that is, $\theta \mid y_{1:k}, z_{1:n-k} \sim \mathrm{Gamma}(\alpha + k, \beta + s_k + (n-k)L)$.
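The censored update differs from the fully observed case only in what is added to $\beta$; a sketch (function name and data are hypothetical, not from the notes):

import numpy as np

def exp_gamma_update_censored(y_obs, n_cens, L, alpha, beta):
    # k fully observed lifetimes plus n-k lifetimes censored at L.
    y_obs = np.asarray(y_obs)
    return alpha + len(y_obs), beta + y_obs.sum() + n_cens * L

# hypothetical: 3 observed failures, 2 units still alive at L = 2.0
a_post, b_post = exp_gamma_update_censored([0.4, 1.1, 0.9], n_cens=2, L=2.0,
                                           alpha=1.0, beta=1.0)
print(a_post, b_post)  # 4.0 7.4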

Example: Censored lifetime data. In a two-year survival study of 15 cancer patients, the observed lifetimes (in years) are
1.54, 0.70, 1.23, 0.82, 0.99, 1.33, 0.38, 0.99, 1.97, 1.10, 0.40,
and 4 patients are still alive at the end of the study. Assuming mutually independent
$y_i \mid \theta \sim \mathrm{Exp}(\theta)$ conditional on $\theta$, and choosing the prior $\theta \sim \mathrm{Gamma}(2, 1)$, we obtain the
posterior
$$\theta \mid y_{1:11}, z_{1:4} \sim \mathrm{Gamma}(2 + 11, 1 + 11.45 + 4 \cdot 2) = \mathrm{Gamma}(13, 20.45),$$
which has mean $0.636$, variance $0.176^2$, and 95% credibility interval $(0.338, 1.025)$.
The normal approximation has 95% credibility interval $(0.290, 0.981)$.

[Figure: prior and posterior pdfs of $\theta$, plotted for $0 \le \theta \le 4$.]
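These numbers can be checked directly against the exact posterior with SciPy (my addition); close agreement with the MCMC output below is expected:

import numpy as np
from scipy.stats import gamma

y = np.array([1.54, 0.70, 1.23, 0.82, 0.99, 1.33, 0.38, 0.99, 1.97, 1.10, 0.40])
alpha, beta, L, n_cens = 2.0, 1.0, 2.0, 4
a_post = alpha + len(y)               # 13
b_post = beta + y.sum() + n_cens * L  # 20.45

post = gamma(a=a_post, scale=1/b_post)
print(post.mean(), post.std())        # about 0.636 and 0.176
print(post.ppf([0.025, 0.975]))       # about [0.338, 1.025]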
A WinBUGS model for this problem is

model {
  theta ~ dgamma(2,1)
  for (i in 1:n) { y[i] ~ dexp(theta)I(L[i],) }
}

Censoring is represented by appending the I(lower,upper) modifier to the distribution
specification. The data is entered as

list(y=c(1.54,0.70,1.23,0.82,0.99,1.33,0.38,0.99,1.97,1.10,0.40,
         NA,NA,NA,NA), n=15, L=c(0,0,0,0,0,0,0,0,0,0,0,2,2,2,2))
where the censored observations are represented by NA. The results after 2000 simulation
steps are

node   mean    sd      2.5%    median  97.5%
theta  0.6361  0.1789  0.3343  0.6225  1.045

The initial value is entered as

list(theta=0.6)

Note: press "gen inits" to initialise the chain for the censored y[12:15].
