ABC-SysBio – Approximate Bayesian Computation in Python with GPU support

Chris Barnes, Michael Stumpf
(among many others!)

Centre for Bioinformatics & Institute of Mathematical Sciences
& Centre for Integrative Systems Biology at Imperial College London

20th May 2010

Challenges in biological modelling

Biological systems
Complex models with many parameters
Not all parameters can be measured in vivo
At least some parameters, if not all, must be inferred from data

Data
Time course
Single-cell fluorescence microscopy, flow cytometry, count data
Can be challenging to perform; not all species are observed

Stochasticity
It is apparent that many biological processes are highly stochastic
Stochastic models make inference much harder
Often we cannot write down the likelihood function

Different time course modelling techniques

Infer model parameters from time course data
Select between competing models of a process

Approximate Bayesian Computation

[Figure: model parameter space (θ1, θ2), data X(t) against t, and a simulation Xs(θ) from the model]
$d = \Delta(X_s(\theta), X)$
Reject θ if $d > \epsilon$
Accept θ if $d \le \epsilon$
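A minimal sketch of the accept/reject loop above in plain Python, not ABC-SysBio itself. The toy model (20 Gaussian observations with unknown mean θ and standard deviation 1), the uniform prior on [-5, 5], the mean-difference distance and the names `abc_rejection`, `simulate`, `distance` and `sample_prior` are all illustrative assumptions.

```python
import random
import statistics


def abc_rejection(simulate, distance, sample_prior, data, epsilon, n_accept, seed=None):
    """Return n_accept draws from p(theta | distance(X_s(theta), X) <= epsilon).

    Hypothetical helper illustrating ABC rejection; not the ABC-SysBio API.
    """
    rng = random.Random(seed)
    accepted, n_simulations = [], 0
    while len(accepted) < n_accept:
        theta = sample_prior(rng)              # draw a candidate from the prior
        x_sim = simulate(theta, rng)           # simulate data X_s(theta)
        n_simulations += 1
        if distance(x_sim, data) <= epsilon:   # accept theta if d <= epsilon
            accepted.append(theta)
    return accepted, n_simulations


# Toy model (assumption): 20 Gaussian observations with unknown mean theta, sd 1.
def simulate(theta, rng, n=20):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(x_sim, x_obs):
    # Assumed distance: absolute difference between sample means
    return abs(statistics.fmean(x_sim) - statistics.fmean(x_obs))


def sample_prior(rng):
    # Assumed prior: uniform on [-5, 5]
    return rng.uniform(-5.0, 5.0)


if __name__ == "__main__":
    data = simulate(2.0, random.Random(0))
    posterior, n_sims = abc_rejection(simulate, distance, sample_prior, data,
                                      epsilon=0.1, n_accept=200, seed=1)
    print(f"posterior mean {statistics.fmean(posterior):.2f} after {n_sims} simulations")
```

Shrinking `epsilon` tightens the approximation but lowers the acceptance rate, which is what motivates the sequential schemes below.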

Approximate Bayesian Computation (ABC)

Bayesian Inference
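Posterior ∝ likelihood × prior
$p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$

Approximate inference methods
Sample from the approximate posterior
$p(\theta \mid \Delta(X_s(\theta), X) \le \epsilon)$,
where $\Delta(X_s, X)$ is the distance between simulation and data.
It can be shown that, as $\epsilon \to 0$,
$p(\theta \mid \Delta(X_s(\theta), X) \le \epsilon) \to p(\theta \mid X)$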

ABC flavours
ABC rejection: Pritchard et al., Mol. Biol. Evol. (1999)
ABC MCMC: Marjoram et al., PNAS (2003)
ABC SMC: Toni and Stumpf, Bioinformatics (2010)

ABC SMC

Prior, $\pi(\theta)$
Define a set of intermediate distributions $\pi_t$, $t = 1, \ldots, T$, with tolerances
$\epsilon_1 > \epsilon_2 > \cdots > \epsilon_T$
$\pi_{t-1}(\theta \mid \Delta(X_s, X) < \epsilon_{t-1})$
$\pi_t(\theta \mid \Delta(X_s, X) < \epsilon_t)$
$\pi_T(\theta \mid \Delta(X_s, X) < \epsilon_T)$

Sequential importance sampling:
Sample from proposal $\eta_t(\theta_t)$ and weight $w_t(\theta_t) = \pi_t(\theta_t)/\eta_t(\theta_t)$
$\eta_t(\theta_t) = \int \pi_{t-1}(\theta_{t-1})\, K_t(\theta_{t-1}, \theta_t)\, d\theta_{t-1}$
$K_t(\theta_{t-1}, \theta_t)$ is a Markov perturbation kernel
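A minimal single-parameter sketch of ABC SMC in plain Python; it is not the ABC-SysBio implementation. Assumptions: a Gaussian perturbation kernel $K_t$ whose variance is twice the weighted variance of the previous population (one common choice), a uniform prior on [-5, 5], a toy model of 20 Gaussian observations with unknown mean, and hypothetical function names. For accepted particles $\pi_t(\theta)$ is proportional to the prior, so the weight becomes the prior density divided by $\eta_t(\theta) = \sum_j w_{t-1}^{(j)} K_t(\theta_{t-1}^{(j)}, \theta)$, a discrete version of the integral above.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def abc_smc(simulate, distance, sample_prior, prior_pdf, data, epsilons, n_particles, seed=None):
    """ABC SMC for a scalar parameter with tolerances eps_1 > eps_2 > ... > eps_T.

    Hypothetical helper; returns the final particles and normalised weights.
    """
    rng = random.Random(seed)
    particles, weights = [], []
    for t, eps in enumerate(epsilons):
        if t > 0:
            # Kernel scale (assumption): variance = 2 x weighted variance of last population
            mean = sum(w * p for w, p in zip(weights, particles))
            var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
            sigma = max(math.sqrt(2.0 * var), 1e-12)
        new_particles = []
        while len(new_particles) < n_particles:
            if t == 0:
                theta = sample_prior(rng)                                # sample from the prior
            else:
                theta_prev = rng.choices(particles, weights=weights)[0]  # resample from pi_{t-1}
                theta = rng.gauss(theta_prev, sigma)                     # perturb with K_t
                if prior_pdf(theta) == 0.0:
                    continue                                             # outside prior support
            if distance(simulate(theta, rng), data) <= eps:
                new_particles.append(theta)
        if t == 0:
            new_weights = [1.0] * n_particles
        else:
            # w_t = prior / eta_t, eta_t(theta) = sum_j w_{t-1}^(j) K_t(theta_{t-1}^(j), theta)
            new_weights = [
                prior_pdf(th) / sum(w * normal_pdf(th, p, sigma) for w, p in zip(weights, particles))
                for th in new_particles
            ]
        total = sum(new_weights)
        particles, weights = new_particles, [w / total for w in new_weights]
    return particles, weights


# Toy model (assumption): 20 Gaussian observations with unknown mean theta, sd 1.
def simulate(theta, rng, n=20):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def distance(x_sim, x_obs):
    return abs(statistics.fmean(x_sim) - statistics.fmean(x_obs))


def sample_prior(rng):
    return rng.uniform(-5.0, 5.0)


def prior_pdf(theta):
    return 0.1 if -5.0 <= theta <= 5.0 else 0.0
```

Each generation reuses the previous population as the proposal, so the small final tolerance is reached with far fewer simulations than sampling from the prior directly at that tolerance.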

ABC-SysBio
http://abc-sysbio.sourceforge.net/

What can ABC-SysBio do?
Inputs:
Models in SBML format or Python code
Time series data, distance schedule
Outputs:
(Approximate) posterior distribution over models and parameters
Diagnostic plots

Example 5: Inference in the repressilator model

Model selection

Example 1: SIR model selection (panels: simulated data, model posteriors, model 1 fit)
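ABC model selection can be sketched by treating the model index as an extra parameter: draw a model from a prior over models, draw its parameters, simulate, and accept or reject as for parameter inference; the fraction of accepted draws carrying each index approximates the model posterior. This is a hypothetical plain-rejection sketch, not ABC-SysBio's own algorithm; the two toy models (Gaussian data with standard deviation 1 or 3), the summary-statistic distance and the names are assumptions.

```python
import random
import statistics
from collections import Counter

# Two hypothetical competing models: Gaussian data with unknown mean and
# standard deviation fixed at 1 (model 1) or 3 (model 2).
MODEL_SD = {1: 1.0, 2: 3.0}


def simulate(model, theta, rng, n=50):
    return [rng.gauss(theta, MODEL_SD[model]) for _ in range(n)]


def distance(x_sim, x_obs):
    # Assumed distance: compare sample means and sample standard deviations
    return (abs(statistics.fmean(x_sim) - statistics.fmean(x_obs))
            + abs(statistics.stdev(x_sim) - statistics.stdev(x_obs)))


def abc_model_selection(data, epsilon, n_accept, seed=None):
    """Return approximate model posterior probabilities {model: probability}."""
    rng = random.Random(seed)
    accepted = Counter()
    while sum(accepted.values()) < n_accept:
        model = rng.choice(list(MODEL_SD))   # uniform prior over models
        theta = rng.uniform(-5.0, 5.0)       # assumed parameter prior for that model
        if distance(simulate(model, theta, rng), data) <= epsilon:
            accepted[model] += 1
    return {m: accepted[m] / n_accept for m in MODEL_SD}
```

With data generated from model 1, model 2 almost never matches the observed spread, so its posterior probability is close to zero.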

Computation on graphics processing units (GPUs)

‘GPUs are massively multithreaded many-core chips’

Parallel architectures
Multiple instruction multiple data (MIMD)
Multiple independent processors execute different instructions on different data
Clusters, GRID computing
Main drawback: cost

Single instruction multiple data (SIMD)
Multiple processors execute same instruction on different data
Supercomputers of the 1970s and 1980s were based on this architecture
GPUs follow this paradigm and are cheap
Main drawback: the programming paradigm differs from that of CPUs, and not all applications can be ported

GPGPU: General purpose GPU
GPUs evolved from dedicated computer graphics to general-purpose parallel processors
Dedicated computation GPUs are manufactured by NVIDIA and ATI

NVIDIA GPUs: Compute Unified Device Architecture (CUDA)
CUDA
Refers to both hardware and software architectures
Provides C APIs for programming NVIDIA GPUs

Tesla C1060:
30 multiprocessors × 8 processors = 240 cores

http://www.nvidia.com/object/cuda_home_new.html
http://www.many-core.group.cam.ac.uk/course/
http://www.drdobbs.com/high-performance-computing/207200659
"Programming Massively Parallel Processors", Kirk and Hwu
http://mathema.tician.de/software/pycuda

CUDA architecture

Software: Thread hierarchy

Hardware: Tesla C1060

Memory structure

Memory      Speed               Scope
global      150x slower         device, host
local       150x slower         thread
texture     faster (cached)     device, host
constant    faster (cached)     device, host
shared      fastest             thread block
registers   fastest             thread

ABC on GPU

Suitability
ABC SMC run time is dominated by simulation of time series
Simulations can be run in parallel, albeit with different parameters

CUDA performance issues
Shared vs global memory: shared is fast, global is slow
If using global memory, are accesses coalesced?
Warp divergence: 'if/else' statements cause divergence
Register vs local memory: reduce registers per thread to increase occupancy; local memory is slow
Double vs float

Unsuitability
'Good quality' pseudo-random number generation
ODE integration: stiff solvers require double precision and are adaptive
MJP simulations: Gillespie is sequential, so threads run independently
SDE simulations: Euler-Maruyama uses the same time step for all threads, so possibly the most to gain
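The point about SDE simulation can be illustrated on a CPU with NumPy: each array element stands in for one GPU thread, and every simulation takes the same Euler-Maruyama step at the same time, only with its own state and parameters (single instruction, multiple data). This is a sketch rather than ABC-SysBio's GPU code; the Ornstein-Uhlenbeck example model, the parameter ranges and the function names are assumptions.

```python
import numpy as np


def euler_maruyama_batch(drift, diffusion, x0, params, dt, n_steps, seed=None):
    """Simulate many scalar SDEs dX = f(X) dt + g(X) dW in lock-step.

    Each array element plays the role of one GPU thread: the same update is
    applied to every element with its own state and parameter values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    path = np.empty((n_steps + 1, x.size))
    path[0] = x
    sqrt_dt = np.sqrt(dt)
    for k in range(n_steps):
        dw = rng.standard_normal(x.size) * sqrt_dt   # independent Brownian increments
        x = x + drift(x, params) * dt + diffusion(x, params) * dw
        path[k + 1] = x
    return path


# Hypothetical example: Ornstein-Uhlenbeck processes, one parameter set per "thread"
def ou_drift(x, p):
    return -p["k"] * x


def ou_diffusion(x, p):
    return p["s"] * np.ones_like(x)


if __name__ == "__main__":
    n_sims = 10_000
    rng = np.random.default_rng(0)
    params = {"k": rng.uniform(0.5, 2.0, n_sims), "s": rng.uniform(0.1, 0.5, n_sims)}
    paths = euler_maruyama_batch(ou_drift, ou_diffusion, np.ones(n_sims), params, 0.01, 500, seed=1)
    print(paths.shape)   # (501, 10000)
```

Because the time step is shared, no element ever waits for another; contrast this with the Gillespie loop, where each trajectory needs a different number of iterations.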

Occupancy vs latency
Warps (32 threads) are run sequentially on a processor
Hardware tries to hide any memory latency (e.g. waiting for a global memory access)
Performance is a tradeoﬀ between occupancy and latency

Example: SSA (Gillespie) simulation of an immigration-death process
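A minimal SSA for the immigration-death process is sketched below in plain Python, assuming immigration ∅ → X at constant rate k_imm and death X → ∅ at rate k_death·X (the rate names are hypothetical). The number of loop iterations is random and differs from one trajectory to the next, which is why Gillespie threads on a GPU do not stay in step.

```python
import random


def immigration_death_ssa(k_imm, k_death, x0, t_end, seed=None):
    """Gillespie SSA for immigration (0 -> X, rate k_imm) and death (X -> 0, rate k_death * X).

    Returns the jump times and the state after each jump.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        a_imm = k_imm                  # propensity of immigration
        a_death = k_death * x          # propensity of death
        a0 = a_imm + a_death
        if a0 == 0.0:
            break                      # no reaction can fire
        t += rng.expovariate(a0)       # exponential waiting time with rate a0
        if t > t_end:
            break
        x += 1 if rng.random() * a0 < a_imm else -1   # choose which reaction fires
        times.append(t)
        states.append(x)
    return times, states
```

For example, `immigration_death_ssa(10.0, 1.0, 0, 100.0, seed=0)` produces a trajectory that fluctuates around the stationary mean k_imm / k_death = 10.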

Bacterial two-component systems (TCS)

TCSs are abundant in bacteria, plants and fungi, but apparently absent from animals. They regulate responses to environmental stimuli.

[Figure: TCS schematic showing ATP, H∼P and D∼P]


The ArcB-ArcA TCS
The ArcB-ArcA system in Escherichia coli uses a phospho-relay mechanism.

[Figure: ArcB-ArcA phospho-relay showing the transmitter domain (H1), receiver domain (D1) and phosphotransfer domain (H2), with H∼P, D∼P and ATP]

Orthodox system
Unorthodox system

Gillespie simulation of the orthodox TCS: timing comparisons
MJP simulation using Gillespie, 5 reactions and 5 species
pyCUDA on GPU vs C++ on a single CPU
With no compiler optimizations on either: 30x speed-up
A 10x speed-up is maybe more realistic

[Figure: Simulation timing, log10(time (s)) against log10(n simulations), and simulations per second, GPU / CPU, for a single CPU and the QuadroFX1700, QuadroFX3700 and TeslaC1060 GPUs]

Mechanistic modelling of metapopulation dynamics

Metapopulations
Predator-prey systems are unstable and prone to extinction
One mechanism for promoting stability is spatial heterogeneity
A metapopulation is a set of linked subpopulations or 'patches'
Limited dispersal and asynchronous dynamics between patches increase total persistence

Inference of stochastic migration models from time series data
Important for understanding how to maximize metapopulation persistence
Obvious implications for conservation
Relatively unexplored (Gillespie and Golightly, 2009)

Demonstrate methods that can be used on field data

The contenders

Beetles: Callosobruchus chinensis

Also known as the 'bean weevil'
Live on beans or seeds
Adults lay eggs on beans
Larvae chew their way into the bean

Wasps: Anisopteromalus calandrae

The bean weevil's natural parasitoid
Adults lay eggs on beetles
Larvae kill and eat the beetle
Can be released into grain stores as pest control

Experimental setup

Laboratory microcosm
4 × 4 clear plastic boxes (73×73×30 mm)

Establish bruchid beetle populations on black-eyed peas

Introduce wasp populations

Control inter-cell migration using gates
Limited dispersal
Unlimited dispersal

Metapopulation structure affects persistence (Bonsall et al., 2002)

Time series data: 'unlimited dispersal' × 4 replicates
[Figure: panels of X against t, for t from 0 to 80]

Time series data: 'limited dispersal' × 4 replicates
[Figure: panels of X against t, for t from 0 to 80]

Stochastic model

$X_i$, $Y_i$ denote the numbers of beetles and wasps in cell $i$

Beetle logistic growth
Process                        Rate
$X_i \to 2X_i$                 $b_1 X_i$
$X_i \to \emptyset$            $d_1 X_i^2$

Wasp-beetle interaction: Lotka-Volterra
Process                        Rate
$X_i + Y_i \to 2Y_i$           $p X_i Y_i$
$Y_i \to \emptyset$            $d_2 Y_i$

Migration
Process                        Rate
$X_i \to X_j$                  $m_c^X X_i$
$X_i + X_i \to X_i + X_j$      $m_d^X X_i (X_i - 1)/2$
$Y_i \to Y_j$                  $m_c^Y Y_i$
$Y_i + Y_i \to Y_i + Y_j$      $m_d^Y Y_i (Y_i - 1)/2$
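The reaction table can be turned into a Gillespie simulation over all cells by computing one propensity per process and cell, exactly as listed. The sketch below is plain Python, not ABC-SysBio's model code; the dictionary keys for the rates, the lumping of constant and density-dependent migration into one movement channel per species, and the use of global movement (destination drawn uniformly from the other cells) are assumptions.

```python
import random


def metapop_ssa(X, Y, rates, t_end, seed=None):
    """Gillespie simulation of the beetle (X) / wasp (Y) metapopulation model.

    X, Y: per-cell counts; rates: dict with keys b1, d1, p, d2, mXc, mXd, mYc, mYd
    (hypothetical names). Returns the counts at time t_end. Movement is global:
    a migrant moves to a uniformly chosen other cell (assumption).
    """
    rng = random.Random(seed)
    X, Y = list(X), list(Y)
    n_cells = len(X)
    t = 0.0
    while True:
        events, props = [], []
        for i in range(n_cells):
            x, y = X[i], Y[i]
            for kind, a in (
                ("x_birth", rates["b1"] * x),                                   # X_i -> 2X_i
                ("x_death", rates["d1"] * x * x),                               # X_i -> 0
                ("predation", rates["p"] * x * y),                              # X_i + Y_i -> 2Y_i
                ("y_death", rates["d2"] * y),                                   # Y_i -> 0
                ("x_move", rates["mXc"] * x + rates["mXd"] * x * (x - 1) / 2),  # X_i -> X_j
                ("y_move", rates["mYc"] * y + rates["mYd"] * y * (y - 1) / 2),  # Y_i -> Y_j
            ):
                events.append((kind, i))
                props.append(a)
        a0 = sum(props)
        if a0 == 0.0:
            break                                        # nothing can happen any more
        t += rng.expovariate(a0)
        if t > t_end:
            break
        kind, i = rng.choices(events, weights=props)[0]  # pick a reaction by propensity
        if kind == "x_birth":
            X[i] += 1
        elif kind == "x_death":
            X[i] -= 1
        elif kind == "predation":
            X[i] -= 1
            Y[i] += 1
        elif kind == "y_death":
            Y[i] -= 1
        else:
            j = rng.choice([c for c in range(n_cells) if c != i])  # global movement
            counts = X if kind == "x_move" else Y
            counts[i] -= 1
            counts[j] += 1
    return X, Y
```

Swapping the destination rule is the only change needed to compare movement models.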

Movement models

Q: "Given a migration event occurs, where does the individual move?"

Global movement: Xi → Xj where i ≠ j

Local movement: Xi → Xj where j ∈ nearest neighbours of i

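The two movement models differ only in the set of destinations available to a migrant. A sketch for the 4 × 4 grid of the microcosm, numbering cells row by row; taking the nearest neighbours to be the four orthogonally adjacent cells (no diagonals) is an assumption, as are the function names.

```python
import random

N_ROWS, N_COLS = 4, 4   # 4 x 4 microcosm; cell index = row * N_COLS + col


def global_destinations(i):
    """Global movement: any cell j != i."""
    return [j for j in range(N_ROWS * N_COLS) if j != i]


def local_destinations(i):
    """Local movement: orthogonally adjacent cells only (assumed neighbourhood)."""
    row, col = divmod(i, N_COLS)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < N_ROWS and 0 <= c < N_COLS:
            result.append(r * N_COLS + c)
    return result


def choose_destination(i, movement_model, rng):
    """Given that an individual in cell i migrates, pick where it moves."""
    return rng.choice(movement_model(i))
```

For example, a migrant leaving corner cell 0 can reach any of the 15 other cells under global movement, but only cells 1 and 4 under local movement.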

Inference: Global vs local movement
1) Global movement
2) Local movement

Inference: Parameters of the global movement model

[Figure: posterior distributions of prey birth, prey death, predation, pred death, prey m (const), prey m (dens), pred m (const) and pred m (dens), with series labelled m1 and m2]

Thanks!

Acknowledgements
Juliane Liepe, Erika Cule
Michael Stumpf
Michael Bonsall
Tina Toni
Xia Sheng
Kamil Erguler
Paul Kirk
Justina Norkunaite
Suhail Islam

christopher.barnes@imperial.ac.uk
http://www3.imperial.ac.uk/theoreticalsystemsbiology
http://abc-sysbio.sourceforge.net/
