Bayesian Workflow with PyMC and ArviZ
Corrie Bartelheimer
Data Scientist at Europace AG
@corrieaar
First, the problem
Solution: Hierarchical Bayesian Model
Getting this into Python: PyMC3
import pymc3 as pm

with pm.Model() as lin_model:
    # Priors on intercept, slope, and noise scale
    α = pm.Normal("α", 0, 100)
    β = pm.Normal("β", 0, 100)
    σ = pm.Exponential("σ", 1/100)

    # Expected price as a linear function of area
    μ = α + β*d["area"]
    y = pm.Normal("y", μ, σ, observed=d["price"])
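Written out as a statistical model, the code above reads roughly as follows. This is a transcription of the PyMC3 code, assuming the second argument of pm.Normal is a standard deviation and the argument of pm.Exponential is a rate (so σ has prior mean 100); the index i over observations is notation added here.

```latex
\begin{aligned}
y_i    &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i  &= \alpha + \beta \cdot \mathrm{area}_i \\
\alpha &\sim \mathrm{Normal}(0, 100) \\
\beta  &\sim \mathrm{Normal}(0, 100) \\
\sigma &\sim \mathrm{Exponential}(1/100)
\end{aligned}
```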
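with pm.Model() as hier_model:
    # Hyperpriors for the per-ZIP intercepts and slopes, plus the noise scale σ
    μ_α = pm.Normal("μ_α", 0, 100)
    μ_β = ...
    σ = pm.Exponential("σ", 1/100)
    σ_α = σ_β = ...

    # One intercept and one slope per ZIP code
    α = pm.Normal("α", μ_α, σ_α, shape=num_zip)
    β = pm.Normal("β", μ_β, σ_β, shape=num_zip)

    # d["zip"] selects each observation's own intercept and slope
    μ = α[d["zip"]] + β[d["zip"]]*d["area"]
    y = pm.Normal("y", μ, σ, observed=d["price"])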
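The indexing α[d["zip"]] only works if d["zip"] holds integer positions from 0 to num_zip - 1 rather than raw ZIP code labels. Below is a minimal sketch of that lookup in plain Python. The ZIP codes, areas, and parameter values are hypothetical toy numbers, not the data or estimates from the talk, and the names zip_to_idx and zip_idx are made up for illustration.

```python
# Hypothetical toy data: ZIP code labels and areas for five listings.
zip_codes = ["10115", "10245", "10115", "12043", "10245"]
areas = [50.0, 80.0, 65.0, 40.0, 100.0]

# Map each distinct ZIP code to an integer index 0..num_zip-1.
zip_to_idx = {z: i for i, z in enumerate(sorted(set(zip_codes)))}
zip_idx = [zip_to_idx[z] for z in zip_codes]
num_zip = len(zip_to_idx)

# One intercept and one slope per ZIP code (illustrative values only).
alpha = [100.0, 150.0, 90.0]
beta = [3.0, 4.0, 2.5]

# Row-by-row equivalent of α[d["zip"]] + β[d["zip"]] * d["area"].
mu = [alpha[j] + beta[j] * a for j, a in zip(zip_idx, areas)]
print(zip_idx)
print(mu)
```

Each listing picks up the intercept and slope of its own ZIP code, which is what the vectorised expression in the model does for all rows at once.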
What about the priors?
with model:
    prior = pm.sample_prior_predictive()
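To see what a prior predictive draw amounts to, here is a rough standard-library sketch that simulates prices from the first linear model's priors alone, without any data. It is not what PyMC3 does internally, only the same idea by hand. The function name prior_predictive_price and the area value of 100 are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def prior_predictive_price(area, n_draws=1000):
    """Simulate prices at one area from the priors alone (no data involved)."""
    prices = []
    for _ in range(n_draws):
        alpha = random.gauss(0, 100)         # α ~ Normal(0, 100)
        beta = random.gauss(0, 100)          # β ~ Normal(0, 100)
        sigma = random.expovariate(1 / 100)  # σ ~ Exponential(rate 1/100)
        mu = alpha + beta * area
        prices.append(random.gauss(mu, sigma))
    return prices


# area=100 is an arbitrary example value, not taken from the talk's data.
prices = prior_predictive_price(area=100)
share_negative = sum(p < 0 for p in prices) / len(prices)
print(f"share of negative simulated prices: {share_negative:.2f}")
print(f"range: {min(prices):.0f} to {max(prices):.0f}")
```

Because the slope prior is centred on zero, about half of the simulated prices come out negative, which is the kind of implausibility a prior predictive check is meant to surface.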
with pm.Model() as hier_model:
    μ_α = pm.Normal("μ_α", 0, 20)
    μ_β = pm.Normal("μ_β", 0, 5)
    σ = pm.Exponential("σ", 1/5)
    σ_α = σ_β = ...
    α = pm.Normal("α", μ_α, σ_α, shape=num_zip)
    β = pm.Normal("β", μ_β, σ_β, shape=num_zip)
    μ = α[d["zip"]] + β[d["zip"]]*d["area"]
    y = pm.Normal("y", μ, σ, observed=d["price"])
    trace = pm.sample()
Did it converge?
import arviz as az
az.plot_trace(trace)
Some Bad Examples
az.summary(trace)
R-hat statistic smaller than 1.05?
Effective sample size / iterations greater than 10%?
Monte Carlo SE / posterior SD smaller than 10%?
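As a rough illustration of what the first check measures, the sketch below computes the classic (non-split) Gelman-Rubin R-hat by hand and applies the three rules of thumb to a set of numbers. ArviZ uses a rank-normalized split variant by default, so its values will differ somewhat. The function names gelman_rubin_rhat and passes_checks, the simulated chains, and the ESS and MCSE figures are all made up for illustration.

```python
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)           # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


def passes_checks(rhat, ess, n_draws, mcse, posterior_sd):
    """Apply the three rules of thumb from the checklist above."""
    return {
        "rhat": rhat < 1.05,
        "ess_ratio": ess / n_draws > 0.10,
        "mcse_ratio": mcse / posterior_sd < 0.10,
    }


random.seed(0)
# Four simulated chains that agree, and two that sit in different places.
good = [[random.gauss(0, 1) for _ in range(500)] for _ in range(4)]
bad = [[random.gauss(shift, 1) for _ in range(500)] for shift in (0, 5)]

rhat_good = gelman_rubin_rhat(good)
rhat_bad = gelman_rubin_rhat(bad)
# ESS, draw count, MCSE, and SD below are invented example numbers.
checks = passes_checks(rhat=rhat_good, ess=900, n_draws=2000,
                       mcse=0.02, posterior_sd=1.0)
print(rhat_good, rhat_bad, checks)
```

Chains that explore the same region give an R-hat close to 1, while chains stuck in different regions push it well above the 1.05 threshold.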
How well does my model fit the data?
with hier_model:
    posterior_predictive = pm.sample_posterior_predictive(trace)
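One simple way to turn posterior predictive draws into a number is to check how many observations fall inside their central predictive interval. The sketch below does this in plain Python on simulated toy data. It assumes the draws are arranged as one row per draw and one column per observation, and the function name interval_coverage is hypothetical, not part of PyMC3 or ArviZ.

```python
import random


def interval_coverage(observed, predictive_draws, prob=0.9):
    """Share of observations inside the central `prob` predictive interval.

    predictive_draws[s][i] is predictive draw s for observation i.
    """
    n_draws = len(predictive_draws)
    lo_idx = round((1 - prob) / 2 * n_draws)
    hi_idx = round((1 + prob) / 2 * n_draws) - 1
    inside = 0
    for i, y in enumerate(observed):
        draws_i = sorted(draw[i] for draw in predictive_draws)
        if draws_i[lo_idx] <= y <= draws_i[hi_idx]:
            inside += 1
    return inside / len(observed)


random.seed(2)
# Toy stand-ins: "observed" data and predictive draws from the same distribution.
observed = [random.gauss(0, 1) for _ in range(200)]
draws = [[random.gauss(0, 1) for _ in range(200)] for _ in range(500)]

coverage = interval_coverage(observed, draws)
print(f"share of observations inside the 90% interval: {coverage:.2f}")
```

For a model that describes the data well, the coverage should land near the nominal 90%; a much lower value suggests the model is overconfident or misses structure in the data.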
Results, please!
What’s next?
● Iterate!
● More predictors!
○ Year of construction
○ House type
○ ...
● More hierarchies!
● Add group predictors!
○ Percentage of green areas
○ Economic indices
● Try different likelihoods
● Probably save more money...
Further resources
Richard McElreath: Statistical Rethinking
- Port to PyMC3
Prior Recommendations by the Stan Team
Michael Betancourt's Case Studies
BerlinBayesians
Icons by icons8
Thanks!
@corrieaar
corriebar
Code and Notebooks
www.samples-of-thoughts.com

Editor's Notes

  • #62 Meetup Icons by icons8
  • #63 Photo by Crawford Jolly on Unsplash Twitter and Github Icons by icons8