Add save_trace and load_trace #2975

Merged
merged 2 commits into from
May 18, 2018

ColCarroll
Member

This provides functions to save and load traces, avoiding pickle. My main use would be saving traces while running a large notebook, or distributing the traces with code containing the models used to produce them.

Pros:

  • it should be compatible between Python versions (so long as these functions retain compatibility),
  • it avoids security concerns (all files are JSON or .npy)
  • appears to be smaller (though it omits some information) -- the test model was 400 kB, compared to 900 kB pickled
  • answers a question that comes up reasonably often in issues about saving traces

Cons:

  • Requires model context to reload (in particular, the pickle stores the model, but these files do not)
  • Does not contain any part of the trace.report yet (though that could be added without breaking compatibility)
  • Requires maintenance
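
The general idea behind the pros and cons above can be sketched as follows. This is a minimal illustration of pickle-free storage (one .npy file per variable plus a JSON index), not the implementation in this PR; the helper names `save_samples` and `load_samples` and the file name `index.json` are hypothetical.

```python
import json
import os
import tempfile

import numpy as np


def save_samples(samples, directory):
    """Write each variable to its own .npy file, plus a JSON index of file names."""
    os.makedirs(directory, exist_ok=True)
    index = {}
    for name, values in samples.items():
        filename = name + ".npy"
        # allow_pickle=False makes NumPy refuse object arrays, so no pickled data is written
        np.save(os.path.join(directory, filename), np.asarray(values), allow_pickle=False)
        index[name] = filename
    with open(os.path.join(directory, "index.json"), "w") as handle:
        json.dump(index, handle)


def load_samples(directory):
    """Read back only the variables listed in the JSON index."""
    with open(os.path.join(directory, "index.json")) as handle:
        index = json.load(handle)
    return {
        name: np.load(os.path.join(directory, filename), allow_pickle=False)
        for name, filename in index.items()
    }


# Round trip with made-up samples (hypothetical variable names)
with tempfile.TemporaryDirectory() as tmp:
    original = {
        "weights": np.random.normal(size=(500, 2)),
        "sigma": np.abs(np.random.normal(size=500)),
    }
    save_samples(original, tmp)
    restored = load_samples(tmp)
```

Note that only raw arrays survive the round trip here; anything describing the model itself would have to be rebuilt in code, which mirrors the "requires model context" con listed above.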

@ColCarroll
Member Author

Also, here is an example of it in use. By default, this creates a local directory called .pymc.trace and saves the trace there.

image

@junpenglao
Member

THANK YOU! This will solve so many pickling issues!

@springcoil
Contributor

Just so I understand, what are the pickle issues? Incompatibility between Python versions and security concerns?

@springcoil
Contributor

This looks good to me and ready to merge. Any objections, @ColCarroll?

Contributor

@springcoil springcoil left a comment

LGTM

@twiecki twiecki merged commit 850a2a7 into pymc-devs:master May 18, 2018
@twiecki
Member

twiecki commented May 18, 2018

Great stuff!

@twiecki
Member

twiecki commented May 18, 2018

Oh, I forgot, we should add this to the release-notes, and also add some example docs somewhere.

@springcoil
Contributor

springcoil commented May 18, 2018 via email

@ColCarroll
Member Author

I will add release notes. I also realized that there's an edge case I didn't cover that requires deleting the directory before writing to it (if you save a model with variables x, y, and then save a new model with z, w, loading will give you a model with x, y, z, w).
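
To make that edge case concrete, here is a small standard-library sketch. It assumes a layout with one file per variable and a loader that reads whatever it finds in the directory; the helpers `save_vars` and `load_vars` and the `overwrite` flag are hypothetical, not the functions in this PR.

```python
import json
import os
import shutil
import tempfile


def save_vars(variables, directory, overwrite=False):
    """Write one JSON file per variable; optionally clear the directory first."""
    if overwrite and os.path.isdir(directory):
        shutil.rmtree(directory)  # drop files left behind by an earlier save
    os.makedirs(directory, exist_ok=True)
    for name, values in variables.items():
        with open(os.path.join(directory, name + ".json"), "w") as handle:
            json.dump(values, handle)


def load_vars(directory):
    """Load every variable file found in the directory."""
    loaded = {}
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        if ext == ".json":
            with open(os.path.join(directory, filename)) as handle:
                loaded[name] = json.load(handle)
    return loaded


root = tempfile.mkdtemp()
trace_dir = os.path.join(root, "trace")

# Two saves into the same directory without clearing it: old variables linger
save_vars({"x": [0.1], "y": [0.2]}, trace_dir)
save_vars({"z": [0.3], "w": [0.4]}, trace_dir)
stale = sorted(load_vars(trace_dir))

# Clearing the directory before writing leaves only the new variables
save_vars({"z": [0.3], "w": [0.4]}, trace_dir, overwrite=True)
clean = sorted(load_vars(trace_dir))

shutil.rmtree(root)
```

Here `stale` contains all four names while `clean` contains only `w` and `z`, which is the behavior that deleting the directory first is meant to guarantee.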

@sudiptamazumda

sudiptamazumda commented Aug 25, 2018

Does anyone have sample code for predicting with this trace load functionality? I have a Gaussian model:
y = f(x) + e...
f(x) ~ Gaussian(a, b),
e ~ N(0, sigma^2)
The trace saves the posterior of a, b, and sigma...

My objective is to predict f(x) for a new x in a new python session without running the model training piece...

@ColCarroll
Member Author

I might need more detail about what you're trying to do. Here's an example, though:

First, generate a random model:

import os

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import theano
import theano.tensor as tt

dims = 2
N = 100

true_weights = np.random.normal(size=(dims,))

data = np.random.normal(size=(N, dims))
noise = np.random.normal(0, 0.5, size=N)

y = np.dot(data, true_weights) + noise
print(true_weights)

Now do a cached prediction. Running this multiple times will work, even after changing predict_data.

cache_file = 'my_trace.trace'


# Wrap the predictors in a shared variable so they can be replaced after sampling
s_data = theano.shared(data)

with pm.Model() as model:
    weights = pm.Normal('weights', mu=0, sd=1, shape=dims)
    y_obs = pm.Normal('y_obs', mu=tt.dot(s_data, weights), sd=0.5, observed=y, shape=s_data.shape[0].eval())

if not os.path.exists(cache_file):
    with model:
        trace = pm.sample()

    pm.save_trace(trace, directory=cache_file)
else:
    trace = pm.load_trace(cache_file, model=model)

    
predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

# Swap in the new predictors before drawing posterior predictive samples
s_data.set_value(predict_data)

with model:
    ppc = pm.sample_ppc(trace)

print(trace['weights'].mean(axis=0))  # pretty close to true weights
print(ppc['y_obs'].mean(axis=0))  # should be reasonable
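
For this linear model, the same kind of prediction can also be sketched by hand with NumPy once the posterior draws are available as an array. The block below is only an illustration under stated assumptions: `weight_samples` is a made-up stand-in for `trace['weights']` (no trace is actually loaded), and the noise scale of 0.5 is taken from the model definition above.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for trace['weights']: 1000 posterior draws of a 2-dimensional weight vector.
# These values are invented for illustration only.
weight_samples = rng.normal(loc=[1.0, -0.5], scale=0.05, size=(1000, 2))

predict_data = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [2, 2],
])

# Predicted mean for every posterior draw and every new row: shape (1000, 4)
mu_samples = weight_samples.dot(predict_data.T)

# Add observation noise with the sd used in the model (0.5)
y_samples = mu_samples + rng.normal(0, 0.5, size=mu_samples.shape)

pred_mean = y_samples.mean(axis=0)  # point prediction per new row
pred_interval = np.percentile(y_samples, [2.5, 97.5], axis=0)  # 95% interval per new row
```

This only works because the mean here is a plain dot product; for models with more structure, rebuilding the model and sampling from it as in the example above is the safer route.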

@sudiptamazumda

sudiptamazumda commented Aug 25, 2018 via email
