
Cosmological Parameter Estimation with Sequential Linear Simulation-based Inference

N. G. Mediato-Diaz
Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge, CB3 0HE

W. J. Handley
Kavli Institute for Cosmology, University of Cambridge, Madingley Road, Cambridge, CB3 0HA

arXiv:2501.03921v1 [astro-ph.IM] 7 Jan 2025

We develop the framework of Linear Simulation-based Inference (LSBI), an application of simulation-based inference where the likelihood is approximated by a Gaussian linear function of its parameters. We obtain analytical expressions for the posterior distributions of the hyper-parameters of the linear likelihood in terms of samples drawn from a simulator, for both uniform and conjugate priors. This method is applied sequentially to several toy models and tested on emulated datasets for the Cosmic Microwave Background temperature power spectrum. We find that convergence is achieved after four or five rounds of O(10⁴) simulations, which is competitive with state-of-the-art neural density estimation methods. Therefore, we demonstrate that it is possible to obtain significant information gain and generate posteriors that agree with the underlying parameters while maintaining explainability and intellectual oversight.

Key words: cosmology, Bayesian data analysis, simulation-based inference, CMB power spectrum

I. INTRODUCTION

In many astrophysical applications, statistical models can be simulated forward, but their likelihood functions are too complex to calculate directly. Simulation-based inference (SBI) [1] provides an alternative way to perform Bayesian analysis on these models, relying solely on forward simulations rather than likelihood estimates. However, modern cosmological models are typically expensive to simulate and datasets are often high-dimensional, so traditional methods like Approximate Bayesian Computation (ABC) [2], which scale poorly with dimensionality, are no longer suitable for parameter estimation. Improvements to ABC, such as the inclusion of Markov-chain Monte Carlo [3] and Sequential Monte Carlo [4] methods, can also have limited effectiveness for large datasets.

In the past decade, machine learning techniques have revolutionized the field of SBI [1], enabling a reduction in the number of simulator calls required to achieve high-fidelity inference and providing an efficient framework to analyze complex data and expensive simulators. In particular, we highlight Density Estimation Likelihood-free Inference (DELFI) [5, 6] and Truncated Marginal Neural Ratio Estimation (TMNRE) [7]. Given a set of simulated parameter-data pairs, these algorithms learn a parametric model for the joint distribution and the likelihood-to-evidence ratio, respectively, via neural density estimation (NRE) techniques [8, 9]. Furthermore, recent applications of SBI to high-dimensional cosmological and astrophysical datasets [10–17] demonstrate that these algorithms are rapidly establishing themselves as a standard machine learning technique in the field.

However, the use of neural networks presents some disadvantages, the most significant of which is their lack of explainability. This means that most neural networks are treated as a 'black box', where the decisions taken by the artificial intelligence in arriving at the optimized solution are not known to researchers, which can hinder intellectual oversight [18]. This problem affects the algorithms discussed above, as NRE constitutes an unsupervised learning task, where the artificial intelligence is given unlabeled input data and allowed to discover patterns in its distribution without guidance. This combines with the problem of over-fitting, where the neural network may attempt to maximize the likelihood without regard for the physical sensibility of the output. Current algorithms for simulation-based inference can produce overconfident posterior approximations [19], making them possibly unreliable for scientific use.

The question naturally arises as to whether it is possible to achieve fast and credible simulation-based inference without dependence on neural networks. Such a methodology might allow researchers to acquire control over the inference process and better assess whether their approximations produce overconfident credible regions. Moreover, disentangling ML from SBI can be pedagogically useful in explaining SBI to new audiences by separating its general principles from the details of neural networks.

This paper takes a first step in this direction. In Section II we develop the theoretical framework of Linear Simulation-based Inference (LSBI), an application of likelihood-free inference where the model is approximated by a linear function of its parameters and the noise is assumed to be Gaussian with zero mean. In Section III, several toy models are investigated using LSBI, and in Section III B the method is applied to parameter estimation for the Cosmic Microwave Background (CMB) power spectrum.

II. THEORY

A. The Linear Approximation

Let us consider a d-dimensional dataset D described by a model M with n parameters θ = {θi}. We assume a Gaussian prior θ|M ∼ N(µ, Σ) with known mean and covariance. The likelihood L_D(θ) for an arbitrary model is generally intractable, but in this paper we approximate it as a homoscedastic Gaussian (thus neglecting any parameter-dependence of the covariance matrix),

D|θ, M ∼ N(M(θ), C).                                         (1)

Furthermore, we approximate the model linearly about a fiducial point θ₀, such that M(θ) ≈ m + Mθ, where M ≡ ∇θM|θ₀ and m ≡ M(θ₀) − Mθ₀. Under these assumptions, the resulting likelihood is

D|θ, M ∼ N(m + Mθ, C).                                       (2)

If we knew the likelihood hyper-parameters m, M, and C, the posterior distribution could be found and would also be Gaussian,

θ|D, M ∼ N(µ_P, Σ_P),                                        (3)

where

Σ_P⁻¹ ≡ M^T C⁻¹ M + Σ⁻¹,                                     (4)

µ_P ≡ µ + Σ_P M^T C⁻¹ (D − m − Mµ).                          (5)

Similarly, the evidence would be given by

D|M ∼ N(m + Mµ, C + MΣM^T).                                  (6)

Nevertheless, m, M, and C are unknown, so we must obtain their distributions before computing the posterior.
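
For concreteness, the posterior of Eqs. 3–5 can be evaluated in a few lines of numpy. The following is a minimal sketch; the function name and array conventions are ours and not part of any package referenced in this paper.

```python
import numpy as np

def linear_posterior(D, m, M, C, mu, Sigma):
    """Gaussian posterior of Eqs. (3)-(5) for known hyper-parameters m, M, C.

    D : (d,) data vector;  m : (d,) offset;  M : (d, n) linear response;
    C : (d, d) data covariance;  mu, Sigma : prior mean (n,) and covariance (n, n).
    """
    Cinv = np.linalg.inv(C)
    Sigma_P = np.linalg.inv(M.T @ Cinv @ M + np.linalg.inv(Sigma))  # Eq. (4)
    mu_P = mu + Sigma_P @ M.T @ Cinv @ (D - m - M @ mu)             # Eq. (5)
    return mu_P, Sigma_P
```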

B. Linear Simulation-based Inference

The goal of SBI is to obtain an approximate form of the likelihood through the simulator itself. For the purposes of this paper, the simulator S_M is understood to be a stochastic function that takes the model parameters as input and yields a set of noisy simulated data,

S_M : θ ⟼ D_sim.                                             (7)

Hence, upon fixing a set of parameter samples {θ^(i)}, we can run the simulator on each to obtain a set of simulated data samples {D_sim^(i)}. Crucially, these are distributed as the true likelihood, D_sim^(i) ∼ L(·|θ). Thus, one may obtain a numerical estimation of the likelihood through the simulator by learning the probability density of the pairs {θ^(i), D_sim^(i)} for a sufficient number of simulator runs. Here, we follow this strategy, except that the assumption of linearity in Eq. 2 avoids the need for machine-learning tools. This linear analysis applied to SBI is not available in the literature, although some recent works are similar, such as SELFI [20] or MOPED [21].

We first draw k parameter vectors {θ^(i)}; since we are estimating the likelihood, we may draw these from an arbitrary distribution that does not need to be the model prior π(θ). Then, for each θ^(i) the simulator is run to produce the corresponding data vector D^(i). We define the first- and second-order statistics

θ̄ = (1/k) Σ_{i=1}^{k} θ^(i),    D̄ = (1/k) Σ_{i=1}^{k} D^(i),                  (8)

and

Θ = (1/k) Σ_{i=1}^{k} (θ^(i) − θ̄)(θ^(i) − θ̄)^T,                               (9)

∆ = (1/k) Σ_{i=1}^{k} (D^(i) − D̄)(D^(i) − D̄)^T,                               (10)

Ψ = (1/k) Σ_{i=1}^{k} (D^(i) − D̄)(θ^(i) − θ̄)^T.                               (11)

Then, by expanding the joint likelihood,

p({D^(i)}|{θ^(i)}, m, M, C) ≡ ∏_{i=1}^{k} p(D^(i)|θ^(i), m, M, C).             (12)

From this result, and a choice of broad uniform priors for m, M, and C, we find the distributions

m|M, C, S ∼ N(D̄ − Mθ̄, C/k),                                                   (13)

M|C, S ∼ MN(ΨΘ⁻¹, C/k, Θ⁻¹),                                                  (14)

C|S ∼ W⁻¹(k(∆ − ΨΘ⁻¹Ψ^T), ν),                                                 (15)

where S = {(θ^(i), D^(i))} are the simulated parameter-data pairs, and ν = k − d − n − 2. MN(M, U, V) stands for the matrix normal distribution with mean M and covariance U ⊗ V, and W⁻¹(Λ, ν) stands for the inverse Wishart distribution with scale matrix Λ and ν degrees of freedom [22]. Note that ν > d − 1, so there is a minimum number of simulations k_min = n + 2d + 2 required to yield well-defined distributions. Appendix A gives the details of the equivalent result for a choice of conjugate priors for m, M, and C,

m|M, C, {θ^(i)} ∼ N(0, C),                                                     (16)

M|C, {θ^(i)} ∼ MN(0, C, Θ⁻¹),                                                  (17)

C|{θ^(i)} ∼ W⁻¹(C₀, ν₀).                                                       (18)
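
As an illustration, the statistics of Eqs. 8–11 and a joint draw of (m, M, C) from Eqs. 13–15 can be generated with scipy. This is a sketch assuming the parameters are stored as a (k, n) array and the data as a (k, d) array; it is not the interface of the lsbi package.

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal, multivariate_normal

def lsbi_hyperparameter_sample(theta, D, rng=None):
    """One joint draw (m, M, C) from Eqs. (13)-(15), given k simulations."""
    k, n = theta.shape
    d = D.shape[1]
    theta_bar, D_bar = theta.mean(axis=0), D.mean(axis=0)
    dtheta, dD = theta - theta_bar, D - D_bar
    Theta = dtheta.T @ dtheta / k                                  # Eq. (9)
    Delta = dD.T @ dD / k                                          # Eq. (10)
    Psi = dD.T @ dtheta / k                                        # Eq. (11)
    Theta_inv = np.linalg.inv(Theta)
    nu = k - d - n - 2
    # C | S  ~  W^-1( k (Delta - Psi Theta^-1 Psi^T), nu )          Eq. (15)
    C = invwishart.rvs(df=nu, scale=k * (Delta - Psi @ Theta_inv @ Psi.T),
                       random_state=rng)
    # M | C, S  ~  MN( Psi Theta^-1, C/k, Theta^-1 )                Eq. (14)
    M = matrix_normal.rvs(mean=Psi @ Theta_inv, rowcov=C / k,
                          colcov=Theta_inv, random_state=rng)
    # m | M, C, S  ~  N( D_bar - M theta_bar, C/k )                 Eq. (13)
    m = multivariate_normal.rvs(mean=D_bar - M @ theta_bar, cov=C / k,
                                random_state=rng)
    return m, M, C
```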

The posterior can finally be estimated,

P_D(θ) = ∫ P_D(θ|m, M, C) p(m, M, C) dm dM dC ≈ ⟨P_D(θ|m, M, C)⟩_{m,M,C}.      (19)

The average can be computed by drawing N exact samples (m^(I), M^(I), C^(I)) from Eqs. 13–15, where N is large enough to guarantee convergence. For N > 1, the resulting posterior is a Gaussian mixture of N components. Since each sample is independent, this calculation can be made significantly faster by parallelization, allowing a large N without much effect on the computational efficiency of the calculation.
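
A sketch of Eq. 19 as an equal-weight Gaussian mixture over N such draws, building on the two helper functions sketched above (the names are ours, not an established API):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def lsbi_posterior_logpdf(theta_eval, D_obs, hyper_samples, mu, Sigma):
    """log of the LSBI posterior (Eq. 19): an equal-weight mixture of the
    Gaussians of Eqs. (3)-(5), one component per draw (m, M, C)."""
    log_comps = [
        multivariate_normal.logpdf(theta_eval,
                                   *linear_posterior(D_obs, m, M, C, mu, Sigma))
        for m, M, C in hyper_samples
    ]
    return logsumexp(log_comps, axis=0) - np.log(len(hyper_samples))
```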

C. Sequential LSBI

As discussed in Section II A, the linear approximation is only applicable within a localized region surrounding the fiducial point θ₀. Given that the prior distribution is typically broad, this condition is not often satisfied. Consequently, when dealing with non-linear models, LSBI will truncate the model to its linear expansion, thereby computing the posterior distribution corresponding to an 'effective' linear likelihood function. This truncation results in a less constraining model, leading to a broader 'effective' posterior distribution relative to the 'true' posterior.

However, since the simulation parameters {θ^(i)} can be drawn from any non-singular distribution, independent of the prior, LSBI can be performed on a set of samples generated by simulations that are proximal to the observed data, i.e., a narrow distribution with θ₀ near the true parameter values. A natural choice for this distribution is the 'effective' LSBI posterior. This leads to the concept of Sequential LSBI, wherein LSBI is iteratively applied to simulation samples drawn from the posterior distribution of the previous iteration, with the initial iteration corresponding to the prior distribution.

It is worth noting that this method suffers from two disadvantages compared to plain LSBI. Firstly, the algorithm is no longer amortized, as subsequent iterations depend on the choice of D_obs. Secondly, as the sampling distribution becomes narrower, Θ becomes smaller, resulting in a broader distribution for M. Thus, the value of N may need to be increased accordingly.

The evidence may be evaluated similarly. Thus, if the procedure is repeated for a different model M′, the Bayes' ratio between the two models may be calculated,

B = ⟨p(D_obs|M)⟩_{m,M,C} / ⟨p(D_obs|M′)⟩_{m′,M′,C′}.                           (20)

Nevertheless, this calculation is inefficient for large datasets, so a data compression algorithm is proposed in Appendix B, although it is not investigated further in this paper.
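
Schematically, the sequential procedure amounts to the following loop. This is a sketch under the assumptions above: `simulator` is a forward model θ ↦ D, the helper functions are those sketched earlier, and the choice to summarize each round's Gaussian mixture by a single moment-matched Gaussian for the next round's sampling distribution is ours, one simple option among several.

```python
import numpy as np

def sequential_lsbi(simulator, D_obs, mu, Sigma, rounds=5, k=2500, N=1000, seed=0):
    """Iterate LSBI, drawing each round's parameters from the previous round's
    effective posterior (round 1 draws from the prior)."""
    rng = np.random.default_rng(seed)
    mu_r, Sigma_r = mu, Sigma                        # current sampling distribution
    for _ in range(rounds):
        theta = rng.multivariate_normal(mu_r, Sigma_r, size=k)    # {theta^(i)}
        D = np.array([simulator(t) for t in theta])               # {D^(i)}
        draws = [lsbi_hyperparameter_sample(theta, D, rng) for _ in range(N)]
        moments = [linear_posterior(D_obs, m, M, C, mu, Sigma) for m, M, C in draws]
        means = np.array([mp for mp, _ in moments])
        covs = np.array([Sp for _, Sp in moments])
        # moment-match the N-component mixture to one Gaussian for the next round
        mu_r = means.mean(axis=0)
        Sigma_r = covs.mean(axis=0) + np.cov(means.T)
    return draws, (mu_r, Sigma_r)
```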

III. RESULTS

In the development of LSBI, we have made two foundational assumptions about the nature of the underlying likelihood and model:

• the model M(θ) is approximated as a linear function of the parameters;

• the likelihood L_D(θ) can be accurately approximated by a homoscedastic Gaussian distribution (Eq. 1).

In this section, we test the resilience of LSBI against deviations from these assumptions by applying the procedure to several toy models, as well as the CMB temperature power spectrum. These toy models were implemented with the help of the Python package lsbi, currently under development by W. J. Handley, and tools from the scipy library. To simulate the cosmological power spectrum data, we use the cmb_tt neural emulator from CosmoPowerJAX [23, 24].

In addition to LSBI, the parameter posteriors are also calculated via nested sampling with dynesty [25–27] for comparison. The plots were made with the software getdist [28]. Unless otherwise stated, the calculations in this section use broad uniform priors.

A. Toy Models

For simplicity, the prior on the parameters is a standard normal, θ ∼ N(0, I_n), and the samples for the simulations are taken directly from this prior. To generate the model, we draw the entries of m from a standard normal, whereas the entries of M have mean 0 and standard deviation 1/d. The covariance C is drawn from W⁻¹(σ²I_d, d + 2), where σ = 0.5. The number of samples N taken from the posterior distributions of m, M, and C depends on the dataset's dimensionality d; generally, we choose the highest value allowing for manageable computation time.

Our starting point is a 50-dimensional dataset with a quadratic model of n = 4 parameters,

M(θ) = m + Mθ + θ^T Q θ,                                                        (21)

and Gaussian likelihood, where m and M are as above, and Q is an n × d × n matrix with entries drawn from N(0, 1/d). The noise is now Gaussian with covariance C. At each round, we perform LSBI with k = 2500 simulations to obtain an estimate for the posterior distribution, where the sampling distribution of the parameter sets {θ^(i)} is the posterior of the previous round. The posterior is calculated for a set of 'observed' data, which are determined by applying the model and noise to a set of 'real' parameters θ∗. We also calculate the KL divergence (D_KL) between the prior and each posterior, which will help us determine the number of rounds of LSBI required to obtain convergent results. The posterior distribution is also computed using nested sampling for comparison.
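
A sketch of how such a toy simulator can be constructed, following the description above (this is our own illustration, not the lsbi package; the interpretation of 1/d as the spread of the M and Q entries follows the text):

```python
import numpy as np
from scipy.stats import invwishart

def make_toy_simulator(n=4, d=50, sigma=0.5, seed=0):
    """Quadratic toy model of Eq. (21) with Gaussian noise of covariance C."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(d)                       # entries of m ~ N(0, 1)
    M = rng.normal(0.0, 1.0 / d, size=(d, n))        # entries with spread 1/d
    Q = rng.normal(0.0, 1.0 / d, size=(n, d, n))     # quadratic coupling
    C = invwishart.rvs(df=d + 2, scale=sigma**2 * np.eye(d), random_state=rng)
    L = np.linalg.cholesky(C)

    def simulator(theta):
        mean = m + M @ theta + np.einsum('a,ajb,b->j', theta, Q, theta)
        return mean + L @ rng.standard_normal(d)     # add Gaussian noise ~ N(0, C)

    return simulator
```

The 'observed' data are then obtained by calling this simulator on a chosen θ∗, and each round of LSBI uses k = 2500 further calls.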

Figure 1 illustrates the outcomes of these simulations. The first iteration of LSBI indeed produces an excessively broad posterior, which subsequent iterations rapidly improve upon. Figure 2 confirms that after four iterations, the Kullback-Leibler divergence between the prior and the LSBI posterior converges to that calculated via nested sampling, with no appreciable discrepancy, as expected for Gaussian noise.

FIG. 1. Prior and posterior distributions on the parameters for a 50-dimensional dataset described by a non-linear 4-parameter model with Gaussian error. Each round corresponds to the output of LSBI after k = 2500 simulations, where the sampling distribution of the parameter sets {θ^(i)} is the posterior of the previous round. The result of nested sampling is also shown. The dashed lines indicate the values of the 'real' parameters θ∗.

FIG. 2. D_KL between the prior and posterior for each round of Sequential LSBI for the data displayed in Figure 1. The black line corresponds to the value computed via nested sampling; the estimated error is also shown as a gray band.
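
The D_KL values reported in Figure 2 (and later in Figures 4 and 5) can be estimated by simple Monte Carlo over samples from the approximate posterior. A sketch, assuming `log_posterior` and `log_prior` are callables for the two densities (for instance the mixture log-pdf sketched in Section II B and scipy's multivariate-normal log-pdf):

```python
import numpy as np

def kl_divergence_mc(posterior_samples, log_posterior, log_prior):
    """Monte Carlo estimate of D_KL(posterior || prior) = E_post[log P - log pi],
    returned together with its standard error."""
    logs = log_posterior(posterior_samples) - log_prior(posterior_samples)
    return logs.mean(), logs.std() / np.sqrt(len(logs))
```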

In addition, we test the performance of LSBI for non-Gaussian noise shapes. The cases considered are uniform, Student-t with d + 2 degrees of freedom, and asymmetric Laplacian noise. The model is also given by Eq. 21, and the uncertainty in these distributions is defined directly in terms of the model covariance C. The posterior distribution is also computed using nested sampling for the Laplacian and Student-t cases, whereas that of the uniform likelihood is obtained through Approximate Bayesian Computation (rejection ABC).

The one- and two-dimensional LSBI posteriors for the models with non-Gaussian error are shown in Figures 3 and 4. The results demonstrate that the posteriors converge to a stable solution after approximately 4 rounds of sequential LSBI. Furthermore, the final D_KL between the prior and posterior distributions approaches the values obtained using nested sampling / rejection ABC methods. Nevertheless, the distributions show some discrepancy, illustrating the fact that non-Gaussian noise can affect the accuracy of LSBI. The results are less satisfactory for Laplacian noise, as the D_KL does not converge to a value within the error bars of the nested-sampling estimation. On the other hand, the lower value of the D_KL estimated via rejection ABC for uniform noise compared to the LSBI values after round 3 is probably due to the inaccuracy of ABC as a posterior estimation method.

We note that the distributions considered here have well-defined first and second moments, so they can be approximated by a Gaussian. Although not considered in this paper, there exist distributions with an undefined covariance, such as the Cauchy distribution (Student-t with one degree of freedom). In these cases, it has been checked that LSBI fails to predict a posterior, instead returning the original prior.

B. The CMB Power Spectrum

In this section, we test the performance of LSBI on a pseudo-realistic cosmological dataset. In the absence of generative Planck likelihoods, we produce the simulations through CosmoPowerJAX [23, 24], a library of machine learning-based inference tools and emulators. In particular, we use the cmb_tt probe, which takes as input the six ΛCDM parameters: the Hubble constant H₀, the densities of baryonic matter Ωb h² and cold dark matter Ωc h², where h = H₀/(100 km s⁻¹ Mpc⁻¹), the re-ionization optical depth τ, and the amplitude As and slope ns of the initial power spectrum of density perturbations.

FIG. 3. Prior and posterior distributions on the parameters for a 50-dimensional dataset described by a linear 4-parameter model and non-Gaussian error with σ ≈ 0.5. The posteriors are computed from k = 106, 500, 2500, and 10000 samples drawn from the simulated likelihood. The dashed lines indicate the values of the underlying parameters θ∗; (left) uniform noise; (centre) Student-t noise; (right) asymmetric Laplacian noise. The posterior distribution is also computed using nested sampling for the Laplacian and Student-t cases, whereas that of the uniform likelihood is obtained through an Approximate Bayesian Computation (rejection ABC).

FIG. 4. Kullback-Leibler divergence between the prior and posterior on the parameters as a function of the number of simulations; (left) uniform noise; (center) Student-t noise; (right) asymmetric Laplacian noise. The black line corresponds to the value computed via nested sampling / rejection ABC; the estimated error is also shown as a gray band.

The output is the predicted CMB temperature power spectrum

C_ℓ = (1/(2ℓ + 1)) Σ_{m=−ℓ}^{ℓ} |a_{ℓ,m}|²,    2 ⩽ ℓ ⩽ 2058,                    (22)

where a_{ℓ,m} are the coefficients of the harmonic expansion of the CMB signal. To this data, we add the standard scaled χ² noise,

[(2ℓ + 1)/(C_ℓ + N_ℓ)] C_ℓ ∼ χ²(2ℓ + 1).                                        (23)

We apply several rounds of Sequential LSBI, each drawing k = 10⁴ simulations from the emulator, but keep N = 100, since a larger number is computationally unmanageable without parallelization. The parameter samples are drawn from the prior displayed in Eqs. C1 and C2 and, as before, the observed data is generated by running the simulator on a known set of parameters θ∗. The posterior is also obtained by nested sampling with dynesty.
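
A sketch of such a simulator, combining an emulated spectrum with the noise of Eq. 23. The CosmoPowerJAX import path and probe name follow its documentation, but the exact call signature, parameter ordering, the ℓ-range returned, and whether `predict` returns C_ℓ directly or a transformed quantity should all be checked against the package; the noise spectrum N_ℓ is left as a free input.

```python
import numpy as np
from cosmopower_jax.cosmopower_jax import CosmoPowerJAX as CPJ  # check import path

emulator = CPJ(probe='cmb_tt')           # emulated C_ell for the six LCDM parameters
ells = np.arange(2, 2059)                # 2 <= ell <= 2058, as in Eq. (22)

def simulate_cmb(theta, N_ell=0.0, seed=None):
    """Draw one noisy spectrum so that (2l+1) C_l / (C_l + N_l) ~ chi^2(2l+1),
    as in Eq. (23); a sketch under the assumptions stated above."""
    rng = np.random.default_rng(seed)
    C_ell = emulator.predict(theta)                       # noiseless spectrum
    return (C_ell + N_ell) / (2 * ells + 1) * rng.chisquare(2 * ells + 1)
```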

The output of this calculation is shown in Figs. 6 and 7. The first figure displays the prior and rounds 1 and 2, while the second shows rounds 3 to 5; in both cases the nested sampling result is shown. It can be noted by eye that the posterior coincides well with the result of nested sampling after four to five rounds of LSBI. This suggests that, although the CMB power spectrum is not well approximated by a linear model at first instance, sequential LSBI succeeds at yielding a narrow sampling distribution about θ∗, thus iteratively approximating the correct posterior.

Figure 5 displays the evolution of the KL divergence up to 8 rounds of LSBI. This result provides further evidence that sequential LSBI converges after O(1) rounds, thus keeping the total number of simulations within O(10⁴). Nevertheless, we note that the D_KL is slightly overestimated, suggesting that the LSBI posterior overestimates the true distribution to a small extent. Reproducing this calculation with smaller values of N, we have noticed that the overconfidence increases as N is decreased. Therefore, as discussed in Sections II B and II C, the choice of N must be large enough to guarantee the convergence of the integral in Eq. 19. In general we recommend a value of at least order 10³, but as evidenced by Figure 5, a smaller value may still yield accurate results.

FIG. 5. D_KL between the prior and posterior for each round of Sequential LSBI for the simulated CMB data, displayed in Figures 6 and 7. The black line corresponds to the value computed via nested sampling; the estimated error of the nested sampling is also shown as a gray band.

IV. CONCLUSION

In this paper, we have developed the theoretical framework of Linear Simulation-based Inference (LSBI), an application of likelihood-free inference where the model is approximated by a linear function of its parameters and the noise is assumed to be Gaussian with zero mean. Under these circumstances, it is possible to obtain analytical expressions for the posterior distributions of the hyper-parameters of the linear likelihood, given a set of samples S = {(θ^(i), D^(i))}, where the D^(i) are obtained by running simulations on the θ^(i). These parameter samples can be drawn from a distribution other than the prior, so we can exploit this to sequentially approximate the posterior in the vicinity of the observed data.

The analysis of the toy models in Section III illustrates the extent of the resilience of LSBI to deviations from its assumptions. When the error is non-Gaussian, LSBI can still yield accurate estimates, although not universally so, and sequential LSBI provides a way to effectively treat non-linear models. Furthermore, its application to the pseudo-realistic model for the CMB power spectrum demonstrates that it is possible to obtain significant information gain and generate posteriors that agree with the underlying parameters while maintaining explainability and intellectual oversight. We also find that convergence is achieved after O(10⁴) simulations, competitive with state-of-the-art neural density estimation methods [5–7].

Further efforts should be directed towards testing LSBI on more realistic examples, such as the CMB with foregrounds, Baryon Acoustic Oscillations (BAO), or supernova surveys. In addition, extending this analysis to Gaussian-mixture models may be helpful in better approximating non-Gaussian and multimodal likelihoods.

V. MATERIALS

The source code for the data and plots shown in this paper can be found at https://github.com/ngm29/astro-lsbi/tree/main.

FIG. 6. The plot displays the two-dimensional posterior distributions given by the first two rounds of sequential LSBI, where each round corresponds to the output of LSBI after k = 10⁴ simulations. The prior distribution and the result of nested sampling on the dataset (labeled "Ground Truth") are also shown. The dashed lines indicate the values of the 'real' parameters θ∗.

FIG. 7. The plot displays the two-dimensional posterior distributions given by rounds three through five of sequential LSBI, together with the result of nested sampling on the dataset (labeled "Ground Truth"). The dashed lines indicate the values of the 'real' parameters θ∗.

Appendix A: LSBI with Conjugate Priors

Consider a simulator S_M which emulates a model M that can be approximated linearly. The values of the hyper-parameters m, M, and C are unknown, but we may infer them by performing k independent simulations S = {(D_i, θ_i)}, where the {θ_i} may be drawn from an arbitrary distribution. Defining the sample covariances and means for the data and parameters,

Θ = (1/k) Σ_i (θ_i − θ̄)(θ_i − θ̄)^T,    ∆ = (1/k) Σ_i (D_i − D̄)(D_i − D̄)^T,                     (A1)

Ψ = (1/k) Σ_i (D_i − D̄)(θ_i − θ̄)^T,    D̄ = (1/k) Σ_i D_i,    θ̄ = (1/k) Σ_i θ_i,                (A2)

then, after some algebra, the joint likelihood for the simulations can be written as

log p({D_i}|{θ_i}, m, M, C) ≡ Σ_i log p(D_i|θ_i, m, M, C)
    = −(k/2) log|2πC| − (1/2) Σ_i (D_i − m − Mθ_i)^T C⁻¹ (D_i − m − Mθ_i)
    = −(k/2) log|2πC| − (1/2) [m − (D̄ − Mθ̄)]^T (C/k)⁻¹ [m − (D̄ − Mθ̄)]
      − (k/2) tr[ Θ (M − ΨΘ⁻¹)^T C⁻¹ (M − ΨΘ⁻¹) ]
      − (k/2) tr[ (∆ − ΨΘ⁻¹Ψ^T) C⁻¹ ].

Using the previous result, we can infer via Bayes' theorem

p(m|M, C, S) = [ p({D_i}|{θ_i}, m, M, C) / p({D_i}|{θ_i}, M, C) ] · p(m|M, C, {θ^(i)}).

We choose the conjugate prior m|M, C, {θ^(i)} ∼ N(0, C), giving the posterior

p(m|M, C, S) = (2π)^{−d/2} |C/(k+1)|^{−1/2} exp{ −(1/2) [m − (k/(k+1))(D̄ − Mθ̄)]^T [C/(k+1)]⁻¹ [m − (k/(k+1))(D̄ − Mθ̄)] }.    (A3)

Hence, we find a new distribution, with m integrated out (the running evidence),

log p({D_i}|{θ_i}, M, C) = −(k/2) log|2πC| − (d/2) log(k+1)
    − (1/2) (k/(k+1)) (D̄ − Mθ̄)^T C⁻¹ (D̄ − Mθ̄)
    − (k/2) tr[ Θ (M − ΨΘ⁻¹)^T C⁻¹ (M − ΨΘ⁻¹) ]
    − (k/2) tr[ (∆ − ΨΘ⁻¹Ψ^T) C⁻¹ ]
  = −(k/2) log|2πC| − (d/2) log(k+1)
    − (k/2) tr[ Θ∗ (M − Ψ̃Θ∗⁻¹)^T C⁻¹ (M − Ψ̃Θ∗⁻¹) ]
    − (k/2) tr[ (∆̃ − Ψ̃Θ∗⁻¹Ψ̃^T) C⁻¹ ],

where

Θ̃ ≡ Θ + (1/(k+1)) θ̄θ̄^T,    Ψ̃ ≡ Ψ + (1/(k+1)) D̄θ̄^T,    ∆̃ ≡ ∆ + (1/(k+1)) D̄D̄^T,              (A4)

and Θ∗ is defined in Eq. (A6) below.

A second application of Bayes' theorem gives

p(M|C, S) = [ p({D_i}|{θ_i}, M, C) / p({D_i}|{θ_i}, C) ] · p(M|C, {θ^(i)}).

We choose the conjugate prior M|C, {θ^(i)} ∼ MN(0, C, Θ⁻¹), giving the posterior

p(M|C, S) = (2π)^{−nd/2} |C/k|^{−n/2} |Θ∗⁻¹|^{−d/2} exp{ −(1/2) tr[ Θ∗ (M − Ψ̃Θ∗⁻¹)^T (C/k)⁻¹ (M − Ψ̃Θ∗⁻¹) ] },    (A5)

where

Θ∗ ≡ Θ̃ + (1/k) Θ = ((k+1)/k) Θ + (1/(k+1)) θ̄θ̄^T.                                             (A6)

The running evidence after marginalization over M is then

log p({D_i}|{θ_i}, C) = −(k/2) log|2πC| − (d/2) log(k+1) − (dn/2) log k + (d/2) log|Θ∗⁻¹Θ|
    − (k/2) tr{ [ ∆̃ − Ψ̃ (Θ∗⁻¹ + Θ̃⁻¹ − Θ⁻¹) Ψ̃^T ] C⁻¹ }.

A third and final application of Bayes' theorem,

p(C|S) = [ p({D_i}|{θ_i}, C) / p({D_i}|{θ_i}) ] · p(C|{θ^(i)}),

with conjugate prior C|{θ^(i)} ∼ W⁻¹(C₀, ν₀), gives the posterior

p(C|S) = [ |C|^{−(ν+d+1)/2} / N(S) ] exp{ −(1/2) tr[ ( k[ ∆̃ − Ψ̃ (Θ∗⁻¹ + Θ̃⁻¹ − Θ⁻¹) Ψ̃^T ] + C₀ ) C⁻¹ ] },    (A7)

with ν = ν₀ + k and ν₀ > d, and

N(S) = 2^{νd/2} Γ_d[ν/2] × | k[ ∆̃ − Ψ̃ (Θ∗⁻¹ + Θ̃⁻¹ − Θ⁻¹) Ψ̃^T ] + C₀ |^{−ν/2}.

The running evidence is then

log p({D_i}|{θ_i}) = log[ p({D_i}|{θ_i}, C) p(C) / p(C|S) ]
    = −(kd/2) log π − (d/2) log(k+1) − (dn/2) log k + log( Γ_d[ν/2] / Γ_d[ν₀/2] )
      + (d/2) log|Θ̃⁻¹Θ| − (1/2) log( | k[ ∆̃ − Ψ̃ (Θ∗⁻¹ + Θ̃⁻¹ − Θ⁻¹) Ψ̃^T ] + C₀ |^ν / |C₀|^{ν₀} ).

Finally, we can compute the total evidence for the simulations, where we assume that the parameter samples have been drawn from a Gaussian, θ^(i) ∼ N(θ̄, Θ):

log p(S) = log p({D_i}|{θ_i}) + log p({θ_i})
    = log p({D_i}|{θ_i}) − (k/2) log|2πΘ| − (1/2) nk
    = −(kd/2) log π − (d/2) log(k+1) − (dn/2) log k + log( Γ_d[ν/2] / Γ_d[ν₀/2] ) − (1/2) nk
      + (d/2) log|Θ̃⁻¹Θ| − (k/2) log|2πΘ| − (1/2) log( | k[ ∆̃ − Ψ̃ (Θ∗⁻¹ + Θ̃⁻¹ − Θ⁻¹) Ψ̃^T ] + C₀ |^ν / |C₀|^{ν₀} ).

Appendix B: Model Comparison and Data Compression

If the implicit likelihood is inferred for a different model M′, the Bayes' ratio between the two models may be calculated,

B = ⟨p(D_obs|M)⟩_{m,M,C} / ⟨p(D_obs|M′)⟩_{m′,M′,C′}.                            (B1)

However, this calculation requires a d × d matrix to be inverted (see Eq. 6), which scales as O(d^α × N), where 2 < α ⩽ 3 depending on the algorithm used. To increase the computational efficiency, Alsing et al. [29, 30] and Heavens et al. [21] remark that for a homoscedastic Gaussian likelihood, the data may be mapped into a set of n summary statistics via the linear compression

D ↦ M^T C⁻¹ (D − m)                                                              (B2)

without information loss. In our case, the Gaussian likelihood is an approximation, so we may only claim lossless compression to the extent that the approximation is good. The advantage of this method is that C⁻¹ may be drawn directly from the distribution

C⁻¹|S ∼ W( [(k − 1)(∆ − ΨΘ⁻¹Ψ^T)]⁻¹, ν ),                                        (B3)

and thus, save for a one-time inversion of the scale matrix, the complexity of the compression is no more than O(n × d² × N).

We propose a slightly different data compression scheme,

x = ΓM^T C⁻¹ (D − m),    Γ ≡ (M^T C⁻¹ M)⁻¹,                                      (B4)

where Γ can be computed in O(n² × d² × N) time. We substitute into Eq. 6 to find

x|M ∼ N(µ, Σ + Γ).

The matrix Σ + Γ is only n × n, so if we assume that n ≪ d, this represents a potential reduction in computational complexity.
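
As an illustration of the compression in Eq. (B4), a short sketch (function and variable names are our own choices):

```python
import numpy as np

def compress(D, m, M, C):
    """x = Gamma M^T C^{-1} (D - m), with Gamma = (M^T C^{-1} M)^{-1}   [Eq. (B4)]
    Maps a d-dimensional data vector to n summary statistics; under the linear
    model, x | M ~ N(mu, Sigma + Gamma) as stated in the text."""
    Cinv = np.linalg.inv(C)
    Gamma = np.linalg.inv(M.T @ Cinv @ M)
    return Gamma @ M.T @ Cinv @ (D - m)
```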

Appendix C: Priors

The prior for the ΛCDM parameters, θ_CMB = (Ω_b h², Ω_c h², τ, ln 10¹⁰ A_s, n_s, H₀), is a Gaussian with mean and covariance

µ_CMB = (2.22 × 10⁻², 0.120, 6.66 × 10⁻², 3.05, 9.64 × 10⁻¹, 67.3),                             (C1)

Σ_CMB = diag(1.05 × 10⁻³, 8.28 × 10⁻³, 3.47 × 10⁻², 1.47 × 10⁻¹, 2.64 × 10⁻², 3.38).            (C2)

[1] K. Cranmer, J. Brehmer, and G. Louppe, The frontier of simulation-based inference, Proceedings of the National Academy of Sciences 117, 30055 (2020).
[2] D. B. Rubin, Bayesianly justifiable and relevant frequency calculations for the applied statistician, The Annals of Statistics, 1151 (1984).
[3] P. Marjoram, J. Molitor, V. Plagnol, and S. Tavaré, Markov chain monte carlo without likelihoods, Proceedings of the National Academy of Sciences 100, 15324 (2003).
[4] S. A. Sisson, Y. Fan, and M. M. Tanaka, Sequential monte carlo without likelihoods, Proceedings of the National Academy of Sciences 104, 1760 (2007).
[5] G. Papamakarios and I. Murray, Fast ε-free inference of simulation models with bayesian conditional density estimation, Advances in neural information processing systems 29 (2016).
[6] J. Alsing, T. Charnock, S. Feeney, and B. Wandelt, Fast likelihood-free cosmology with neural density estimators and active learning, Monthly Notices of the Royal Astronomical Society 488, 4440 (2019).
[7] A. Cole, B. K. Miller, S. J. Witte, M. X. Cai, M. W. Grootes, F. Nattino, and C. Weniger, Fast and credible likelihood-free cosmology with truncated marginal neural ratio estimation, Journal of Cosmology and Astroparticle Physics 2022 (09), 004.
[8] P. Lemos, M. Cranmer, M. Abidi, C. Hahn, M. Eickenberg, E. Massara, D. Yallup, and S. Ho, Robust simulation-based inference in cosmology with bayesian neural networks, Machine Learning: Science and Technology 4, 01LT01 (2023).
[9] G. Papamakarios, Neural density estimation and likelihood-free inference, arXiv preprint arXiv:1910.13233 (2019).
[10] S. Dupourqué, N. Clerc, E. Pointecouteau, D. Eckert, S. Ettori, and F. Vazza, Investigating the turbulent hot gas in x-cop galaxy clusters, Astronomy & Astrophysics 673, A91 (2023).
[11] M. Gatti, N. Jeffrey, L. Whiteway, J. Williamson, B. Jain, V. Ajani, D. Anbajagane, G. Giannini, C. Zhou, A. Porredon, et al., Dark energy survey year 3 results: Simulation-based cosmological inference with wavelet harmonics, scattering transforms, and moments of weak lensing mass maps. validation on simulations, Physical Review D 109, 063534 (2024).
[12] M. Crisostomi, K. Dey, E. Barausse, and R. Trotta, Neural posterior estimation with guaranteed exact coverage: The ringdown of gw150914, Physical Review D 108, 044029 (2023).
[13] K. Christy, E. J. Baxter, and J. Kumar, Applying simulation-based inference to spectral and spatial information from the galactic center gamma-ray excess, arXiv preprint arXiv:2402.04549 (2024).
[14] J. Harnois-Deraps, S. Heydenreich, B. Giblin, N. Martinet, T. Troester, M. Asgari, P. Burger, T. Castro, K. Dolag, C. Heymans, et al., Kids-1000 and des-y1 combined: Cosmology from peak count statistics, arXiv preprint arXiv:2405.10312 (2024).
[15] B. Moser, T. Kacprzak, S. Fischbacher, A. Refregier, D. Grimm, and L. Tortorelli, Simulation-based inference of deep fields: galaxy population model and redshift distributions, Journal of Cosmology and Astroparticle Physics 2024 (05), 049.
[16] C. P. Novaes, L. Thiele, J. Armijo, S. Cheng, J. A. Cowell, G. A. Marques, E. G. Ferreira, M. Shirasaki, K. Osato, and J. Liu, Cosmology from hsc y1 weak lensing with combined higher-order statistics and simulation-based inference, arXiv preprint arXiv:2409.01301 (2024).
[17] S. Fischbacher, B. Moser, T. Kacprzak, J. Herbel, L. Tortorelli, U. Schmitt, A. Refregier, and A. Amara, galsbi: A python package for the galsbi galaxy population model, arXiv preprint arXiv:2412.08722 (2024).
[18] D. Castelvecchi, Can we open the black box of ai?, Nature News 538, 20 (2016).
[19] J. Hermans, A. Delaunoy, F. Rozet, A. Wehenkel, V. Begy, and G. Louppe, A trust crisis in simulation-based inference? your posterior approximations can be unfaithful, arXiv preprint arXiv:2110.06581 (2021).
[20] F. Leclercq, W. Enzi, J. Jasche, and A. Heavens, Primordial power spectrum and cosmology from black-box galaxy surveys, Monthly Notices of the Royal Astronomical Society 490, 4237 (2019).
[21] A. F. Heavens, R. Jimenez, and O. Lahav, Massive lossless data compression and multiple parameter estimation from galaxy spectra, Monthly Notices of the Royal Astronomical Society 317, 965 (2000).
[22] A. K. Gupta and D. K. Nagar, Matrix variate distributions (Chapman and Hall/CRC, 2018).
[23] D. Piras and A. S. Mancini, Cosmopower-jax: high-dimensional bayesian inference with differentiable cosmological emulators, arXiv preprint arXiv:2305.06347 (2023).
[24] A. Spurio Mancini, D. Piras, J. Alsing, B. Joachimi, and M. P. Hobson, Cosmopower: emulating cosmological power spectra for accelerated bayesian inference from next-generation surveys, Monthly Notices of the Royal Astronomical Society 511, 1771 (2022).
[25] J. S. Speagle, dynesty: a dynamic nested sampling package for estimating bayesian posteriors and evidences, Monthly Notices of the Royal Astronomical Society 493, 3132 (2020).
[26] S. Koposov, J. Speagle, K. Barbary, G. Ashton, E. Bennett, J. Buchner, C. Scheffler, B. Cook, C. Talbot, J. Guillochon, et al., joshspeagle/dynesty: v2.0.0, Zenodo (2022).
[27] E. Higson, W. Handley, M. Hobson, and A. Lasenby, Dynamic nested sampling: an improved algorithm for parameter estimation and evidence calculation, Statistics and Computing 29, 891 (2019).
[28] A. Lewis, Getdist: a python package for analysing monte carlo samples, arXiv preprint arXiv:1910.13970 (2019).
[29] J. Alsing and B. Wandelt, Generalized massive optimal data compression, Monthly Notices of the Royal Astronomical Society: Letters 476, L60 (2018).
[30] J. Alsing, B. Wandelt, and S. Feeney, Massive optimal data compression and density estimation for scalable, likelihood-free inference in cosmology, Monthly Notices of the Royal Astronomical Society 477, 2874 (2018).
