Sampling the space of solutions of an artificial neural network

Alessandro Zambon, Department of Physics, University of Milan and INFN, via Celoria 16, 20133 Milano, Italy; Enrico M. Malatesta, Department of Computing Sciences and Bocconi Institute for Data Science and Analytics (BIDSA), Bocconi University, 20136 Milano, Italy; Guido Tiana, Department of Physics, University of Milan and INFN, via Celoria 16, 20133 Milano, Italy; Riccardo Zecchina, Department of Computing Sciences and Bocconi Institute for Data Science and Analytics (BIDSA), Bocconi University, 20136 Milano, Italy
(March 11, 2025)
Abstract

The weight space of an artificial neural network can be systematically explored using tools from statistical mechanics. We employ a combination of a hybrid Monte Carlo algorithm that performs long exploration steps, a ratchet-based algorithm to investigate connectivity paths, and simulations of coupled replica models to study subdominant flat regions. Our analysis focuses on one-hidden-layer networks and spans a range of energy levels and constrained density regimes.

Near the interpolation threshold, the low-energy manifold shows a spiky topology. While these spikes aid gradient descent, they can trap general sampling algorithms at low temperatures; they remain connected by complex paths in a confined, non-flat region.

In the overparameterized regime, however, the low-energy manifold becomes entirely flat, forming an extended complex structure that is easy to sample. These numerical results are supported by an analytical study of the training error landscape, and we show numerically that the qualitative features of the loss landscape are robust across different data structures. Our study aims to provide new methodological insights for developing scalable methods for large networks.

I Introduction

Understanding the geometry of the loss landscape in deep neural network models is a critical challenge epf , as it directly influences the design and optimization of learning algorithms. The non-convexity of the loss landscape introduces significant complexity that often precludes the application of analytical methods.

Empirical evidence suggests that first-order optimization methods such as gradient descent (GD) and variants like stochastic gradient descent (SGD) can effectively navigate certain regions of these landscapes LeCun et al. (2015, 1998); Bottou (2010); Kingma and Ba (2014); Wilson et al. (2017). However, the insights derived from such methods are closely tied to the specifics of the algorithms, limiting their usefulness in providing a comprehensive understanding of the underlying geometric structures. For example, large language models are typically not trained to optimality on their data Kaplan et al. (2020); Hoffmann et al. (2022), linking their generalization properties to high-loss configurations, a relationship that remains poorly understood but holds promise for improving learning efficiency.

Analytical progress has been made in shallow, non-convex networks under simplified assumptions Gardner and Derrida (1988); Baldassi et al. (2019), using statistical physics techniques such as the replica and cavity methods. These studies reveal a highly intricate structure of minima Baldassi et al. (2020a, 2015, 2021); Annesi et al. (2023), characterized by features such as the overlap gap that renders some minima inaccessible Huang and Kabashima (2014), as well as rare, broad, and accessible minima. These minima have also been found to exhibit a diverse and broad range of generalization abilities Baldassi et al. (2022, 2020b).

On the empirical side, simple low-dimensional visualizations of the loss landscape have been used to gain insight into the general properties of these minima and their arrangement within the loss landscape Li et al. (2018); Huang et al. (2020). In particular, linear and piecewise linear paths Draxler et al. (2018); Garipov et al. (2018); Fort and Jastrzebski (2019) have been used to show that weight configurations found by SGD are generally not linearly connected but are nevertheless connected through a piecewise path Pittorino et al. (2022); Entezari et al. (2022).

The problem of studying the geometry of the solutions of a deep neural network can be easily formulated in terms of statistical mechanics. While an exhaustive search for the regions associated with a low value of the loss $\mathcal{L}$ in the high-dimensional parameter space is infeasible, one can explore the parameter space requiring the average loss $\langle\mathcal{L}\rangle$ to be small, while allowing its instantaneous value to fluctuate around it.

We investigate the loss landscape using a Hybrid Monte Carlo (HMC) approach Duane et al. (1987), which enables large exploratory steps guided by gradients, in contrast to the purely random updates of standard Monte Carlo methods.

We also introduce a Ratchet Hybrid Monte Carlo (RHMC) algorithm, which steers sampling along complex paths while maintaining low loss values. This approach is inspired by a similar technique shown to be efficient in protein folding simulations Camilloni et al. (2011) and in identifying their transition states Tiana and Camilloni (2012).

To demonstrate the effectiveness of HMC and RHMC, we compare their results with analytical predictions for shallow non-convex networks and find that the numerical results agree with these predictions. Additionally, we uncover previously unreported theoretical insights into the geometry of minima in single-hidden-layer neural networks with generic activation functions Baldassi et al. (2019), including configurations that pose challenges to conventional optimization approaches.

Although these results underscore the efficacy of HMC and RHMC as powerful tools for probing neural networks with intricate architectures, extending these findings to large-scale networks remains a challenge. Our work provides new insights in this regard.

The architecture we have chosen to study the solution space of artificial neural networks is the tree committee machine Barkai et al. (1992); Engel et al. (1992); Baldassi et al. (2019). It can be considered the simplest non-convex and non-linear toy model of a neural network. It has two advantages: it is analytically tractable with replica techniques, and its optimized parameters are not related by trivial permutation symmetries. Classical works from the 1990s studied this model with the replica method Barkai et al. (1990); Engel et al. (1992) for the sign activation function, in the thermodynamic limit $N\to\infty$ and $P\to\infty$ with $\alpha\equiv P/N$ fixed and $K=O(1)$. The typical and atypical states Baldassi et al. (2019) of this model were studied in the limit of large width $K$ (but with $K/N\to 0$) for a generic activation function. The same work provided a determination of the SAT/UNSAT transition, i.e. the maximum number of samples that the model can in principle store, in the Replica Symmetric (RS) and 1-step Replica Symmetry Breaking (1RSB) schemes. Recently, the exact SAT/UNSAT transition has been computed in Annesi et al. (2024) by numerically solving the full Replica Symmetry Breaking (fRSB) equations, and it has been compared with the maximal capacity reached by gradient descent; this unveiled the presence of a hard phase for gradient descent.

Our results are presented as follows. We first introduce the model and the algorithms used to sample the loss space (Sect. II), then describe the numerical results obtained close to the interpolation threshold (Sect. III.1). For the overparametrized regime (Sect. III.2), we first present the numerical results of the sampling and then compare them with the analytical calculations. Finally, we discuss how these results change when realistic, correlated data are used as input (Sect. III.3), and we draw the overall conclusions (Sect. IV).

II The model and the Methods for sampling and connecting solutions

II.1 The model and main definitions

The tree committee machine we consider consists of a two-layer neural network with a generic non-linear activation function, in which each of the $K$ neurons of the hidden layer is connected only to a subset of $N/K$ elements of the $N$-dimensional input vector (Fig. 1).

For any input $\boldsymbol{x}$, the output of the tree committee machine can be written as

$$\hat{y}(\boldsymbol{w}) = \mathrm{sign}\left[\Delta(\boldsymbol{w})\right] = \mathrm{sign}\left[\frac{1}{\sqrt{K}}\sum_{l=1}^{K} c_{l}\,\varphi\left(\sqrt{\frac{K}{N}}\sum_{i=1}^{N/K} w_{li}\,x_{li}\right)\right], \tag{1}$$

where $K$ is the width of the hidden layer and $\varphi$ is a non-linear activation function. The first layer is parameterized by a set of weights $\boldsymbol{w}\in\mathbb{R}^{N}$, which will be trained to learn a dataset. The weights of the second layer, $c_{l}$, are fixed (not learned) to $\pm 1$ with equal probability. Thus, the number of learned weights is exactly equal to $N$, the input dimension of the network.

Figure 1: Example of a tree committee machine with $K=6$, activation function $\varphi$ for the neurons in the hidden layer and $\sigma$ for the neuron in the output layer. The weights of the output neuron have been set to $\pm 1$ for $i=1,\dots,\frac{K}{2}$ and $i=\frac{K}{2}+1,\dots,K$, respectively.

We consider a synthetic dataset $\mathcal{D}$ made of $P$ input vectors $\boldsymbol{x}^{\mu}$, whose elements are drawn from a standard normal distribution, and the associated labels $y^{\mu}=\pm 1$, also generated randomly with equal probability. The ratio $\alpha=P/N$ is called the constrained density.
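As an illustration of Eq. (1) and of the synthetic dataset just described, the following minimal NumPy sketch builds random data and evaluates the output preactivation $\Delta$ for every pattern. The function names, the ReLU activation and the sizes $N=120$, $K=6$, $\alpha=0.5$ are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

def make_dataset(P, N, rng):
    """P standard-normal inputs of dimension N with random +/-1 labels."""
    x = rng.standard_normal((P, N))
    y = rng.choice([-1.0, 1.0], size=P)
    return x, y

def preactivation(w, x, c, phi):
    """Output preactivation Delta of Eq. (1) for a batch of inputs.

    w: (N,) first-layer weights, x: (P, N) inputs,
    c: (K,) fixed +/-1 second-layer weights, phi: activation function.
    """
    K, N = c.shape[0], w.shape[0]
    assert N % K == 0, "N must be a multiple of K in a tree architecture"
    # Hidden unit l only sees its own block of N/K inputs and weights.
    w_blocks = w.reshape(K, N // K)
    x_blocks = x.reshape(-1, K, N // K)
    hidden_in = np.sqrt(K / N) * np.einsum("pli,li->pl", x_blocks, w_blocks)
    return phi(hidden_in) @ c / np.sqrt(K)

def relu(z):
    """Assumed activation; the text only requires a generic non-linearity."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
N, K, alpha = 120, 6, 0.5          # illustrative sizes
P = int(alpha * N)
x, y = make_dataset(P, N, rng)
c = rng.choice([-1.0, 1.0], size=K)
w = rng.standard_normal(N)

delta = preactivation(w, x, c, relu)   # shape (P,)
stability = y * delta                   # positive entries are correctly classified
```

The array `stability` holds the quantities $y^{\mu}\Delta^{\mu}(\boldsymbol{w})$ on which all the losses of this section depend.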

The task is to find an $N$-dimensional vector $\boldsymbol{w}$ that correctly classifies each input $\boldsymbol{x}^{\mu}$ in the dataset to the corresponding label $y^{\mu}$, for every $\mu\in[P]$, i.e.

$$y^{\mu}\Delta^{\mu}(\boldsymbol{w})>0\,,\qquad\forall\mu\in[P]\,. \tag{2}$$

A weight vector $\boldsymbol{w}$ satisfying Eq. (2) will be called in the following a solution of the classification problem. Equivalently, this means that the training error of $\boldsymbol{w}$, defined as the number of misclassified input-output associations

$$\begin{aligned} \epsilon_{t} &= \sum_{\mu=1}^{P}\ell_{\mathrm{NE}}\left(y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right),\\ \ell_{\mathrm{NE}}\left(y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right) &\equiv \Theta\left(-y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right), \end{aligned} \tag{3}$$

is equal to zero, where we have denoted by $\Theta(\cdot)$ the Heaviside function, and by $\ell_{\mathrm{NE}}$ the “error counting” loss. Since this loss is not differentiable, it is not used for optimization in machine learning, which typically relies on gradient-based algorithms. Instead, the error counting loss usually serves to evaluate whether a solution has been found, rather than guiding the optimization process itself.

A loss function that is commonly used to actually find solutions, and on which we focus in this paper, is the so-called cross-entropy loss, which in binary classification reads

$$\ell_{\mathrm{CE}}\left(y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right)=\ln\left(1+e^{-y^{\mu}\Delta^{\mu}(\boldsymbol{w})}\right). \tag{4}$$
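The two per-pattern losses of Eqs. (3) and (4) can be sketched as functions of the stability $y^{\mu}\Delta^{\mu}(\boldsymbol{w})$. This is a minimal illustration: the function names are hypothetical, and counting a stability of exactly zero as an error is a convention assumed here to match the strict inequality of Eq. (2).

```python
import numpy as np

def error_counting_loss(stability):
    """Theta(-y*Delta) of Eq. (3): 1 for a misclassified pattern, else 0.

    A stability of exactly 0 is counted as an error (assumed convention).
    """
    return (np.asarray(stability) <= 0).astype(float)

def cross_entropy_loss(stability):
    """ln(1 + exp(-y*Delta)) of Eq. (4).

    np.logaddexp(0, -s) evaluates ln(e^0 + e^{-s}) without overflow
    for large negative stabilities.
    """
    return np.logaddexp(0.0, -np.asarray(stability))

s = np.array([-2.0, 0.5, 3.0])               # toy stabilities y*Delta
training_error = error_counting_loss(s).sum()  # number of misclassified patterns
total_ce = cross_entropy_loss(s).sum()         # differentiable surrogate
```

Unlike the error counting loss, the cross-entropy stays strictly positive even for correctly classified patterns, which is what provides a gradient to follow.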

In the following we will say that a solution is “typical” if it is extracted from the flat measure over the set of all solutions. Sometimes one requires not only that $\boldsymbol{w}$ is a solution of the classification problem, but also that it satisfies a certain degree of robustness. This can be enforced by ensuring that $\Delta^{\mu}(\boldsymbol{w})$ aligns with the corresponding label $y^{\mu}$ for any $\mu\in[P]$, within a specified confidence level $\kappa$:

$$y^{\mu}\Delta^{\mu}(\boldsymbol{w})>\kappa\,,\qquad\forall\mu\in[P]\,. \tag{5}$$

The parameter $\kappa$ is also called the “margin”. Imposing a margin $\kappa>0$ ensures that the sampled solution not only has zero training error, but is also robust to perturbations of the inputs. Both typical (i.e. $\kappa=0$) and atypically robust solutions with a positive margin $\kappa$ can be obtained using a loss function that generalizes the one defined in Eq. (3),

$$\ell\left(y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right)=\Theta\left(\kappa-y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right). \tag{6}$$

Indeed, in the limit $\beta\to\infty$ the Boltzmann measure of Eq. (7), equipped with the loss function of Eq. (6), focuses on solutions satisfying Eq. (5). We stress that solutions extracted by optimizing a loss different from the error counting loss are therefore to be considered “atypical”.
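The margin condition of Eq. (5) can be checked directly on the stabilities. The short sketch below (hypothetical function name, toy numbers) counts the patterns that violate the condition for a given $\kappa$; a weight vector is a solution with margin $\kappa$ when this count is zero.

```python
import numpy as np

def margin_violations(stability, kappa=0.0):
    """Number of patterns with y*Delta <= kappa, i.e. violating Eq. (5)."""
    return int(np.sum(np.asarray(stability) <= kappa))

s = np.array([0.1, 0.8, 1.5, -0.3])   # toy stabilities y*Delta
errors_no_margin = margin_violations(s, kappa=0.0)
errors_with_margin = margin_violations(s, kappa=0.5)
```

Raising $\kappa$ can only increase the count, so the set of solutions with margin $\kappa>0$ is a subset of the set of all solutions.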

II.2 The canonical–ensemble framework

We shall look for the solutions of the network using the formalism of the canonical ensemble, thus sampling the space of parameters that have on average a given value of the loss. The solutions of the network are then found by sampling the parameters at low temperature, where the average loss is minimal.

The posterior distribution corresponding to a given loss function (or log-likelihood) $\mathcal{L}(\boldsymbol{w};\mathcal{D})$ is given by the Boltzmann-like distribution

$$p(\boldsymbol{w},\beta;\mathcal{D})=\frac{e^{-\beta\mathcal{L}(\boldsymbol{w};\mathcal{D})}\,p(\boldsymbol{w})}{Z(\beta;\mathcal{D})} \tag{7}$$

where $\beta=1/T$ is the inverse temperature and $p(\boldsymbol{w})$ is the prior distribution of the weights. For simplicity, a Boltzmann constant $k_{B}=1$ is assumed in every equation. The factor $Z(\beta;\mathcal{D})$ is a normalization factor, called evidence in Bayesian statistics and partition function in statistical physics, and it reads

$$Z(\beta;\mathcal{D})=\int d\boldsymbol{w}\,p(\boldsymbol{w})\,e^{-\beta\mathcal{L}(\boldsymbol{w};\mathcal{D})}. \tag{8}$$

The loss function usually considered in the machine learning literature is factorized over the $P$ elements of the dataset, i.e.

$$\mathcal{L}(\boldsymbol{w};\mathcal{D})\equiv\sum_{\mu=1}^{P}\ell\left(y^{\mu}\Delta^{\mu}(\boldsymbol{w})\right) \tag{9}$$

where $\ell$ is a loss function per pattern and $\Delta(\boldsymbol{w})$ identifies the preactivation of the output node. We consider as a prior a zero-mean Gaussian distribution, or equivalently an $L^{2}$ regularization term with parameter $\lambda$,

$$p(\boldsymbol{w})=e^{-\frac{\beta\lambda}{2}|\boldsymbol{w}|^{2}}. \tag{10}$$

We can incorporate the regularization term and the loss term in a function $U(\boldsymbol{w})$

$$U(\boldsymbol{w})=\mathcal{L}(\boldsymbol{w};\mathcal{D})+\frac{\lambda}{2}|\boldsymbol{w}|^{2} \tag{11}$$

that can be thought of as the (potential) energy associated with the neural network parameters $\boldsymbol{w}$. Equation (7) can then be rewritten as

$$p(\boldsymbol{w},T)=\frac{e^{-\frac{U(\boldsymbol{w})}{T}}}{Z_{U}(T)} \tag{12}$$

where we have dropped for simplicity the dependence on the dataset $\mathcal{D}$ both in the posterior distribution and in the partition function $Z_{U}(T)=\int d\boldsymbol{w}\,e^{-\frac{U(\boldsymbol{w})}{T}}$.
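For completeness, the step from Eq. (7) to Eq. (12) can be written out explicitly. The short derivation below uses only the prior of Eq. (10), the definition of $U$ in Eq. (11) and $\beta=1/T$:

```latex
\begin{aligned}
e^{-\beta\mathcal{L}(\boldsymbol{w};\mathcal{D})}\,p(\boldsymbol{w})
 &= e^{-\beta\mathcal{L}(\boldsymbol{w};\mathcal{D})}\,
    e^{-\frac{\beta\lambda}{2}|\boldsymbol{w}|^{2}}
  = e^{-\beta\left[\mathcal{L}(\boldsymbol{w};\mathcal{D})
      +\frac{\lambda}{2}|\boldsymbol{w}|^{2}\right]}
  = e^{-U(\boldsymbol{w})/T},\\
Z(\beta;\mathcal{D})
 &= \int d\boldsymbol{w}\,p(\boldsymbol{w})\,
    e^{-\beta\mathcal{L}(\boldsymbol{w};\mathcal{D})}
  = \int d\boldsymbol{w}\,e^{-U(\boldsymbol{w})/T}
  = Z_{U}(T).
\end{aligned}
```

The ratio of the two lines is the distribution of Eq. (12), so the regularization strength $\lambda$ enters the sampling only through the energy $U$.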

II.3 Hybrid Monte Carlo algorithm

One of the main goals of statistical inference is to be able to sample from the posterior distribution of Eq. (12). Monte Carlo-based methods achieve this by simulating a Markovian stochastic process over $\boldsymbol{w}$ that converges to the distribution of Eq. (12). In the Metropolis implementation, the transition rate of the stochastic process is

$$r_{M}(\boldsymbol{w}_{i}\rightarrow\boldsymbol{w}_{i+1})=r_{0}\,p_{ap}(\boldsymbol{w}_{i+1}|\boldsymbol{w}_{i})\,\min\left[1,e^{-\frac{U(\boldsymbol{w}_{i+1})-U(\boldsymbol{w}_{i})}{T}}\right] \tag{13}$$

where $r_{0}$ is a rate constant, $p_{ap}(\boldsymbol{w}_{i+1}|\boldsymbol{w}_{i})$ is the a priori conditional probability of proposing a move and the last term is the acceptance probability. If $p_{ap}(\boldsymbol{w}_{i+1}|\boldsymbol{w}_{i})$ is symmetric upon exchange of the two states, the condition of detailed balance holds and the process converges to Eq. (12). Often a uniform a priori probability is used; however, the higher the dimension of the system, the lower the probability that a uniformly chosen move goes in an optimal direction.
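The Metropolis rule of Eq. (13) with a symmetric, uniform a priori probability can be sketched as follows. This is a minimal illustration on a toy quadratic energy; the function name, the proposal width and the energy are assumptions made for the example, not the setup used in the paper.

```python
import numpy as np

def metropolis(U, w0, T, n_steps, step_size, rng):
    """Random-walk Metropolis sampling of exp(-U(w)/T).

    Proposals are uniform displacements in [-step_size, step_size]^N,
    hence symmetric, so the move is accepted with probability
    min(1, exp(-(U_new - U_old)/T)) as in Eq. (13).
    """
    w = np.array(w0, dtype=float)
    u = U(w)
    samples = np.empty((n_steps, w.size))
    n_accepted = 0
    for i in range(n_steps):
        w_new = w + rng.uniform(-step_size, step_size, size=w.shape)
        u_new = U(w_new)
        if u_new <= u or rng.random() < np.exp(-(u_new - u) / T):
            w, u = w_new, u_new
            n_accepted += 1
        samples[i] = w          # a rejected move repeats the old state
    return samples, n_accepted / n_steps

def toy_energy(w):
    """Toy energy U(w) = |w|^2 / 2 (assumption), Gaussian at any T."""
    return 0.5 * np.sum(w**2)

rng = np.random.default_rng(1)
samples, acceptance = metropolis(toy_energy, np.zeros(2), T=1.0,
                                 n_steps=20000, step_size=1.5, rng=rng)
```

For this toy energy at $T=1$ the sampled coordinates should have variance close to one. Repeating the experiment in higher dimension at fixed `step_size` shows the acceptance rate dropping, which is the difficulty described above.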

A way of mitigating this problem is to choose the a priori probability exploiting the knowledge of the energy gradient Duane et al. (1987), using a Hamiltonian formalism. Defining $\boldsymbol{p}\in\mathbb{R}^{N}$ as the momenta associated with the weights $\boldsymbol{w}$, the Hamiltonian of the system is

$$H(\boldsymbol{w},\boldsymbol{p})=\frac{1}{2}\boldsymbol{p}^{T}\mathcal{M}^{-1}\boldsymbol{p}+U(\boldsymbol{w}) \tag{14}$$

where $\mathcal{M}^{-1}$ is the inverse of a (fictitious) mass matrix.

The solutions to the equations of motion resulting from this Hamiltonian can be used to propose moves in the HMC algorithm. The time-reversal invariance of Hamiltonian systems guarantees that detailed balance holds.

In general, the equations of motion are solved with a numerical integrator, and detailed balance is satisfied only in the limit of time step $\delta t\rightarrow 0$. However, some integrators like the velocity Verlet algorithm

$$\begin{aligned} \boldsymbol{w}^{(k+1)} &= \boldsymbol{w}^{(k)}+\mathcal{M}^{-1}\boldsymbol{p}^{(k)}\,\delta t-\mathcal{M}^{-1}\nabla U(\boldsymbol{w}^{(k)})\,\frac{\delta t^{2}}{2}\\ \boldsymbol{p}^{(k+1)} &= \boldsymbol{p}^{(k)}-\frac{\delta t}{2}\left[\nabla U(\boldsymbol{w}^{(k+1)})+\nabla U(\boldsymbol{w}^{(k)})\right] \end{aligned} \tag{15}$$

are intrinsically symmetric under time reversal, and thus detailed balance holds for any choice of $\delta t$, even if the energy is not conserved. In fact, evaluating the latter of Eqs. (15) for $\delta t\to-\delta t$ one obtains

$$\boldsymbol{p}^{(k+1)}+\frac{\delta t}{2}\nabla U(\boldsymbol{w}^{(k+1)})=\boldsymbol{p}^{(k)}-\frac{\delta t}{2}\nabla U(\boldsymbol{w}^{(k)}), \tag{16}$$

and substituting it in the former,

$$\begin{aligned} \boldsymbol{w}^{(k)} &= \boldsymbol{w}^{(k+1)}-\mathcal{M}^{-1}\boldsymbol{p}^{(k+1)}\,\delta t-\mathcal{M}^{-1}\nabla U(\boldsymbol{w}^{(k+1)})\,\frac{\delta t^{2}}{2}\\ \boldsymbol{p}^{(k)} &= \boldsymbol{p}^{(k+1)}+\frac{\delta t}{2}\left[\nabla U(\boldsymbol{w}^{(k)})+\nabla U(\boldsymbol{w}^{(k+1)})\right], \end{aligned}$$

which is a backward trajectory with respect to that of Eqs. (15).
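The time-reversal symmetry of the velocity Verlet map can be checked numerically. The sketch below implements Eqs. (15) with a unit mass matrix and a toy anharmonic energy (both assumptions made for the example), integrates forward for 50 steps, and then applies the same map with $\delta t\to-\delta t$.

```python
import numpy as np

def verlet_step(w, p, grad_U, dt):
    """One velocity Verlet step, Eqs. (15), with unit mass matrix (assumption)."""
    g_old = grad_U(w)
    w_new = w + p * dt - g_old * dt**2 / 2
    p_new = p - dt / 2 * (grad_U(w_new) + g_old)
    return w_new, p_new

def grad_toy(w):
    """Gradient of the toy energy U(w) = sum(w^4)/4 + sum(w^2)/2 (assumption)."""
    return w**3 + w

rng = np.random.default_rng(0)
w0, p0 = rng.standard_normal(5), rng.standard_normal(5)

# Forward trajectory with a finite (not small) time step.
w, p = w0, p0
for _ in range(50):
    w, p = verlet_step(w, p, grad_toy, dt=0.05)

# Same map with the sign of the time step reversed.
w_back, p_back = w, p
for _ in range(50):
    w_back, p_back = verlet_step(w_back, p_back, grad_toy, dt=-0.05)

max_error = max(np.abs(w_back - w0).max(), np.abs(p_back - p0).max())
```

The trajectory returns to the starting point up to floating-point round-off, even though the finite step does not conserve the energy exactly.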

Operatively, at each iteration of the Metropolis algorithm, the momenta are extracted from a Maxwell-Boltzmann distribution at temperature $T$ and a trajectory in the phase space is generated solving Eqs. (15). The final point of the trajectory is then accepted with probability $\min(1,\exp[-\Delta H/T])$, where $\Delta H=H(\boldsymbol{w}_{i+1},\boldsymbol{p}_{i+1})-H(\boldsymbol{w}_{i},\boldsymbol{p}_{i})$. In the condition of detailed balance, the kinetic energy arising from the Metropolis acceptance probability and that coming from the Maxwell-Boltzmann term in the a priori probability $p_{ap}$ cancel out and the algorithm converges to Eq. (12).

The power of this scheme is that it can produce large moves in directions along which the energy of the system changes smoothly, so that the Metropolis acceptance rate can remain high.
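The full HMC iteration described above (momentum refresh, velocity Verlet trajectory, Metropolis test on $\Delta H$) can be sketched as follows. The function name, the unit mass matrix, the toy quadratic energy and the values of the step size and trajectory length are assumptions chosen for illustration, not the settings of the paper.

```python
import numpy as np

def hmc(U, grad_U, w0, T, n_iter, n_steps, dt, rng):
    """Hybrid Monte Carlo sampling of exp(-U(w)/T) with unit mass (assumption)."""
    w = np.array(w0, dtype=float)
    samples = np.empty((n_iter, w.size))
    n_accepted = 0
    for i in range(n_iter):
        # Momenta from a Maxwell-Boltzmann distribution at temperature T.
        p = np.sqrt(T) * rng.standard_normal(w.shape)
        h_old = 0.5 * p @ p + U(w)
        # Trajectory from the velocity Verlet map of Eqs. (15).
        w_new, p_new = w, p
        g = grad_U(w_new)
        for _ in range(n_steps):
            w_new = w_new + p_new * dt - g * dt**2 / 2
            g_next = grad_U(w_new)
            p_new = p_new - dt / 2 * (g_next + g)
            g = g_next
        # Metropolis test on the change of the total energy H.
        dH = 0.5 * p_new @ p_new + U(w_new) - h_old
        if dH <= 0 or rng.random() < np.exp(-dH / T):
            w = w_new
            n_accepted += 1
        samples[i] = w
    return samples, n_accepted / n_iter

def toy_energy(w):
    """Toy energy U(w) = |w|^2 / 2 (assumption)."""
    return 0.5 * np.sum(w**2)

def toy_gradient(w):
    return w

rng = np.random.default_rng(2)
samples, acceptance = hmc(toy_energy, toy_gradient, np.zeros(3), T=0.5,
                          n_iter=5000, n_steps=10, dt=0.2, rng=rng)
```

For this toy energy the sampled coordinates should have variance close to $T$. Each accepted move displaces the weights by a whole trajectory rather than by a single small random step.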

II.4 Double ratchet

In deep learning, characterizing the structure of neural network loss landscapes is crucial for understanding both optimization dynamics and generalization properties. A key line of research explores pathways between distinct weight configurations $\boldsymbol{w}$ and $\boldsymbol{w}'$, which typically correspond to solutions with low loss. The simplest approach examines how the loss behaves along the linear (or straight) path connecting these points. As shown in Draxler et al. (2018), overparameterized neural networks trained with stochastic gradient descent (SGD) often exhibit a loss barrier along the linear path. The barrier can be notably decreased by removing the symmetries that the neural network possesses, such as the permutation symmetry of the hidden units Pittorino et al. (2022); Entezari et al. (2022).

While linear paths provide an intuitive and computationally efficient visualization of the loss landscape, they do not explicitly search for paths between solutions. To address this limitation, the authors of Garipov et al. (2018) introduced the use of piecewise linear trajectories, or “polygonal chains”, where the path is optimized by introducing $k$ intermediate pivot points. Although this method allows for more flexible connections, it is computationally demanding, requiring multiple training runs to optimize the pivot locations. Furthermore, the number of required pivots can grow significantly, particularly in less overparameterized settings, making the approach increasingly impractical in such regimes.

As an alternative way to generate trajectories from $\boldsymbol{w}$ to $\boldsymbol{w}'$, we make use of a modified HMC algorithm that damps fluctuations in the direction opposite to $\boldsymbol{w}'$. This is inspired by the ratchet–and–pawl mechanical system and has given good results in molecular dynamics simulations Camilloni et al. (2011). It can be implemented by calculating the dynamics of two points $\boldsymbol{w}_1(t)$ and $\boldsymbol{w}_2(t)$ starting in $\boldsymbol{w}$ and $\boldsymbol{w}'$, respectively, and evolving under an HMC with energy

$$U_{rt}\bigl[\boldsymbol{w}_1(t),\boldsymbol{w}_2(t)\bigr] = U\bigl[\boldsymbol{w}_1(t)\bigr] + U\bigl[\boldsymbol{w}_2(t)\bigr] + \widetilde{U}\bigl[\boldsymbol{w}_1(t),\boldsymbol{w}_2(t)\bigr] \tag{17}$$

where $U$ is the potential energy of Eq. (11) and

$$\widetilde{U}\bigl[\boldsymbol{w}_1(t),\boldsymbol{w}_2(t)\bigr] = \begin{cases} \frac{k}{2}\bigl[|\boldsymbol{w}_1(t)-\boldsymbol{w}_2(t)| - d_{\min}(t)\bigr]^{2} & \text{if } |\boldsymbol{w}_1(t)-\boldsymbol{w}_2(t)| > d_{\min}(t) \\ 0 & \text{if } |\boldsymbol{w}_1(t)-\boldsymbol{w}_2(t)| \leq d_{\min}(t) \end{cases} \tag{18}$$

where $d_{\min}(t) \equiv \min_{t'<t} |\boldsymbol{w}_1(t') - \boldsymbol{w}_2(t')|$ is the minimum distance observed along the trajectories of the two points up to time $t$. The time–dependent term of Eq. (18) favors the moves that bring the two weights closer to each other, without exerting work to push them. In this way, the two points can autonomously find the minimum–energy path that connects them.
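The ratchet term of Eq. (18) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names (`ratchet_energy`, `ratchet_gradient`, `update_d_min`) are hypothetical, weights are plain lists of floats, and the gradient is the one obtained by differentiating Eq. (18) with respect to the first point while treating $d_{\min}$ as a constant during the move.

```python
import math


def distance(w1, w2):
    """Euclidean distance |w1 - w2|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))


def ratchet_energy(w1, w2, d_min, k):
    """Penalty of Eq. (18): zero unless the distance exceeds d_min."""
    d = distance(w1, w2)
    return 0.5 * k * (d - d_min) ** 2 if d > d_min else 0.0


def ratchet_gradient(w1, w2, d_min, k):
    """Gradient of the penalty with respect to w1 (for w2 it is the negative)."""
    d = distance(w1, w2)
    if d <= d_min:
        return [0.0] * len(w1)  # no force when the points are not drifting apart
    return [k * (d - d_min) * (a - b) / d for a, b in zip(w1, w2)]


def update_d_min(w1, w2, d_min):
    """Keep track of the smallest distance observed so far."""
    return min(d_min, distance(w1, w2))
```

Because the penalty vanishes whenever the current distance is at or below `d_min`, moves that bring the two points closer are never pushed; only moves that increase the distance beyond the running minimum are penalized.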

During each simulation, the two vectors are updated sequentially at each time step according to Eq. (17), until their cosine–similarity (or normalized overlap)

$$q(\boldsymbol{w}_1,\boldsymbol{w}_2) = \frac{\boldsymbol{w}_1\cdot\boldsymbol{w}_2}{|\boldsymbol{w}_1|\,|\boldsymbol{w}_2|} \tag{19}$$

is equal to 1. We will refer to the point $\boldsymbol{s}^{\star}$ where the two trajectories $\boldsymbol{w}_1(t)$ and $\boldsymbol{w}_2(t)$ meet as the “anchor” weight of $\boldsymbol{w}_1$, $\boldsymbol{w}_2$.

II.5 Coupled replica simulations

Besides sampling the Boltzmann–like probability of Eq. (12), it can be useful to bias the sampling so as to give more statistical weight to wider energy minima, which were shown to have better generalization properties than narrow ones Baldassi et al. (2015, 2020b, 2021, 2022).

We did this using an entropy–driven search algorithm inspired by Ref. Baldassi et al. (2016). We used a system composed of $y$ coupled replicas $\{\boldsymbol{w}_i(t)\}_{i=1}^{y}$, each starting from a different solution previously found with HMC. The replicas are coupled to the barycenter of the system

$$\boldsymbol{w}_c(t) = \frac{1}{y}\,\frac{\sum_{i=1}^{y}|\boldsymbol{w}_i(t)|}{\bigl|\sum_{i=1}^{y}\boldsymbol{w}_i(t)\bigr|}\,\sum_{i=1}^{y}\boldsymbol{w}_i(t) \tag{20}$$

with a potential

$$U_{rp}\bigl(\{\boldsymbol{w}_i(t)\}_{i=1}^{y}\bigr) = \sum_{i=1}^{y} U\bigl[\boldsymbol{w}_i(t)\bigr] + \frac{\gamma T}{y}\sum_{i=1}^{y}|\boldsymbol{w}_i(t)-\boldsymbol{w}_c(t)|, \tag{21}$$

where $\gamma$ is a Lagrange multiplier regulating the mean distance between the replicas and the barycenter $\boldsymbol{w}_c$.

Algorithmically, the weights $\{\boldsymbol{w}_i\}_{i=1}^{y}$ are updated sequentially according to Eq. (21), whereas the barycenter is recomputed each time one of the replicas has moved. The value of $\gamma$ is increased during the simulations until the $y$ replicas $\boldsymbol{w}_i$ all collapse onto a single high–entropy weight $\boldsymbol{c}$, which we will refer to in the rest of the paper as the “center” weight found by the coupled replica simulation.

Note that in the definition of the barycenter of Eq. (20), the norm is rescaled to match the mean norm of the replicas. This prevents the replicas from being driven toward a lower–norm point, which would undesirably increase their energy $U\bigl[\boldsymbol{w}_i(t)\bigr]$.
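The norm rescaling of Eq. (20) and the coupling term of Eq. (21) can be made concrete with a short Python sketch. The function names are hypothetical, weights are plain lists of floats, and the coupling is implemented exactly as written in Eq. (21), i.e. linear in the distance to the barycenter.

```python
import math


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def barycenter(replicas):
    """Eq. (20): direction of the summed replicas, norm set to the mean replica norm."""
    y = len(replicas)
    total = [sum(components) for components in zip(*replicas)]
    mean_norm = sum(norm(w) for w in replicas) / y
    scale = mean_norm / norm(total)
    return [scale * x for x in total]


def coupling_energy(replicas, gamma, T):
    """Second term of Eq. (21): (gamma * T / y) * sum_i |w_i - w_c|."""
    w_c = barycenter(replicas)
    y = len(replicas)
    total_distance = sum(
        norm([a - b for a, b in zip(w, w_c)]) for w in replicas
    )
    return gamma * T / y * total_distance
```

For two orthogonal unit vectors, `barycenter([[1.0, 0.0], [0.0, 1.0]])` has unit norm, whereas the plain arithmetic mean `[0.5, 0.5]` would have a norm of about 0.71; this is the shrinkage that the rescaling is meant to avoid.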

III Results

Using the numerical techniques listed in the previous section, we explored the learning loss landscape of the tree committee machine at different energy levels and in different regimes of constrained density. Our main results are the following:

  • Close to the interpolation threshold, the low-energy manifold has a spiky structure, i.e., it is composed of sharp protrusions (targeted by GD) from which HMC is not able to escape at low temperatures. We nevertheless show that those protrusions are connected by complex paths through a more compact, narrow region that is not completely flat (see Section III.1).

  • In the overparametrized regime $\alpha \ll 1$, we show that the spikes no longer trap HMC. Furthermore, the low-energy manifold is entirely flat in the bulk, giving it a star-shaped structure. We confirm this numerical evidence by studying the error landscape analytically, which shows analogous results (see Section III.2).

  • Preliminary experiments on highly correlated, real-world datasets indicate that our findings remain robust even when the data exhibit significant structural properties.

Unless stated otherwise, in all the numerical simulations we used a tree committee machine with $N=1000$ input neurons, $K=50$ hidden neurons, and ReLU activation functions $\varphi(x)=\max(0,x)$. In HMC simulations we used the cross-entropy loss function of Eq. (4). The norm of the weights is controlled by the Lagrange multiplier $\lambda = 2P\cdot 10^{-7}$, with $P$ being the size of the dataset. The mass matrix of Eq. (14) required by the algorithm is set to $\mathcal{M}=\mathbb{I}$.


Figure 2: a) The average energy $\langle U\rangle_{\boldsymbol{s}}$ (blue dots) and the mean training error $\langle\epsilon\rangle_{\boldsymbol{s}}$ (orange dots) computed on GD solutions as a function of the constraint density $\alpha=P/N$. b) The average energy $\langle U\rangle_{T}$ (blue dots) and the mean training error $\langle\epsilon\rangle_{T}$ (orange dots) as a function of the temperature $T$. The gray vertical lines identify the three temperature regimes (low, medium and high). c) Upper panel, in green: the cosine–similarity distribution among solutions $\boldsymbol{s}$ at $\alpha=1.8$. Middle panel, in blue: the intra (thin line histograms) and the inter (thick line histogram) state overlap distribution $\rho(q)$ at low temperature ($T=1.8\cdot 10^{-2}$). Lower panel, in orange: the intra and inter state overlap distribution $\rho(q)$ at intermediate temperature ($T=1.8\cdot 10^{1}$). The three overlapping distributions have been slightly shifted along the $x$–axis to facilitate the comparison. d) A sketch of the subspace of solutions at fixed norm as suggested by the HMC simulations.

III.1 The model close to the interpolation threshold

We first study the loss landscape of the model in the underparametrized regime, i.e. just below the threshold at which full–batch GD can no longer find weights with zero training error, which is usually called the “interpolation threshold” Engel and Van den Broeck (2001); Belkin et al. (2019). This happens around $\alpha=1.8$ (Fig. 2a), which is very far from the SAT/UNSAT transition $\alpha_c \simeq 2.65$ where solutions cease to exist, computed in Annesi et al. (2024). All the numerical studies of this section therefore refer to the case $\alpha=1.8$.

III.1.1 The weight space displays three regimes with respect to the temperature

First, we trained the model with GD for approximately $10^{7}$ epochs with a fixed learning rate $\eta=1.0$. The solutions $\boldsymbol{s}_i$ found with GD, from which we started the sampling, have mean energy $\langle U\rangle_{\boldsymbol{s}} = (8.5\pm 0.04)\cdot 10^{-2}$ and mean training error $\langle\epsilon\rangle_{\boldsymbol{s}}=0$ (Fig. 2a). The similarity between them is peaked at $q\approx 0.6$ (see upper panel in Fig. 2c), which means that GD finds solutions with an almost fixed mutual distance.

Starting from GD solutions, we explored the space of parameters of the network with the HMC algorithm at different temperatures (Fig. 2b). For $T < 1.8\cdot 10^{-1}$ the average energy $\langle U\rangle_{T}$ starts to flatten, suggesting that here the system freezes in the lowest–energy available states. The average energy is comparable with that of the solutions $\boldsymbol{s}_i$, and the mean training error also remains $\langle\epsilon\rangle=0$. At these temperatures the sampling of the space of weights is difficult and the system gets trapped in local minima of the loss. The similarity distribution $\rho(q)$ (see Eq. (19)) calculated on weights sampled from HMC trajectories starting from the same initial condition (intra-state overlap distribution) is strongly peaked around 1, see Fig. 2c, middle panel. This is markedly different from the inter-state overlap distribution, which is obtained by measuring the overlap between weights along HMC trajectories starting from strictly different initial conditions and is peaked at $q\approx 0.63$. Thus, the system is not able to equilibrate at these temperatures but stays close to the initial condition.
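The distinction between intra-state and inter-state overlaps can be illustrated with a minimal Python sketch. It assumes that each HMC trajectory is stored as a list of sampled weight vectors (plain lists of floats); the function names are hypothetical and the histogramming step that produces $\rho(q)$ is left out.

```python
import math
from itertools import combinations, product


def overlap(w1, w2):
    """Cosine similarity of Eq. (19)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2)


def intra_state_overlaps(trajectories):
    """Overlaps between pairs of samples drawn from the same trajectory."""
    return [
        overlap(a, b)
        for trajectory in trajectories
        for a, b in combinations(trajectory, 2)
    ]


def inter_state_overlaps(trajectories):
    """Overlaps between pairs of samples drawn from different trajectories."""
    return [
        overlap(a, b)
        for t1, t2 in combinations(trajectories, 2)
        for a, b in product(t1, t2)
    ]
```

If the sampler is trapped near its starting point, the values returned by `intra_state_overlaps` cluster near 1 while those returned by `inter_state_overlaps` stay close to the overlap between the initial conditions; when the two sets of values become indistinguishable, the trajectories no longer remember where they started.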

In the temperature range $1.8\cdot 10^{-1} < T < 1.8\cdot 10^{2}$ the average energy increases almost linearly, $\langle U\rangle_{T} \sim T^{\delta}$, with a scaling exponent $\delta \approx 0.989\pm 0.019$. This is typical of a glassy thermodynamic phase with a sub-exponential number of accessible states. In this intermediate temperature range, there is no marked difference between the inter- and intra-state overlap distributions (Fig. 2c, lower panel). This suggests that the system is not trapped in local minima as in the low–temperature phase. The average similarity between states displays a single peak centered at $q\approx 0.6$, similar to that observed at low temperature, but with a larger variance.
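A scaling exponent of this kind is commonly estimated by a least-squares fit in log-log coordinates, since $\langle U\rangle_{T} \sim T^{\delta}$ becomes a straight line of slope $\delta$ there. The Python sketch below shows that procedure under stated assumptions: the function name `fit_power_law` is hypothetical, the fit is an ordinary unweighted regression (the fitting method actually used for the quoted value is not specified in the text), and the data in the usage note are synthetic.

```python
import math


def fit_power_law(temperatures, energies):
    """Fit energies ~ prefactor * temperatures**delta by linear regression
    of log(energy) against log(temperature); returns (delta, prefactor)."""
    xs = [math.log(t) for t in temperatures]
    ys = [math.log(u) for u in energies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    delta = s_xy / s_xx                       # slope of the log-log line
    prefactor = math.exp(mean_y - delta * mean_x)
    return delta, prefactor
```

As a check on synthetic data generated from an exact power law, `fit_power_law([0.2, 1.0, 5.0, 25.0, 100.0], [3.0 * t ** 0.9 for t in [0.2, 1.0, 5.0, 25.0, 100.0]])` recovers the exponent 0.9 and the prefactor 3.0 up to rounding error.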

In the high–temperature phase, the average energy deviates from the linear behavior and the training error further increases. The distribution of similarities $q$ is very broad. However, here the HMC algorithm becomes inefficient because the high velocities drawn before each HMC move require a very small step $\delta t$ for a correct integration of the equations of motion and thus for an acceptable acceptance rate.

We summarize these results in Fig. 2d, where we show a sketch of the space of parameters at fixed norm. On this hypersphere, the solutions found by the GD define energy basins which are explored by the system at low and intermediate temperatures. These basins are disconnected at low temperature, since HMC remains confined near the initialization and cannot find pathways connecting the different basins while remaining at low energy and at zero training error.


Figure 3: a) A sketch of the ratchet algorithm. The point $\boldsymbol{w}_i(t)$ (green sphere) reaches the (fixed) point $\boldsymbol{w}_j$ (red sphere) through low energy barriers. Backward moves (red region) are damped by the quadratic term of Eq. (18). b) The energy profile along the geodesic evolving during the simulation time $t$ with respect to the similarity $q$ between the two moving points of the double ratchet (dashed line). Two different ratcheted trajectories are shown starting from the same weights at $T=T_L\equiv 1.8\cdot 10^{-2}$. The solid curves indicate the energies calculated along geodesics connecting pairs of points picked at the same time. The mean and the standard deviation of the similarity between the anchor weights $\boldsymbol{s}^{\star}$ obtained from independent trajectories are also indicated. c) The similarity distribution $\rho(q)$ between the GD solutions $\boldsymbol{s}$ (in blue, upper panel), between the double–ratchet anchor weights $\boldsymbol{s}^{\star}$ (in orange, middle panel) and between $\boldsymbol{s}$ and $\boldsymbol{s}^{\star}$ (in green, bottom panel). d) A sketch of the low–energy manifold.

III.1.2 The low–energy basins are connected by complex paths

To explore the connectivity of energy basins at low temperatures, we performed double-ratchet simulations, hoping to find paths which are not found by the HMC algorithm. This method is designed to identify low-barrier pathways between two weight configurations by following the minimal gradient in the direction that brings them closer together (Fig. 3a).

The double ratchets are initialized on different pairs of solutions $(\boldsymbol{s}_i,\boldsymbol{s}_j)$ found after a short thermalization of the system of approximately $10^{4}$ HMC steps. Along each simulation, the weights do not cross significant energy barriers, as compared to the average energy at this temperature, keeping the training error equal to zero. Conversely, the linear interpolation between pairs of points along the double–ratchet trajectory results in barriers that are substantially higher than the average energy (Fig. 3b). These results suggest that, although the solutions found by gradient descent are linearly mode disconnected, low-energy tortuous paths joining them still exist.
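The barrier along the interpolation between two weight vectors can be measured with a short Python sketch. It assumes a geodesic (great-circle) interpolation at fixed norm between two vectors of equal norm that are not antipodal; the function names and the number of interpolation points are hypothetical, and `U` stands for any energy function of the weights.

```python
import math


def cosine_similarity(w1, w2):
    """Normalized overlap of Eq. (19)."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2)


def geodesic_point(w1, w2, s):
    """Point at fraction s in [0, 1] of the great-circle arc from w1 to w2.
    Assumes |w1| == |w2| and that the vectors are not antipodal."""
    q = max(-1.0, min(1.0, cosine_similarity(w1, w2)))
    theta = math.acos(q)
    if theta < 1e-12:
        return list(w1)  # the two endpoints coincide
    a = math.sin((1.0 - s) * theta) / math.sin(theta)
    b = math.sin(s * theta) / math.sin(theta)
    return [a * x + b * y for x, y in zip(w1, w2)]


def barrier_height(w1, w2, U, n_points=101):
    """Highest energy along the arc minus the higher of the two endpoint energies."""
    profile = [
        U(geodesic_point(w1, w2, i / (n_points - 1))) for i in range(n_points)
    ]
    return max(profile) - max(profile[0], profile[-1])
```

A value of `barrier_height` close to zero means that the two endpoints are connected along the arc without leaving the low-energy region; a value well above the thermal energy scale indicates that a straight connection crosses a barrier even if a more tortuous low-energy path exists.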

The anchor point $\boldsymbol{s}^{\star}$ obtained at the end of the double–ratchet trajectory has an energy that is slightly lower than that of the initial points $(\boldsymbol{s}_i,\boldsymbol{s}_j)$, and $\langle U\rangle_{\boldsymbol{s}^{\star}} = (6.80\pm 0.09)\cdot 10^{-2}$ is compatible with the average $\langle U\rangle_{T_L}$; that is, the constrained dynamics arising from the double–ratchet simulation helps the equilibration of the system. The anchor weights $\boldsymbol{s}^{\star}$ obtained from several double–ratchet simulations initialized from different weight pairs $(\boldsymbol{s}_i,\boldsymbol{s}_j)$ exhibit larger similarity than the respective initial vectors and are separated by energy barriers along their linear interpolation (Fig. 3c). This suggests the presence of a compact and narrow region of the solution space that must be traversed to move from one gradient descent solution to another. The similarity between the anchor weights $\boldsymbol{s}^{\star}$ and the associated initial weights is also centered around $0.6$, like that between the starting solutions.

In summary (Fig. 3d), the basins identified by GD solutions are connected by non–linear paths pointing towards a denser, more compact region with slightly lower energy. The low–energy manifold of the space of weights thus has a spiky shape, with the GD solutions lying on its spikes.

III.1.3 The center is not flat


Figure 4: a) A sketch of the replica algorithm. The $\{\boldsymbol{w}_i\}_{i=1}^{y}$ points are coupled to their barycenter $\boldsymbol{w}_c$ and the coupling constant is slowly increased during the simulation, until all the replicas collapse onto one small region of the weight space. b) Upper panel: the energy profile along the geodesic at fixed norm between centers and GD solutions ($\boldsymbol{c}\leftrightarrow\boldsymbol{s}$), between centers and double–ratchet solutions ($\boldsymbol{c}\leftrightarrow\boldsymbol{s}^{\star}$) and between centers ($\boldsymbol{c}\leftrightarrow\boldsymbol{c}$). The black line marks the mean energy (and standard deviation) at $T=T_L$. Lower panel: the similarity distribution $\rho(q)$ between centers and GD solutions (blue), between centers and double–ratchet solutions (orange) and among centers (green). c) The potential energy $U$ along the trajectory obtained by performing double–ratchet simulations connecting a center vector to a GD solution at $T=T_L$ (lower and upper blue curve, respectively) and two center points at $T=5.4\cdot 10^{-3}<T_L$ (green curves), where the similarity $q$ parametrizes the trajectory. The blue line and shaded area are the mean energy and standard deviation, respectively, at $T=T_L$. d) A sketch of the subspace of solutions where the central region has been studied by coupled replica simulations.

We then studied the geometry of the center of the spiky low–energy manifold in more detail using coupled replica simulations, starting with five GD solutions coupled to their barycenter and increasing the harmonic couplings until all the running points converge to a single vector $\boldsymbol{c}$ (Fig. 4a).

We compared the properties of the centers found in different coupled replica simulations with the solutions $\boldsymbol{s}$ found by GD and the anchor points $\boldsymbol{s}^{\star}$ found by double ratchets initialized on different GD solutions at $T=T_L$. The final configuration $\boldsymbol{c}$ always has an energy that is lower by several standard deviations than the mean energy at the simulation temperature (Fig. 4b,c). The centers obtained from the coupled replica simulations are confined to a very narrow region of the solution space (Fig. 4b), even narrower than the region spanned by the anchor points (middle panel of Fig. 3c).

In addition, the similarity between centers and anchor points is larger than that between centers and GD solutions. This ordering suggests that the centers lie deep within the bulk of the solution manifold, with anchor points positioned more peripherally and GD solutions even farther away. This “nested overlap” structure has already been observed in the negative perceptron problem, see Annesi et al. (2024).

Despite the high similarity between the $\boldsymbol{c}$ vectors ($q\approx 0.90\pm 0.03$), the latter are not linearly connected at fixed norm (Fig. 4b, upper panel, green curves), even though the barrier height is significantly lower than the one between $\boldsymbol{c}$ and GD solutions $\boldsymbol{s}$, as well as $\boldsymbol{s}^{\star}$ points (Fig. 4b, upper panel, blue and red curves, respectively).

Finally, we performed double–ratchet simulations between pairs of $\boldsymbol{c}$ vectors and between centers and GD solutions. In the latter case, at $T=T_L$ the weight initialized on the center $\boldsymbol{c}$ stays very close to it along the double–ratchet dynamics (and its energy stays within two standard deviations from the average at the simulation temperature); the weight starting from the GD solution instead slowly lowers its energy until the coupled $\boldsymbol{c}$ vector is reached (Fig. 4c, blue curves).

On the contrary, when the same simulation is performed between pairs of center points at $T<T_L$, the latter are able to connect to each other through paths whose energy is lower than the average $\langle U\rangle_{T_L}$ (Fig. 4c, green curves).

From this last computational analysis, we conclude that the center of the spiky low–energy manifold is not barrier–free. Although there are low–energy paths connecting all low–energy regions of the structure, these are not as simple as linear paths. Moreover, there is a small slope of the energy towards the center.

III.1.4 The intermediate temperature regime

By increasing the temperature to intermediate values (cf. Fig. 2), the system reaches a phase where the HMC algorithm can easily sample the space of weights. In this phase the training error is small but non–negligible (up to $\epsilon\approx 0.3$), yet the system can still learn some of the data we present.

The similarity distribution between states shows a single peak centered at $q\approx 0.6$, indicating that most of the sampled states are equivalent to each other. This is different from what is found at low temperatures, where there is a difference between points at the periphery of the structure and weights in the center of the low–energy manifold. Such a single peak in $\rho(q)$ is compatible with what is known in the language of complex systems as replica-symmetric behavior of the system Mézard et al. (1987).

The states sampled during simulations in this intermediate regime ($T=1.8\cdot 10^{1}$) show an average similarity $\langle q\rangle=0.54\pm 0.03$ with respect to GD solutions $\boldsymbol{s}_{i}$ and $\langle q\rangle=0.58\pm 0.04$ with respect to both the $\boldsymbol{s}^{*}$ and $\boldsymbol{c}$ vectors found in the low–temperature phase. Consequently, here the system is never close to any specific region of the subspace explored at low temperatures. In particular, the large mean distance between the center of the manifold and the typical states sampled at intermediate temperatures suggests that the former is entropically unfavorable. This fact, along with the unimodal shape of the cosine–similarity distribution, suggests that the visited manifold at intermediate temperatures still retains a (less rugged) spiky shape, but with a “hollow” center characterized by high free energy.

III.2 The overparametrized regime

III.2.1 Similarities and differences with the underparametrized regime

We then compared the properties of the space of weights that we found at the interpolation threshold with those of the overparametrized regime.

We performed various HMC simulations at $\alpha=0.2$ for a wide range of temperatures, none of which indicates a frozen behavior since, similarly to the intermediate–temperature case at $\alpha=1.8$, the intra- and inter-state overlap distributions coincide, see Fig. 5a. Moreover, the energy profile along the linear paths between the sampled states shows a central free energy barrier, both at $\alpha=1.8$ and at most temperatures at $\alpha=0.2$. In the latter case, the central barrier disappears for vanishing temperatures and the energy profile assumes a convex shape (Fig. 5b).
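
The comparison between intra-state and inter-state overlap distributions described above can be illustrated with a minimal sketch. It is not the code used for the simulations reported here: the function name `cosine_similarities`, the array layout (one flattened weight vector per row) and the synthetic Gaussian "runs" are assumptions introduced only for illustration.

```python
import numpy as np

def cosine_similarities(a, b=None):
    """Pairwise cosine similarities between the rows of a (intra-state),
    or between the rows of a and the rows of b (inter-state)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    if b is None:
        sims = a_n @ a_n.T
        # keep each unordered pair once, excluding self-similarities
        return sims[np.triu_indices(len(a), k=1)]
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a_n @ b_n.T).ravel()

# Synthetic stand-in for two independent runs (assumption, not HMC data):
# both runs fluctuate around the same direction, so they populate
# statistically equivalent states.
rng = np.random.default_rng(0)
n_samples, dim = 50, 2000
common = rng.normal(size=dim)
run_a = common + rng.normal(size=(n_samples, dim))
run_b = common + rng.normal(size=(n_samples, dim))

intra_a = cosine_similarities(run_a)        # within run A
intra_b = cosine_similarities(run_b)        # within run B
inter = cosine_similarities(run_a, run_b)   # across the two runs

print(intra_a.mean(), intra_b.mean(), inter.mean())
```

In this toy setting the three histograms overlap, which is the signature we associate with the absence of frozen behavior; in a frozen phase one would instead expect the inter-state distribution to separate from the intra-state ones.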

In conclusion, the intermediate–temperature configurations visited by the system are similarly distributed both near the interpolation threshold and in the overparametrized regime, where in both scenarios the manifold exhibits a symmetric shape, in the sense that the system populates a single kind of state, belonging to the spiky periphery of the space. Only at low temperatures do the two cases differ, since in the former the explored subspace maintains a spiky shape, where part of the center of the manifold and its periphery are populated, whilst in the latter it becomes convex.

Figure 5: a) Upper panel: intra-state (thin line histograms) and inter-state (thick line histogram) overlap distributions for two simulations close to the interpolation threshold of GD ($\alpha=1.8$) at $T=1.8\cdot 10^{1}$ (intermediate temperature). The three overlapping distributions have been slightly shifted along the $x$–axis to facilitate the comparison. Lower panel: for the same $(\alpha,T)$, we show the binary cross-entropy loss along the mean geodesic curve for points sampled at equilibrium, parametrized by the variable $\gamma\in[0,1]$. b) The same quantities are presented for simulations in the overparametrized regime ($\alpha=0.2$) at $T=6\cdot 10^{-3},\,2\cdot 10^{-3},\,6\cdot 10^{-4}$ (intermediate to low temperatures).

III.2.2 The barrier along linear paths between sampled solutions can be computed analytically

In the overparametrized regime, we can compute analytically both the energy (loss) and the training error along the geodesic path connecting two weights $\boldsymbol{w}^{1}$, $\boldsymbol{w}^{2}$ in the thermodynamic limit, extending the analysis of ref. Annesi et al. (2023) to the tree committee machine case. In full generality, we assume that the two weights are sampled from two different Boltzmann distributions, $\boldsymbol{w}^{1}\sim p_{1}(\boldsymbol{w};\mathcal{D})$ and $\boldsymbol{w}^{2}\sim p_{2}(\boldsymbol{w};\mathcal{D})$, each of which has the same form as in equation (7) but may differ in the choice of the loss function and of the prior, i.e.

$$p_{1}(\boldsymbol{w};\mathcal{D})=\frac{e^{-\beta\mathcal{L}_{1}(\boldsymbol{w};\mathcal{D})}\,p_{1}(\boldsymbol{w})}{Z^{1}_{\mathcal{D}}} \tag{22a}$$
$$p_{2}(\boldsymbol{w};\mathcal{D})=\frac{e^{-\beta\mathcal{L}_{2}(\boldsymbol{w};\mathcal{D})}\,p_{2}(\boldsymbol{w})}{Z^{2}_{\mathcal{D}}} \tag{22b}$$

Notice, moreover, that in equations (22) the training data $\mathcal{D}$ is the same for both Boltzmann distributions. As in Section II, we choose Gaussian priors $p_{1}(\boldsymbol{w})$, $p_{2}(\boldsymbol{w})$, with L2 regularization parameters that we denote by $\lambda_{1}$ and $\lambda_{2}$, respectively. In the large $N$ limit the choice of the regularization parameter induces a non-trivial value of the norm of the weights $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$. In the following we suppose that $\lambda_{1}$ and $\lambda_{2}$ are chosen such that the norm of $\boldsymbol{w}^{1}$ is the same as that of $\boldsymbol{w}^{2}$, i.e. $|\boldsymbol{w}^{1}|^{2}=|\boldsymbol{w}^{2}|^{2}\equiv Q$.

We are interested in computing the average training error and training loss landscape on the geodesic path joining $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$ at fixed squared norm $Q$. This can be obtained as follows. Given $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$, we define the interpolating weight $\boldsymbol{w}(\gamma)$ with parameter $\gamma\in[0,1]$ as the vector whose components are the linear interpolation of the components of $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$

$$w_{li}(\gamma)\equiv\gamma w_{li}^{1}+(1-\gamma)w_{li}^{2}\,. \tag{23}$$

In order to compute the geodesic path at the same squared norm $Q$, we finally need to project the whole straight path on the hypersphere of radius $\sqrt{Q}$. This defines the weight

$$\widetilde{w}_{li}(\gamma)\equiv\frac{w_{li}(\gamma)}{c_{\gamma}} \tag{24}$$

where

$$c_{\gamma}\equiv\frac{\lvert\boldsymbol{w}(\gamma)\rvert}{\sqrt{Q}}\equiv\sqrt{\frac{1}{QN}\sum_{l=1}^{K}\sum_{i=1}^{N/K}w^{2}_{li}(\gamma)}=\sqrt{1-2\gamma(1-\gamma)\left(1-\frac{p}{Q}\right)} \tag{25}$$

In the previous expression we have introduced the quantity

$$p\equiv\mathbb{E}_{\mathcal{D}}\left\langle\frac{1}{N}\sum_{li}w_{li}^{1}w_{li}^{2}\right\rangle_{\boldsymbol{w}^{1}\sim p_{1}(\cdot;\mathcal{D}),\,\boldsymbol{w}^{2}\sim p_{2}(\cdot;\mathcal{D})} \tag{26}$$

which corresponds to the overlap between $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$. In the previous equation we have denoted by $\langle\cdot\rangle$ the average over the Boltzmann distribution (7) and by $\mathbb{E}_{\mathcal{D}}$ the average over the dataset.
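
The construction of the projected path can be checked numerically with a short sketch. It assumes that the weights are stored as a flattened vector of $N$ components normalized so that $\frac{1}{N}\sum_{li}w_{li}^{2}=Q$, and it uses the empirical overlap of a single pair of random endpoints in place of the averaged quantity $p$; the function names `geodesic_point` and `c_gamma_closed_form` are hypothetical. Expanding the square of the interpolated weight gives $\gamma^{2}Q+(1-\gamma)^{2}Q+2\gamma(1-\gamma)p$, which is the closed form of $c_{\gamma}$ used below.

```python
import numpy as np

def geodesic_point(w1, w2, gamma, Q):
    """Linear interpolation followed by projection on the sphere of
    squared norm Q (per component). Returns (w_tilde, c_gamma)."""
    n = w1.size
    w = gamma * w1 + (1.0 - gamma) * w2          # straight path
    c_gamma = np.sqrt(np.sum(w ** 2) / (Q * n))  # norm ratio, direct definition
    return w / c_gamma, c_gamma                  # projected weight

def c_gamma_closed_form(gamma, p, Q):
    """Closed form of c_gamma in terms of the endpoint overlap p."""
    return np.sqrt(1.0 - 2.0 * gamma * (1.0 - gamma) * (1.0 - p / Q))

# Two random endpoints rescaled to the same squared norm Q
# (illustrative stand-ins, not samples of a Boltzmann distribution).
rng = np.random.default_rng(1)
n, Q = 1000, 1.0
w1 = rng.normal(size=n)
w1 *= np.sqrt(Q * n) / np.linalg.norm(w1)
w2 = rng.normal(size=n)
w2 *= np.sqrt(Q * n) / np.linalg.norm(w2)
p = np.dot(w1, w2) / n  # empirical overlap of this pair

gammas = np.linspace(0.0, 1.0, 5)
direct = np.array([geodesic_point(w1, w2, g, Q)[1] for g in gammas])
closed = np.array([c_gamma_closed_form(g, p, Q) for g in gammas])
sq_norms = np.array([np.sum(geodesic_point(w1, w2, g, Q)[0] ** 2) / n
                     for g in gammas])
```

For a single pair of endpoints with exactly equal norms the two expressions for $c_{\gamma}$ agree up to rounding, and the projected weight keeps squared norm $Q$ for every $\gamma$.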

We are interested in finding the average value of a loss $\widetilde{\mathcal{L}}$ evaluated on the projected interpolated weight $\widetilde{\boldsymbol{w}}(\gamma)$ as a function of $\gamma$, and how this profile depends on the choice of the endpoints $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$ via the probability density functions in (22). In formulas, what we compute is

$$\begin{split}E(\gamma)&=\frac{1}{P}\,\mathbb{E}_{\mathcal{D}}\,\mathbb{E}_{\boldsymbol{w}^{1}\sim p_{1}(\cdot\,;\mathcal{D})}\,\mathbb{E}_{\boldsymbol{w}^{2}\sim p_{2}(\cdot\,;\mathcal{D})}\,\widetilde{\mathcal{L}}\left(\widetilde{\boldsymbol{w}}(\gamma);\mathcal{D}\right)\\ &=\frac{1}{P}\,\mathbb{E}_{\mathcal{D}}\frac{\int d\boldsymbol{w}^{1}d\boldsymbol{w}^{2}\,p_{1}(\boldsymbol{w}^{1})\,p_{2}(\boldsymbol{w}^{2})\,e^{-\beta\mathcal{L}_{1}(\boldsymbol{w}^{1};\mathcal{D})-\beta\mathcal{L}_{2}(\boldsymbol{w}^{2};\mathcal{D})}\,\widetilde{\mathcal{L}}\left(\widetilde{\boldsymbol{w}}(\gamma);\mathcal{D}\right)}{Z_{\mathcal{D}}^{1}Z_{\mathcal{D}}^{2}}\end{split} \tag{27}$$

Using a general loss $\widetilde{\mathcal{L}}$ gives us access to both the training loss and the training error profiles. Equation (27) can be computed with the replica method in the large $N$ limit. In the following we focus only on the infinite width limit $K\to\infty$ with the ratio $K/N\to 0$, as done in Baldassi et al. (2019). The full calculation is reported in Appendix B. Here we only mention that the result of the calculation, assuming no replica symmetry breaking of the solution space (i.e. in the so-called Replica Symmetric ansatz), depends on simple geometrical quantities. These are the typical overlap between two weights $\boldsymbol{w}^{a}$, $\boldsymbol{w}^{b}$ extracted from $p_{1}$, i.e. $\boldsymbol{w}^{a},\boldsymbol{w}^{b}\sim p_{1}(\boldsymbol{w};\mathcal{D})$ (respectively from $p_{2}$, i.e. $\boldsymbol{w}^{a},\boldsymbol{w}^{b}\sim p_{2}(\boldsymbol{w};\mathcal{D})$), namely

$$q_{1}=\mathbb{E}_{\mathcal{D}}\left\langle\frac{1}{N}\sum_{li}w_{li}^{a}w_{li}^{b}\right\rangle_{\boldsymbol{w}^{a},\boldsymbol{w}^{b}\sim p_{1}(\cdot;\mathcal{D})} \tag{28a}$$
$$q_{2}=\mathbb{E}_{\mathcal{D}}\left\langle\frac{1}{N}\sum_{li}w_{li}^{a}w_{li}^{b}\right\rangle_{\boldsymbol{w}^{a},\boldsymbol{w}^{b}\sim p_{2}(\cdot;\mathcal{D})} \tag{28b}$$

as well as the typical overlap $p$ between the endpoints $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$, which was introduced in equation (26). In the Replica Symmetric ansatz all those overlaps concentrate in the large $N$ limit. Since the weights $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$ are in general samples of two different probability distributions (22), the overlaps $q_{1}$ and $q_{2}$ can be simply obtained by computing with the replica method the corresponding partition functions $Z^{1}_{\mathcal{D}}$ and $Z^{2}_{\mathcal{D}}$. We refer to Baldassi et al. (2019, 2023) for this calculation; for the reader's convenience, we report its outcome in Appendix A. The overlap $p$ is instead slightly more difficult to compute, because it amounts to studying an elastically coupled system of weights $\boldsymbol{w}^{1}$, $\boldsymbol{w}^{2}$ in the limit where the coupling is sent to zero, see Annesi et al. (2023) for details. We present in Appendix D an alternative and complementary calculation based on the Franz-Parisi potential Franz and Parisi (1995).

III.2.3 Comparison between theory and simulations

We compare here our analytical predictions with the numerical results obtained using the HMC algorithm. We have considered the case $K/N=0.005$, a constraint density $\alpha=0.2$, temperature $T=0.02$ and L2 regularization $\beta\lambda=0.02$.

The plot of Figure 6 shows the cross-entropy loss along the geodesic path interpolating between two samples of the Boltzmann distribution. We note that the cross-entropy profile presents a non-trivial, non-monotonic behavior: starting from one of the two configurations, the loss first decreases and then increases again, reaching a local maximum in the middle of the path. This same non-trivial behavior is also observed in the numerical estimate. We emphasize that achieving the asymptotic limit predicted by the theory is nontrivial, as it requires accounting for finite $N$ and $K$ corrections while operating in the scaling regime $K/N\to 0$. Nevertheless, by keeping the ratio $K/N$ small and increasing $N$, we show that the simulations approach the predictions given by our infinite-size theory.

Figure 6: Loss landscape along the (fixed norm) geodesic path interpolating two weight configurations sampled from the Boltzmann distribution (7) with the cross entropy loss, temperature $T=0.02$ and regularization parameter $\beta\lambda=0.02$. The ReLU function is used as the activation function. We have fixed the constrained density to $\alpha=0.2$ and $K/N=0.005$, while increasing $N$ (points). The full line refers to the theoretical prediction, which is reported in equation (62) of the appendix.

III.2.4 The star-shaped property of the solution space

Thus far, in both the underparametrized and overparametrized regimes we have focused on the energy landscape induced by the cross-entropy loss. Here, we broaden our scope to examine the entire solution space, i.e. we consider the loss function (6). Notice that this is a larger set of weights that includes, but is not limited to, those configurations selected by optimizing the cross-entropy loss.

In Figure 7 we consider the case in which the endpoints are sampled from the large $\beta$ limit of the Boltzmann distributions corresponding to the theta loss (6), with two equal margins $\kappa_{1}=\kappa_{2}=\kappa$. We plot the training error (3) (i.e. $\widetilde{\mathcal{L}}(\widetilde{\boldsymbol{w}};\mathcal{D})=\sum_{\mu=1}^{P}\Theta(-y^{\mu}\Delta^{\mu}(\boldsymbol{w}))$) on the geodesic path joining such solutions for several values of $\kappa$ in the low $\alpha$ regime, i.e. for a rather overparameterized network. It can be noticed that for $\kappa=0$, i.e. for typical solutions of the learning task, the training error is strictly positive as soon as one moves away from the endpoints, meaning that $\left.\frac{dE(\gamma)}{d\gamma}\right|_{\gamma=0}>0$ and $\left.\frac{dE(\gamma)}{d\gamma}\right|_{\gamma=1}<0$. Increasing the value of $\kappa$, there is a small neighborhood of the endpoints where the training error vanishes.
Overall, the whole curve of the training error decreases monotonically as $\kappa$ increases. For $\kappa>\kappa^{\star}$ the two endpoints become linear mode connected. The region of solutions with $\kappa>\kappa^{\star}$ has been called in Annesi et al. (2023) the “geodesically convex” component of the manifold of solutions, as any two solutions sampled within this region are linear mode connected. In the two panels of Figure 7 we also compare two activation functions, the ReLU and the sign activation function. It can be noticed that overall the barrier for a fixed $\kappa$ is smaller in the ReLU activation case. This is consistent with what was found in Baldassi et al. (2019), where it has been argued that the training error landscape corresponding to the ReLU activation possesses wider and flatter minima with respect to other activation choices like the sign case.
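
To make the notion of a training error profile along the geodesic concrete, the following sketch evaluates it numerically for a small network. Several ingredients are assumptions introduced for illustration and are not taken from the calculation above: the specific form of the tree committee machine in `committee_output` (disjoint input blocks, ReLU hidden units, a fixed second layer with half of the units at $+1$ and half at $-1$), the Gaussian inputs, the labels generated by the first endpoint, and the definition of the barrier as the maximum of the profile minus the larger endpoint value. The endpoints are random vectors, not samples of a Boltzmann distribution.

```python
import numpy as np

def committee_output(w, X, K):
    """Pre-sign output of a tree committee machine with ReLU hidden units.
    Assumed form: w and each input are split into K disjoint blocks."""
    P, N = X.shape
    hidden = np.einsum("pki,ki->pk",
                       X.reshape(P, K, N // K), w.reshape(K, N // K))
    hidden /= np.sqrt(N // K)
    readout = np.where(np.arange(K) < K // 2, 1.0, -1.0)  # assumed fixed layer
    return np.maximum(hidden, 0.0) @ readout / np.sqrt(K)

def error_profile(w1, w2, X, y, K, Q=1.0, n_points=21):
    """Fraction of misclassified patterns along the projected path.
    gamma = 1 corresponds to w1, gamma = 0 to w2.
    Assumes the interpolated weight never vanishes (w1 != -w2)."""
    N = w1.size
    gammas = np.linspace(0.0, 1.0, n_points)
    errors = np.empty(n_points)
    for j, g in enumerate(gammas):
        w = g * w1 + (1.0 - g) * w2
        w_tilde = w * np.sqrt(Q * N) / np.linalg.norm(w)  # fixed squared norm Q
        errors[j] = np.mean(y * committee_output(w_tilde, X, K) < 0)
    return gammas, errors

rng = np.random.default_rng(2)
N, K, P, Q = 200, 10, 40, 1.0        # alpha = P / N = 0.2
X = rng.normal(size=(P, N))
w1 = rng.normal(size=N)
w2 = rng.normal(size=N)
# Labels realized by w1, so that w1 is a zero-error endpoint by construction.
y = np.where(committee_output(w1, X, K) >= 0, 1.0, -1.0)

gammas, errors = error_profile(w1, w2, X, y, K, Q)
barrier = errors.max() - max(errors[0], errors[-1])
```

With endpoints that are actual solutions at a given margin, the same `error_profile` function would return the finite-size analogue of the curves discussed above, and `barrier == 0` would signal linear mode connectivity at the chosen resolution in $\gamma$.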

In Figure 8 we consider the case in which the endpoints are sampled from the large $\beta$ limit of the Boltzmann distributions corresponding to the loss (6), but with two different margins $\kappa_{1}$ and $\kappa_{2}$. We consider the first endpoint to be a typical solution of the classification task, i.e. to have margin $\kappa_{1}=0$. As before, we plot the training error, i.e. $\widetilde{\mathcal{L}}(\widetilde{\boldsymbol{w}};\mathcal{D})=\sum_{\mu=1}^{P}\Theta(-y^{\mu}\Delta^{\mu}(\boldsymbol{w}))$, on the geodesic path joining $\boldsymbol{w}^{1}$ to $\boldsymbol{w}^{2}$ for several values of $\kappa_{2}$ and for the same $\alpha$ considered in Figure 7. It can be noticed that the maximum of the barrier is always closer to the less robust solution, and that the whole curve decreases monotonically as $\kappa_{2}$ increases. For $\kappa_{2}>\kappa_{\text{krn}}$ the two solutions eventually become linear mode connected.
This means that typical solutions, despite not being linear mode connected, are connected by a piecewise path passing through a solution having a rather large margin $\kappa$. There therefore exists a subset of solutions, called the kernel, that are geodesically connected to any other solution of the learning task (indeed, if $\kappa_{2}>\kappa_{\text{krn}}$, then this solution is linear mode connected not only to a typical solution with $\kappa_{1}=0$, but also to any other solution with margin $\kappa_{1}>0$ extracted from the Boltzmann measure induced by (6)). This implies that the space of solutions is star-shaped in the overparameterized regime. Similar plots hold for other activation functions; we report the case of the Erf activation function in the appendix. This conclusion is consistent with what was found in reference Annesi et al. (2023) for a non-convex but simpler linear model called the negative perceptron. Recently in Lin et al. (2024); Sonthalia et al. (2024), numerical evidence has been presented suggesting that the solution space of deep networks possesses a star-shaped geometry. We first provide theoretical support for this claim in the case of simple, overparameterized one-hidden-layer networks with general activation functions.
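
The geometric meaning of a star-shaped set can be illustrated with a toy example that has nothing to do with a neural network. In the sketch below, the zero-error set is an arbitrary region of the sphere in three dimensions, chosen by hand so that it is not geodesically convex but still admits a "kernel" point connected to every other point of the set. The error function `toy_error`, the points `a`, `b`, `center` and the helper names are all hypothetical.

```python
import numpy as np

def path_is_error_free(w_a, w_b, error_fn, Q=1.0, n_points=51):
    """True if error_fn vanishes on the whole projected path between w_a and w_b."""
    N = w_a.size
    for g in np.linspace(0.0, 1.0, n_points):
        w = g * w_a + (1.0 - g) * w_b
        w_tilde = w * np.sqrt(Q * N) / np.linalg.norm(w)  # project on the sphere
        if error_fn(w_tilde) > 0:
            return False
    return True

def is_star_center(center, solutions, error_fn, **kwargs):
    """True if center is connected by an error-free path to every given solution."""
    return all(path_is_error_free(center, s, error_fn, **kwargs)
               for s in solutions)

def toy_error(w):
    """Toy zero-error set: w[0] > 0 and (w[1] > 0 or w[2] > 0)."""
    return 0.0 if (w[0] > 0 and (w[1] > 0 or w[2] > 0)) else 1.0

a = np.array([0.1, 1.0, -2.0])       # solution through the condition w[1] > 0
b = np.array([0.1, -2.0, 1.0])       # solution through the condition w[2] > 0
center = np.array([1.0, 1.0, 1.0])   # satisfies both conditions

a_b_connected = path_is_error_free(a, b, toy_error)      # crosses the excluded region
center_is_kernel = is_star_center(center, [a, b], toy_error)
```

Here `a` and `b` are not connected to each other, since along their path both `w[1]` and `w[2]` are non-positive for intermediate $\gamma$, yet each of them is connected to `center`. This is the same piecewise connectivity through a kernel point described above, in a setting simple enough to verify by hand.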

We refer to Appendix B.4.3 for a discussion of the training error and loss along the path connecting a typical solution of the cross entropy loss and the error counting loss, which shows that the low temperature configurations sampled from the cross-entropy loss are solutions located deep into the geodesically convex component of the manifold of solutions. Note that this is consistent with what has been observed numerically in Fig. 5. As we have numerically observed, however, this is not true in the underparametrized regime, as typical cross-entropy loss solutions become linear mode disconnected.

Figure 7: Training error along the geodesic path connecting two solutions sampled from the loss function $\ell_{r}(x)\equiv\Theta(\kappa_{r}-x)$, $r=1,2$, with equal margin $\kappa_{1}=\kappa_{2}\equiv\kappa$, $\alpha=0.1$ and fixed norm $Q=1$. The two panels refer to the choice of the activation function: ReLU activation (left panel) and sign activation (right panel). The barrier between typical solutions, i.e. $\kappa=0$, is strictly non-vanishing; as the margin of the sampled solutions increases, the barrier decreases and eventually vanishes for large enough $\kappa$, similarly to the finding of Annesi et al. (2023).
Figure 8: Case where $\ell_{r}(x)\equiv\Theta(\kappa_{r}-x)$, $r=1,2$ and $\widetilde{\ell}(x)=\Theta(-x)$, with $\alpha=0.1$ and fixed norm $Q=1$. Here we fixed the first endpoint (located at $\gamma=1$) to have margin $\kappa_{1}=0$. The second endpoint ($\gamma=0$) has a variable margin $\kappa_{2}$ (from smaller to larger margin, curves from top to bottom). The left panel corresponds to the ReLU activation, the right one to the sign function. As the robustness $\kappa_{2}$ increases, the training error barrier on the geodesic decreases monotonically. For large enough $\kappa_{2}$ the solutions become geodesically connected, as observed in Annesi et al. (2023) for the negative perceptron model. The solution space is therefore star-shaped, namely there exists a set of solutions (those with a very large margin) that are geodesically connected to all other solutions with lower margin.

III.3 The geometrical structure can be robust with respect to highly correlated data

Figure 9: a) Samples of the images used in the correlated dataset. b) The energy profile along the lines connecting the centers and the GD solutions, between centers and the final points of double ratchet simulations and between centers (upper panel). The black solid line marks the mean energy (and standard deviation) at the simulation temperature $T_{L}=8.2\cdot 10^{-3}$. The corresponding similarity distribution $\rho(q)$ between centers and solutions (blue bars, lower panel), between centers and double ratchet solutions (orange) and among centers (green). c) The potential energy $U$ as a function of the similarity $q$ for a double–ratchet simulation connecting the center to the GD solution at $T=8.2\cdot 10^{-3}=T_{L}$ (blue curve) and center to center at $T=2.5\cdot 10^{-3}<T_{L}$ (green curve). The blue line and shaded area represent respectively the mean energy and standard deviation. d) A sketch of the manifold sampled by the system in the case of correlated data.

To further assess the applicability of our techniques, we repeated the numerical study on the same architecture, utilizing correlated data for both the training phase and the exploration of the weight space via ratchet and coupled replica simulations. Again, we treated a case of binary classification, where the dataset is composed of 32×32 images of cats and dogs from the CIFAR10 repository, labeled respectively as $+1$ and $-1$. Furthermore, each component of an input vector $\boldsymbol{x}^{\mu}$, representing a given image in the training dataset, has been scaled so that $x_{i}^{\mu}\in[-1,1]$ $\forall i,\mu$ (cf. Fig. 9a).

Due to the nature of the task, we again adopted the binary cross–entropy function with an $L^{2}$ regularization term as the potential energy of the system (see Eq. (11)), with the Lagrange multiplier $\lambda=2P\cdot 10^{-8}$. By contrast, we slightly modified the network parameters, because of the different size of the input vectors $\boldsymbol{x}^{\mu}$. In this case, $N/K=K=32$, so that the $i$-th neuron in the hidden layer takes as input the $i$-th row of each input image. Once again, the value of $\alpha=P/N$ is chosen to be just below the threshold where full–batch GD can no longer find weights with zero training error. In this case, that is $\alpha=0.8$ (or equivalently $P\approx 820$).
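The sizes quoted above follow from simple arithmetic, which the short sketch below spells out. The preprocessing shown is an assumption: the text only states that inputs are scaled to $[-1,1]$ and that $N/K=32$, so the affine map from 8-bit pixel values and the use of a single-channel (grayscale) image are hypothetical choices made for illustration.

```python
import numpy as np

K = 32                 # hidden units, one per image row
n_in = 32              # inputs per hidden unit, i.e. N / K (pixels in a row)
N = K * n_in           # total number of first-layer weights
alpha = 0.8            # constrained density P / N
P = round(alpha * N)   # number of training patterns, close to the quoted value

def scale_to_unit_interval(img_uint8):
    """Map 8-bit pixel values in [0, 255] to [-1, 1] (assumed affine scaling)."""
    return img_uint8.astype(np.float64) / 127.5 - 1.0

def rows_as_branches(img):
    """View a 32x32 single-channel image as (K, N/K): row i feeds hidden unit i."""
    if img.shape != (K, n_in):
        raise ValueError("expected a 32x32 single-channel image")
    return img

if __name__ == "__main__":
    fake_image = np.random.default_rng(0).integers(0, 256, size=(32, 32), dtype=np.uint8)
    x = rows_as_branches(scale_to_unit_interval(fake_image))
    print(N, P, x.shape, x.min() >= -1.0, x.max() <= 1.0)
```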

Similarly to the case of random inputs, the energy of the sampled state decreases as we move from the GD solutions to the center (upper panel in Fig. 9b). An important difference is that now the distribution of similarities between the center and the GD solutions partially overlaps with the similarity between the center and the points $\boldsymbol{s}^{\star}$ found by the double ratchet to connect the GD solutions. In turn, this distribution partially overlaps with that of the similarity between the different central points (lower panel in Fig. 9b). This suggests that the spiky manifold has a larger center for correlated data.

Moreover, the energy profile seems to display higher peaks and to be more rugged than in the random–label case, as is apparent from the shape of the energy along the paths generated by the double ratchet (Fig. 9c) and from the irregular relation between the heights of the peaks found along the linear paths (compare the upper panel of Fig. 9b with that of Fig. 4b).

The central points are still connected by low–energy paths (Fig. 9c), even though the mean similarity between them is lower than in the random–label case ($\langle q\rangle\approx 0.84\pm 0.06$).

Summing up, the low–energy states of a tree committee machine trained with correlated data still have a spiky shape, but their energy profile is more rugged and irregular than in the random–label case. Moreover, the center is bulkier (see sketch in Fig. 9d).

IV Discussion and Conclusions

In this work, we analyzed the loss landscape of a simple one-hidden-layer artificial neural network with a tree-like structure in both the underparameterized and overparameterized regimes. We employed various numerical techniques, including (Hamiltonian) Monte Carlo methods and biased dynamics that were originally developed in statistical mechanics and have been used to identify native-like conformations in protein folding molecular dynamics simulations. This approach allowed us to sample the weight space manifold at different loss levels and investigate low-energy paths connecting distinct weights.

Close to the interpolation threshold, the numerical exploration of the weight space by HMC starting from GD solutions identified two main regimes as a function of temperature. At low temperatures, the system is in a frozen state with zero training error and is unable to move sufficiently far away from initialization (the inter-state overlap distribution is peaked near 1), even though the overlap between HMC trajectories starting from different GD solutions (intra-state overlap) is strictly less than 1. The linear interpolation between GD solutions also shows a loss barrier. Despite this, GD solutions can be connected by a low-energy path; those paths are tortuous and difficult to find with an unbiased Monte Carlo algorithm. We identified them by using a double–ratchet hybrid Monte Carlo algorithm, which penalizes moves that cause the two gradient descent solutions to drift apart. These results suggest that the manifold of low-energy weights has a spiky topology, with gradient descent solutions located along its protruding rays. We have also shown that the center of the manifold of solutions has a complex pattern of valleys and barriers.

At intermediate temperatures, the training error is small but not zero. Unlike in the low-temperature regime, the HMC dynamics is ergodic, since there is no difference between inter- and intra-state overlaps. This means that the shape of the populated state is particularly symmetric and corresponds, in the language of replica calculations, to a replica–symmetric solution. The energy always displays a maximum at the center of the straight lines between states populated at intermediate temperatures.

The symmetric, hollow structure of states at intermediate temperature is similar to that displayed by the system in the overparametrized regime, suggesting that the two manifolds are quite similar. The main difference between the overparametrized regime and that at the interpolation threshold is at low temperature: while in the latter case the manifold has the spiky shape discussed above, in the former the low-temperature states are more spherical and located at the center of the manifold.

In the overparametrized regime we have also resorted to replica computations to study both the loss and the training error landscape on the linear interpolation between two weights sampled with different Boltzmann probability distributions. Our work shows that the training error manifold is star-shaped: there exists a subset of robust solutions with a large margin $\kappa$ that are linear mode connected to solutions with lower (even zero) margin, similarly to what was found in Annesi et al. (2023). Typical low-temperature weights extracted from the Boltzmann measure equipped with the cross-entropy loss function tend to concentrate in the inner core of the star-shaped manifold, which we called the geodesically convex component, as any two solutions in this region are linear mode connected. This result also agrees with numerical simulations. Our theory also reproduces the non-monotonic behavior of the energy along the linear interpolation between Boltzmann samples at larger temperatures.

The use of a realistic training dataset, displaying correlated data, instead of random data, does not change substantially the properties of the space of weights at low temperature. The center of the spiky structure becomes more bulky and the barriers more heterogeneous, but the overall geometry does not change.

References

Appendix A Equilibrium measure

We report here the outcome of the equilibrium calculation, through the evaluation of the free entropy

$$\phi=\lim_{N\to\infty}\frac{1}{N}\mathbb{E}_{\mathcal{D}}\ln Z(\beta;\mathcal{D}) \tag{29}$$

where $Z(\beta;\mathcal{D})$ is the partition function defined in equation (8) with the Gaussian prior (10); $\mathbb{E}_{\mathcal{D}}$ refers to the average over the random dataset $\mathcal{D}$. This average can be performed by using the replica trick Mézard et al. (1987) and the saddle point method in the large $N$ limit. Here we report the final result, which is a slight variation of the one reported in Baldassi et al. (2019)

$$\phi=\lim_{N\to\infty}\lim_{n\to 0}\frac{1}{nN}\ln\mathbb{E}_{\mathcal{D}}Z^{n}=\mathrm{extr}_{q,Q}\left[\mathcal{G}_{S}(q,Q)+\alpha\mathcal{G}_{E}(q,Q)\right] \tag{30}$$

where we have defined the entropic and the energetic terms respectively as

$$\begin{split}\mathcal{G}_{S}(q,Q)&=\frac{Q}{2(Q-q)}-\frac{\beta\lambda}{2}Q+\frac{1}{2}\ln(2\pi(Q-q))\\ \mathcal{G}_{E}(q,Q)&=\int\prod_{l=1}^{K}Dx_{l}\ln\int\prod_{l=1}^{K}D\lambda_{l}\,e^{-\beta\ell\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}g\left(\sqrt{q}x_{l}+\sqrt{Q-q}\lambda_{l}\right)\right)}\end{split} \tag{31}$$

We recall that $\lambda$ is the Lagrange multiplier (or regularization, in machine learning jargon) that fixes the squared norm $Q$ of the weights $\boldsymbol{w}$. $q$ and $Q$ are obtained from the extremization of the right-hand side of (30): $q$ represents the typical overlap of two weights $\boldsymbol{w}^{a}$, $\boldsymbol{w}^{b}$ sampled from the Boltzmann distribution (7) (see also equation (28)), and $Q$ represents the typical squared norm of a sample of (7). In the large $K$ limit (but with $K/N\to 0$), the energetic term can be simplified by using the central limit theorem, as shown in Baldassi et al. (2019); Annesi et al. (2024)

$$\mathcal{G}_{E}(q,Q)=\int Dz_{0}\ln\int Dz_{1}\,e^{-\beta\ell\left(\sqrt{\Delta_{Q}(q)-\Delta_{Q}(0)}\,z_{0}+\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q)}\,z_{1}-\kappa\right)}\,, \tag{32}$$

where

$$\Delta_{Q}(q)\equiv\int Dx\left[\int Dy\,\varphi\left(\sqrt{q}x+\sqrt{Q-q}\,y\right)\right]^{2} \tag{33}$$

is an effective order parameter Barkai et al. (1990); Engel et al. (1992) whose expression depends on the choice of the activation function $\varphi$. This kernel also has the same expression as the Neural Network Gaussian Process (NNGP) kernel that appears in neural networks learning a finite number of examples in the large width limit Neal (1996).
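For a given activation function, the effective order parameter (33) can be evaluated numerically. The sketch below is one possible approach, assuming nested Gauss–Hermite quadrature for the two Gaussian measures $Dx$ and $Dy$; the function names are hypothetical. As a sanity check it uses cases where the integral is known in closed form: $\varphi(x)=x$ gives $\Delta_{Q}(q)=q$, and $\varphi(x)=\mathrm{erf}(x/\sqrt{2})$ gives $\frac{2}{\pi}\arcsin\frac{q}{1+Q}$ (this normalization of the error function is our assumption and may differ from the one used elsewhere in the paper).

```python
import numpy as np
from math import asin, erf, pi, sqrt

def gauss_rule(n=80):
    """Nodes and weights such that sum(w * f(x)) approximates the
    standard Gaussian average of f."""
    x, w = np.polynomial.hermite_e.hermegauss(n)
    return x, w / np.sqrt(2.0 * np.pi)

def delta_Q(q, Q, phi, n=80):
    """Nested quadrature estimate of
    int Dx [ int Dy phi(sqrt(q) x + sqrt(Q - q) y) ]^2 for 0 <= q <= Q."""
    if not 0.0 <= q <= Q:
        raise ValueError("need 0 <= q <= Q")
    x, wx = gauss_rule(n)
    y, wy = gauss_rule(n)
    arg = np.sqrt(q) * x[:, None] + np.sqrt(Q - q) * y[None, :]
    inner = phi(arg) @ wy          # inner average over y, one value per x node
    return float(np.sum(wx * inner**2))

# Error-function activation with the (assumed) normalization erf(x / sqrt(2)).
erf_activation = np.vectorize(lambda t: erf(t / sqrt(2.0)))

if __name__ == "__main__":
    q, Q = 0.3, 1.0
    print("linear:", delta_Q(q, Q, lambda t: t), "exact:", q)
    print("erf:   ", delta_Q(q, Q, erf_activation),
          "exact:", 2.0 / pi * asin(q / (1.0 + Q)))
```

Non-smooth activations such as the sign function converge more slowly under this quadrature, so a closed-form expression is preferable there when one is available.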

A.1 Large $\beta$ limit

In the large $\beta$ limit we have the following scaling

$$q=Q-\frac{\delta q}{\beta} \tag{34}$$

This induces the following scaling on the effective order parameter difference

$$\Delta_{Q}(Q)-\Delta_{Q}(q)\simeq\Delta^{\prime}_{Q}(Q)\frac{\delta q}{\beta} \tag{35}$$

where

$$\Delta_{Q}^{\prime}(q)\equiv\frac{\partial\Delta_{Q}(q)}{\partial q}=\int Dx\left[\int Dy\,\varphi^{\prime}\left(\sqrt{q}x+\sqrt{Q-q}\,y\right)\right]^{2}\,. \tag{36}$$
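As an illustrative check of (33), (35) and (36), consider an error-function activation. We assume here the normalization $\varphi(x)=\mathrm{erf}(x/\sqrt{2})$, which is our choice for the example and may differ from the convention used for the Erf results in the appendix. With this choice all Gaussian integrals are elementary, and the derivative computed from the closed form of $\Delta_{Q}(q)$ agrees with the one obtained directly from (36) using $\varphi^{\prime}(x)=\sqrt{2/\pi}\,e^{-x^{2}/2}$:

```latex
\begin{aligned}
\Delta_{Q}(q) &= \frac{2}{\pi}\arcsin\!\left(\frac{q}{1+Q}\right), \\
\Delta_{Q}^{\prime}(q) &= \frac{2}{\pi}\int Dx
  \left[\frac{e^{-\frac{q x^{2}}{2(1+Q-q)}}}{\sqrt{1+Q-q}}\right]^{2}
  = \frac{2}{\pi}\,\frac{1}{\sqrt{(1+Q-q)(1+Q+q)}}
  = \frac{2}{\pi}\,\frac{1}{\sqrt{(1+Q)^{2}-q^{2}}}, \\
\Delta_{Q}(Q)-\Delta_{Q}(q) &\simeq \frac{2}{\pi}\,\frac{1}{\sqrt{1+2Q}}\,\frac{\delta q}{\beta}
  \qquad \text{for } q = Q-\frac{\delta q}{\beta},\ \beta\to\infty .
\end{aligned}
```

The second line is also what one gets by differentiating the first line with respect to $q$ at fixed $Q$, and the third line is the scaling (35) evaluated at $q=Q$.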

We therefore have that the free energy of the system is

$$\begin{aligned}-f&\equiv\lim_{\beta\to\infty}\phi=\mathrm{extr}_{\delta q,Q}\left[\mathcal{G}_{S}(\delta q,Q)+\alpha\mathcal{G}_{E}(\delta q,Q)\right] &&\text{(37a)}\\ \mathcal{G}_{S}(\delta q,Q)&=\frac{Q}{2\delta q}-\frac{\lambda}{2}Q &&\text{(37b)}\\ \mathcal{G}_{E}(\delta q,Q)&=\int Dz_{0}\,z_{\star}(z_{0})\,, &&\text{(37c)}\end{aligned}$$

where we have defined the function $z_{\star}(z_{0})$ as

$$z_{\star}(z_{0})=\operatorname*{argmax}_{z_{1}}\left[-\frac{z_{1}^{2}}{2}-\ell_{1}\left(\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(0)}\,z_{0}+\sqrt{\Delta_{Q}^{\prime}(Q)\delta q_{1}}\,z_{1}\right)\right] \tag{38}$$

Appendix B Loss Landscape on the linear interpolation between weights

In this section we want to compute the average loss $\widetilde{\mathcal{L}}$ of $\widetilde{\boldsymbol{w}}(\gamma)$ as defined in (27). $\widetilde{\boldsymbol{w}}(\gamma)$ is the interpolation of $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$, see equation (24). As stated in the main text, both weights $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$ and their interpolation $\widetilde{\boldsymbol{w}}(\gamma)$ are considered to have the same squared norm $Q$. Equation (27) can also be written in terms of the corresponding loss per pattern $\widetilde{\ell}$

$$E(\gamma)=\mathbb{E}_{\mathcal{D}}\frac{\int d\boldsymbol{w}^{1}d\boldsymbol{w}^{2}\,p(\boldsymbol{w}^{1})p(\boldsymbol{w}^{2})\,e^{-\beta\mathcal{L}_{1}(\boldsymbol{w}^{1};\mathcal{D})-\beta\mathcal{L}_{2}(\boldsymbol{w}^{2};\mathcal{D})}\,\tilde{\ell}\left(\Delta^{\mu}(\widetilde{\boldsymbol{w}}(\gamma))\right)}{Z_{\mathcal{D}}^{1}Z_{\mathcal{D}}^{2}} \tag{39}$$

where we have focused on a particular pattern $\mu$, since in the thermodynamic limit all input patterns give the same contribution on average.

B.1 Replica approach

The computation proceeds as usual by introducing replicas via the identity $(Z_{\mathcal{D}}^{r})^{-1}=\lim\limits_{n\to 0}(Z_{\mathcal{D}}^{r})^{n-1}$ with $r=1,2$. We have (denoting with $r,s=1,2$ the index that runs over the real replicas $\boldsymbol{w}^{1}$ and $\boldsymbol{w}^{2}$)

$$\begin{split}E(\gamma)&=\int\prod_{arl\mu}\frac{d\lambda_{lr}^{\mu a}d\hat{\lambda}_{lr}^{\mu a}}{2\pi}e^{i\lambda_{lr}^{\mu a}\hat{\lambda}_{lr}^{\mu a}}\int\prod_{ar}d\boldsymbol{w}^{ra}\,\prod_{ra\mu}e^{-\beta\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\lambda^{\mu a}_{lr}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\frac{\gamma\lambda_{l1}^{\mu 1}+(1-\gamma)\lambda_{l2}^{\mu 1}}{c_{\gamma}}\right)\right]\\ &\times e^{-\frac{\lambda_{1}}{2}\sum_{lia}(w_{li}^{ar=1})^{2}-\frac{\lambda_{2}}{2}\sum_{lia}(w_{li}^{ar=2})^{2}}\prod_{li\mu}\mathbb{E}_{\xi_{li}^{\mu}}e^{-i\frac{\xi_{li}^{\mu}}{\sqrt{N}}\sum_{ar}w_{li}^{ar}\hat{\lambda}_{lr}^{\mu a}}\\ &=\int\prod_{ar,bs,l}\frac{dq_{rsl}^{ab}d\hat{q}_{rsl}^{ab}}{2\pi}e^{-N\sum_{ar,bsl}q_{rsl}^{ab}\hat{q}_{rsl}^{ab}-\frac{\lambda_{1}}{2}\sum_{la}q^{aa}_{11l}-\frac{\lambda_{2}}{2}\sum_{rla}q^{aa}_{22l}+NG_{S}+(N\alpha-1)G_{E}}\\ &\times\int\prod_{arl}\frac{d\lambda_{lr}^{a}d\hat{\lambda}_{lr}^{a}}{2\pi}\,e^{i\lambda_{lr}^{a}\hat{\lambda}_{lr}^{a}}\prod_{ra}e^{-\beta\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\lambda^{\mu a}_{lr}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\frac{\gamma\lambda_{l1}^{\mu 1}+(1-\gamma)\lambda_{l2}^{\mu 1}}{c_{\gamma}}\right)\right]e^{-\frac{1}{2}\sum_{ar,bs}q^{ab}_{rs}\hat{\lambda}_{r}^{a}\hat{\lambda}_{s}^{b}}\end{split}$$
italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 11 italic_l end_POSTSUBSCRIPT - divide start_ARG italic_λ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_r italic_l italic_a end_POSTSUBSCRIPT italic_q start_POSTSUPERSCRIPT italic_a italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 22 italic_l end_POSTSUBSCRIPT + italic_N italic_G start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT + ( italic_N italic_α - 1 ) italic_G start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL × ∫ ∏ start_POSTSUBSCRIPT italic_a italic_r italic_l end_POSTSUBSCRIPT divide start_ARG italic_d italic_λ start_POSTSUBSCRIPT italic_l italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT italic_d over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT italic_l italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT end_ARG start_ARG 2 italic_π end_ARG italic_e start_POSTSUPERSCRIPT italic_i italic_λ start_POSTSUBSCRIPT italic_l italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT italic_l italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT ∏ start_POSTSUBSCRIPT italic_r italic_a end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT - italic_β roman_ℓ start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT ( divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_K end_ARG end_ARG ∑ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_c start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_φ ( italic_λ start_POSTSUPERSCRIPT italic_μ italic_a end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_l italic_r end_POSTSUBSCRIPT ) ) end_POSTSUPERSCRIPT over~ start_ARG roman_ℓ end_ARG [ divide start_ARG 1 end_ARG start_ARG square-root start_ARG italic_K end_ARG end_ARG ∑ start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_c 
start_POSTSUBSCRIPT italic_l end_POSTSUBSCRIPT italic_φ ( divide start_ARG italic_γ italic_λ start_POSTSUBSCRIPT italic_l 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_μ 1 end_POSTSUPERSCRIPT + ( 1 - italic_γ ) italic_λ start_POSTSUBSCRIPT italic_l 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_μ 1 end_POSTSUPERSCRIPT end_ARG start_ARG italic_c start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_ARG ) ] italic_e start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_a italic_r , italic_b italic_s end_POSTSUBSCRIPT italic_q start_POSTSUPERSCRIPT italic_a italic_b end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_r italic_s end_POSTSUBSCRIPT over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_a end_POSTSUPERSCRIPT over^ start_ARG italic_λ end_ARG start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_b end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT end_CELL end_ROW (40)

where we recall that $c_{\gamma}$ is the quantity that appears in equation (24). We have also introduced the usual entropic and energetic terms Baldassi et al. (2019) as

\begin{equation}
\begin{split}
G_{S} &\equiv \ln\int\prod_{ar}dw^{ra}_{l}\,e^{\sum_{ar,bsl}\hat{q}^{ab}_{rsl}w_{l}^{ra}w_{l}^{sb}}\\
G_{E} &\equiv \ln\int\prod_{arl}\frac{d\lambda_{rl}^{a}\,d\hat{\lambda}_{rl}^{a}}{2\pi}\,e^{-\beta\sum_{ra}\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\lambda^{a}_{rl}\right)\right)}\,e^{i\sum_{rla}\lambda_{rl}^{a}\hat{\lambda}_{rl}^{a}-\frac{1}{2}\sum_{ar,bsl}q^{ab}_{rsl}\hat{\lambda}_{rl}^{a}\hat{\lambda}_{sl}^{b}}\,.
\end{split}
\tag{41}
\end{equation}

B.2 Replica Symmetric ansatz

We impose the Replica Symmetric (RS) ansatz on the order parameters:

\begin{align}
q_{rsl}^{aa} &\equiv Q\delta_{rs}+(1-\delta_{rs})p & a&\in[n]\,,\ \forall l \tag{42a}\\
q_{rsl}^{ab} &\equiv t_{rs}=q_{r}\delta_{rs}+(1-\delta_{rs})p & a&\neq b\,,\ \forall l \tag{42b}
\end{align}

Note that $q_{1}$ and $q_{2}$ denote the typical overlaps between solutions extracted from the distributions of the endpoints $\gamma=1$ and $\gamma=0$, respectively, while $p$ is the typical overlap between the two endpoints. A similar ansatz is imposed on the conjugate order parameters $\hat{q}_{rsl}^{ab}$; however, in the $n\to 0$ limit the conjugate order parameters do not appear explicitly in the expression for $E(\gamma)$. We need to decompose the term

\begin{equation}
\begin{split}
-\frac{1}{2}\sum_{abrsl}q_{rsl}^{ab}\hat{\lambda}_{lr}^{a}\hat{\lambda}_{ls}^{b} &= -\frac{1}{2}\sum_{r}(Q-q_{r})\sum_{al}\left(\hat{\lambda}_{rl}^{a}\right)^{2}-\frac{1}{2}\sum_{rs}t_{rs}\sum_{abl}\hat{\lambda}_{rl}^{a}\hat{\lambda}_{sl}^{b}\\
&= -\frac{1}{2}\sum_{r}(Q-q_{r})\sum_{al}\left(\hat{\lambda}_{rl}^{a}\right)^{2}-\frac{1}{2}\sum_{rl}\left(\sum_{as}\mathcal{T}_{rs}\hat{\lambda}^{a}_{sl}\right)^{2}
\end{split}
\tag{43}
\end{equation}

where $\mathcal{T}_{rs}$ is the $(r,s)$ element of the square root of the matrix $t_{rs}$ defined in (42b). The computation proceeds in a standard way, by using a Hubbard-Stratonovich transformation and integrating over $\hat{\lambda}_{rl}^{a}$. We get

\begin{equation}
\begin{split}
E(\gamma) &= \int\prod_{arl}\frac{d\lambda_{rl}^{a}\,d\hat{\lambda}_{rl}^{a}}{2\pi}\,e^{-\beta\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\lambda^{a}_{lr}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\frac{\gamma\lambda_{l1}^{1}+(1-\gamma)\lambda_{l2}^{1}}{c_{\gamma}}\right)\right]\\
&\times e^{i\sum_{arl}\lambda_{rl}^{a}\hat{\lambda}_{rl}^{a}-\frac{1}{2}\sum_{r}(Q-q_{r})\sum_{al}\left(\hat{\lambda}_{rl}^{a}\right)^{2}-\frac{1}{2}\sum_{rl}\left(\sum_{as}\mathcal{T}_{rs}\hat{\lambda}^{a}_{sl}\right)^{2}}\\
&= \int\!\prod_{rl}Dx_{rl}\,\frac{\int\prod_{rl}\frac{d\lambda_{rl}\,d\hat{\lambda}_{rl}}{2\pi}\,e^{i\hat{\lambda}_{rl}\left(\lambda_{rl}+\sum_{s}\mathcal{T}_{rs}x_{sl}\right)-\frac{1}{2}(Q-q_{r})\hat{\lambda}_{rl}^{2}-\beta\sum_{r}\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\lambda_{lr}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\frac{\gamma\lambda_{l1}+(1-\gamma)\lambda_{l2}}{c_{\gamma}}\right)\right]}{\prod_{r}\int\prod_{l}\frac{d\lambda_{l}\,d\hat{\lambda}_{l}}{2\pi}\,e^{-\beta\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\lambda_{l}\right)\right)}\,e^{i\sum_{l}\hat{\lambda}_{l}\left(\lambda_{l}+\sum_{s}\mathcal{T}_{sr}\,x_{rl}\right)-\frac{1}{2}\sum_{r}(Q-q_{r})\sum_{l}\hat{\lambda}_{l}^{2}}}\\
&= \int\!\prod_{rl}Dx_{rl}\,\frac{\int\prod_{rl}D\lambda_{rl}\,e^{-\beta\sum_{r}\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\sqrt{Q-q_{r}}\lambda_{lr}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)\right]}{\prod_{r}\int\prod_{l}D\lambda_{l}\,e^{-\beta\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\sqrt{Q-q_{r}}\lambda_{l}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)\right)}}
\end{split}
\tag{44}
\end{equation}

where in the last expression we have defined $\gamma_{1}\equiv\gamma$ and $\gamma_{2}=1-\gamma$ for convenience.
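The two algebraic ingredients used between (43) and (44), namely the splitting of the quadratic form through the square root $\mathcal{T}$ of the matrix $t_{rs}$ and the Hubbard-Stratonovich identity $e^{-b^{2}/2}=\int Dx\,e^{ibx}$, can be checked numerically. The following Python sketch is not part of the derivation: the function names and the parameter values ($n=4$, $Q=1$, $q_{1}=0.6$, $q_{2}=0.5$, $p=0.3$) are arbitrary choices made here for illustration, the hidden-unit index $l$ is dropped, and a finite number of replicas $n$ is kept only to test the algebra.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def rs_overlap(n, Q, q, p):
    """Return q[a, r, b, s] built from the RS ansatz (42a)-(42b), and t_rs."""
    q = np.asarray(q, dtype=float)
    off = p * (1.0 - np.eye(2))
    t = np.diag(q) + off              # a != b block, eq. (42b)
    same = Q * np.eye(2) + off        # a == b block, eq. (42a)
    full = np.empty((n, 2, n, 2))
    for a in range(n):
        for b in range(n):
            full[a, :, b, :] = same if a == b else t
    return full, t


def sqrt_psd(t):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(t)
    return (vecs * np.sqrt(vals)) @ vecs.T


def check_decomposition(n=4, Q=1.0, q=(0.6, 0.5), p=0.3, seed=0):
    """Evaluate both sides of (43) for random hat-lambda (index l dropped)."""
    rng = np.random.default_rng(seed)
    lam_hat = rng.normal(size=(n, 2))                 # hat-lambda[a, r]
    full, t = rs_overlap(n, Q, q, p)
    T = sqrt_psd(t)
    lhs = -0.5 * np.einsum("arbs,ar,bs->", full, lam_hat, lam_hat)
    diagonal = -0.5 * np.sum((Q - np.asarray(q)) * lam_hat**2)
    collective = T @ lam_hat.sum(axis=0)              # sum_{a,s} T_rs hat-lambda^a_s
    rhs = diagonal - 0.5 * np.sum(collective**2)
    return lhs, rhs


def hubbard_stratonovich(b, deg=60):
    """Gauss-Hermite estimate of int Dx exp(i b x), expected to equal exp(-b^2/2)."""
    x, w = hermegauss(deg)                            # weight exp(-x^2 / 2)
    return np.sum(w * np.exp(1j * b * x)) / np.sqrt(2.0 * np.pi)


lhs, rhs = check_decomposition()
print(lhs, rhs)                                       # the two sides of (43)
print(hubbard_stratonovich(1.3), np.exp(-0.5 * 1.3**2))
```

The square root requires $t_{rs}$ to be positive semi-definite, which for the $2\times 2$ case amounts to $q_{1}q_{2}\geq p^{2}$; the illustrative values above satisfy this condition.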

B.3 Large $K$ limit

We now take the large $K$ limit in order to further simplify the expression in (44) by repeated use of the central limit theorem. We denote by $I$ the numerator of the last expression in (44).
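The central limit argument can be illustrated with a small Monte Carlo experiment. The sketch below is a simplified toy version, not the computation of this appendix: it assumes $c_{l}=\pm 1$ in equal numbers, $\varphi=\mathrm{ReLU}$, and i.i.d. standard Gaussian arguments $\lambda_{l}$ (without the shifts and rescalings appearing in (44)); all names and parameter values are hypothetical. Under these assumptions the sum $\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi(\lambda_{l})$ should be close to a zero-mean Gaussian with variance $\frac{1}{2}-\frac{1}{2\pi}$ when $K$ is large.

```python
import numpy as np


def preactivation_samples(K, n_samples=20000, seed=0):
    """Sample (1/sqrt(K)) * sum_l c_l * phi(lambda_l).

    Illustrative assumptions only: c_l = +/-1 in equal numbers (K even),
    phi = ReLU, lambda_l i.i.d. standard Gaussian variables.
    """
    rng = np.random.default_rng(seed)
    c = np.where(np.arange(K) % 2 == 0, 1.0, -1.0)    # balanced signs
    lam = rng.standard_normal((n_samples, K))
    return (np.maximum(lam, 0.0) @ c) / np.sqrt(K)


def excess_kurtosis(x):
    """Fourth standardized moment minus 3; vanishes for a Gaussian."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0


samples = preactivation_samples(K=200)
relu_variance = 0.5 - 1.0 / (2.0 * np.pi)             # Var[ReLU(z)], z ~ N(0, 1)
print(samples.mean(), samples.var(), relu_variance, excess_kurtosis(samples))
```

The empirical mean and excess kurtosis should be close to zero and the empirical variance close to `relu_variance`, up to sampling noise.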

The numerator of the fraction can be written as

\begin{equation}
\begin{split}
I &\equiv \int\prod_{rl}D\lambda_{rl}\,e^{-\beta\sum_{r}\ell_{r}\left(\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi\left(\sqrt{Q-q_{r}}\lambda_{lr}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)\right)}\,\tilde{\ell}\left[\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)\right]\\
&= \int\prod_{r}\frac{dh_{r}\,d\hat{h}_{r}}{2\pi}\,e^{i\sum_{r}h_{r}\hat{h}_{r}-\beta\sum_{r}\ell_{r}\left(h_{r}\right)}\int\frac{dv\,d\hat{v}}{2\pi}\,e^{i\hat{v}v}\,\tilde{\ell}\left[v\right]\\
&\times\int\prod_{rl}D\lambda_{rl}\,e^{-\frac{i}{\sqrt{K}}\sum_{r}\hat{h}_{r}\sum_{l}c_{l}\varphi\left(\sqrt{Q-q_{r}}\lambda_{lr}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)-\frac{i\hat{v}}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)}
\end{split}
\end{equation}
end_POSTSUBSCRIPT caligraphic_T start_POSTSUBSCRIPT italic_r italic_s end_POSTSUBSCRIPT italic_x start_POSTSUBSCRIPT italic_s italic_l end_POSTSUBSCRIPT end_ARG start_ARG italic_c start_POSTSUBSCRIPT italic_γ end_POSTSUBSCRIPT end_ARG ) ) end_POSTSUPERSCRIPT end_CELL end_ROW (45)

Expanding the exponential up to second order, averaging over $\lambda_{lr}$, and re-exponentiating, we get

\begin{equation}
\begin{split}
&\int\prod_{rl}D\lambda_{rl}\,e^{-\frac{i}{\sqrt{K}}\sum_{r}\hat{h}_{r}\sum_{l}c_{l}\varphi\left(\sqrt{Q-q_{r}}\lambda_{lr}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)-\frac{i\hat{v}}{\sqrt{K}}\sum_{l}c_{l}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)}\\
&\simeq e^{-i\sum_{r}\hat{h}_{r}M_{r}^{(0)}-i\hat{v}N^{(0)}-\frac{1}{2}\sum_{r}\Delta_{r}^{(0)}\hat{h}_{r}^{2}-\frac{\Xi^{(0)}}{2}\hat{v}^{2}-\hat{v}\sum_{r}\hat{h}_{r}\Omega_{r}^{(0)}}
\end{split}
\tag{46}
\end{equation}

where we have introduced the notation

\begin{align}
\varphi_{rl}(\lambda_{r},x)&\equiv\varphi\left(\sqrt{Q-q_{r}}\lambda_{lr}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)\tag{47a}\\
\tilde{\varphi}_{l}(\gamma;\{\lambda_{r}\}_{r=1}^{y},x)&\equiv\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)\tag{47b}
\end{align}

to define the following quantities

\begin{align}
M^{(0)}_{r}&\equiv\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\left\langle\varphi\left(\sqrt{Q-q_{r}}\lambda_{r}-\sum_{s}\mathcal{T}_{rs}x_{sl}\right)\right\rangle_{\lambda_{r}}\equiv\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\left\langle\varphi_{rl}\right\rangle_{\lambda}\tag{48a}\\
N^{(0)}&\equiv\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\left\langle\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\lambda_{rl}-\sum_{s}\mathcal{T}_{rs}x_{sl}}{c_{\gamma}}\right)\right)\right\rangle_{\{\lambda_{r}\}_{r=1}^{y}}\equiv\frac{1}{\sqrt{K}}\sum_{l}c_{l}\,\left\langle\tilde{\varphi}_{l}\right\rangle_{\lambda}\tag{48b}\\
\Delta^{(0)}_{r}&\equiv\frac{1}{K}\sum_{l}c_{l}^{2}\,\left[\langle\varphi^{2}_{rl}\rangle_{\lambda}-\langle\varphi_{rl}\rangle^{2}_{\lambda}\right]\tag{48c}\\
\Xi^{(0)}&\equiv\frac{1}{K}\sum_{l}c_{l}^{2}\left[\langle\tilde{\varphi}^{2}_{l}\rangle_{\lambda}-\langle\tilde{\varphi}_{l}\rangle^{2}_{\lambda}\right]\tag{48d}\\
\Omega_{r}^{(0)}&\equiv\frac{1}{K}\sum_{l}c_{l}^{2}\left[\langle\tilde{\varphi}_{l}\varphi_{rl}\rangle_{\lambda}-\langle\tilde{\varphi}_{l}\rangle_{\lambda}\langle\varphi_{rl}\rangle_{\lambda_{r}}\right]\tag{48e}
\end{align}
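As a numerical illustration of definitions (48a) and (48c), the following minimal sketch compares the predicted mean and variance of the pre-activation $\frac{1}{\sqrt{K}}\sum_{l}c_{l}\varphi_{rl}$ for a single replica $r$ with Monte Carlo estimates. Several ingredients are assumptions made only for this example and are not taken from the text: the activation is a ReLU, the shifts `b` stand in for $\sum_{s}\mathcal{T}_{rs}x_{sl}$ and are drawn at random, $c_{l}=\pm 1$, $K=20$ and $Q-q_{r}=0.7$. Since the $\lambda_{lr}$ are independent across $l$, the first two cumulants are reproduced for any $K$; the second-order truncation only drops higher cumulants, which are suppressed at large $K$.

```python
import math
import random


def relu_moments(mu, sigma):
    """First and second moments of max(z, 0) for z ~ N(mu, sigma^2)."""
    t = mu / sigma
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    m1 = mu * cdf + sigma * pdf
    m2 = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    return m1, m2


def predicted_mean_var(c, b, a):
    """M_r^(0) and Delta_r^(0) as in (48a), (48c), assuming phi = ReLU."""
    K = len(c)
    moments = [relu_moments(-b_l, a) for b_l in b]
    mean = sum(c_l * m1 for c_l, (m1, _) in zip(c, moments)) / math.sqrt(K)
    var = sum(c_l ** 2 * (m2 - m1 ** 2) for c_l, (m1, m2) in zip(c, moments)) / K
    return mean, var


def sampled_mean_var(c, b, a, n_samples, rng):
    """Monte Carlo mean and variance of (1/sqrt(K)) sum_l c_l relu(a*lambda_l - b_l)."""
    K = len(c)
    hs = []
    for _ in range(n_samples):
        h = sum(c_l * max(a * rng.gauss(0.0, 1.0) - b_l, 0.0) for c_l, b_l in zip(c, b))
        hs.append(h / math.sqrt(K))
    mean = sum(hs) / n_samples
    var = sum((h - mean) ** 2 for h in hs) / (n_samples - 1)
    return mean, var


rng = random.Random(0)                                 # fixed seed for reproducibility
K = 20                                                 # hypothetical hidden-layer width
c = [1.0 if l % 2 == 0 else -1.0 for l in range(K)]    # assumed second-layer weights
b = [rng.gauss(0.0, 1.0) for _ in range(K)]            # stand-in for sum_s T_rs x_sl
a = math.sqrt(0.7)                                     # stand-in for sqrt(Q - q_r)

predicted = predicted_mean_var(c, b, a)
sampled = sampled_mean_var(c, b, a, 20000, rng)
```

Under these assumptions the two pairs should agree up to sampling noise of order $n^{-1/2}$, with $n$ the number of samples.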

Notice that $\Delta^{(0)}_{rs}=\frac{1}{K}\sum_{l}c_{l}^{2}\,\left[\langle\varphi_{rl}\varphi_{sl}\rangle_{\lambda}-\langle\varphi_{rl}\rangle_{\lambda}\langle\varphi_{sl}\rangle_{\lambda}\right]=0$ if $r\neq s$, since the Gaussian variable $\lambda_{lr}$ is independent of $\lambda_{ls}$. We then insert (46) back into (45) and perform the integrals over $\hat{h}_{r}$ and $\hat{v}$; in general the integral is of the form

\begin{equation}
\begin{split}
&\int\prod_{r}\frac{dh_{r}d\hat{h}_{r}}{2\pi}\frac{dvd\hat{v}}{2\pi}e^{i\hat{h}_{r}(h_{r}-M_{r})+i\hat{v}(v-N)-\frac{1}{2}\sum_{rs}\Delta_{rs}\hat{h}_{r}\hat{h}_{s}-\frac{\Xi}{2}\hat{v}^{2}-\hat{v}\sum_{r}\Omega_{r}\hat{h}_{r}}f(\{h_{r}\},v)\\
&\simeq\int\prod_{r}\frac{dh_{r}d\hat{h}_{r}}{2\pi}Dv\,e^{i\sum_{r}\hat{h}_{r}(h_{r}-M_{r}-\frac{v}{\sqrt{\Xi}}\Omega_{r})-\frac{1}{2}\sum_{rs}\left(\Delta_{rs}-\frac{\Omega_{r}\Omega_{s}}{\Xi}\right)\hat{h}_{r}\hat{h}_{s}}f(\{h_{r}\},N+\sqrt{\Xi}v)\\
&=\int Dv\prod_{r}Dh_{r}\,f\left(\left\{M_{r}+\frac{\Omega_{r}}{\sqrt{\Xi}}\,v+\sum_{s}\left(\sqrt{A}\right)_{rs}h_{s}\right\},N+\sqrt{\Xi}v\right)
\end{split}
\tag{49}
\end{equation}

with $A_{rs}\equiv\Delta_{rs}-\frac{\Omega_{r}\Omega_{s}}{\Xi}$. Specializing this identity to our case, i.e. using $f(\left\{h_{r}\right\},v)=e^{-\beta\sum_{r}\ell_{r}(h_{r})}\,\widetilde{\ell}(v)$, we get

\begin{equation}
\begin{split}
E(\gamma)&=\int\prod_{l}Dx_{rl}\frac{\int\prod_{r}D\lambda_{r}Ds\,\tilde{\ell}\left(N^{(0)}+\sqrt{\Xi^{(0)}}s\right)\,e^{-\beta\sum_{r}\ell\left(M_{r}^{(0)}+\frac{\Omega_{r}^{(0)}}{\sqrt{\Xi^{(0)}}}s+\sum_{s}\left[\sqrt{\Lambda^{(0)}}\right]_{rs}\lambda_{s}\right)}}{\prod_{r}\int D\lambda\,e^{-\beta\ell_{r}\left(M_{r}^{(0)}+\sqrt{\Delta^{(0)}_{r}}\lambda\right)}}
\end{split}
\tag{50}
\end{equation}

with $\Lambda_{rs}^{(0)}\equiv\Delta^{(0)}_{r}\delta_{rs}-\frac{\Omega_{r}^{(0)}\Omega_{s}^{(0)}}{\Xi^{(0)}}$. Note that we have carried out the same steps in the denominator of the fraction.
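The change of variables behind identity (49) can be checked with linear algebra alone: writing $h_{r}=M_{r}+\frac{\Omega_{r}}{\sqrt{\Xi}}v+\sum_{s}L_{rs}h_{s}$ and $v\to N+\sqrt{\Xi}v$ in terms of independent standard Gaussians must reproduce the covariances $\Delta_{rs}$, $\Omega_{r}$ and $\Xi$. The sketch below verifies this for two replicas. It is only an illustration under stated assumptions: the numerical values of `delta`, `omega` and `xi` are made up, and a Cholesky factor $L$ with $LL^{T}=A$ is used in place of the symmetric square root $\sqrt{A}$, which yields the same Gaussian law.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mixing_matrix(delta, omega, xi):
    """Coefficients of the independent Gaussians (v, h_1, ..., h_y).

    Row r < y encodes h_r - M_r = Omega_r / sqrt(Xi) * v + sum_s L_rs h_s,
    the last row encodes v - N = sqrt(Xi) * v.
    """
    y = len(omega)
    a = [[delta[r][s] - omega[r] * omega[s] / xi for s in range(y)] for r in range(y)]
    low = cholesky(a)  # stands in for sqrt(A) of the text
    rows = [[omega[r] / math.sqrt(xi)] + low[r] for r in range(y)]
    rows.append([math.sqrt(xi)] + [0.0] * y)
    return rows


def covariance(rows):
    """Covariance of the linear combinations described by the rows."""
    n = len(rows)
    return [[sum(p * q for p, q in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]


# Hypothetical values chosen so that A is positive definite.
delta = [[1.0, 0.0], [0.0, 2.0]]   # Delta_rs (diagonal, as noted in the text)
omega = [0.3, -0.5]                # Omega_r
xi = 1.5                           # Xi

cov = covariance(mixing_matrix(delta, omega, xi))
```

The upper-left block of `cov` should equal `delta`, the last column should equal `omega`, and the bottom-right entry should equal `xi`.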

Finally, we apply the central limit theorem again to simplify the integrals over the $x_{l}$ variables. For all the variance terms, i.e. $\Delta_{r}^{(0)}$, $\Xi^{(0)}$ and $\Omega_{r}^{(0)}$, we can simply compute their mean with respect to $x_{r}$, since their variance is subleading in $K$. The only new terms come from the variance of the mean terms, i.e. the parameters $M_{r}^{(0)}$ and $N^{(0)}$. We get a term of the type

\begin{equation}
\begin{split}
&\int\prod_{rl}Dx_{rl}\int\prod_{r}\frac{dh_{r}d\hat{h}_{r}}{2\pi}\frac{dvd\hat{v}}{2\pi}\,e^{ih_{r}\hat{h}_{r}+iv\hat{v}-i\hat{h}_{r}M_{r}^{(0)}-i\hat{v}N^{(0)}}f(\{h_{r}\},v)\\
&\simeq\int\prod_{r}\frac{dh_{r}d\hat{h}_{r}}{2\pi}\frac{dvd\hat{v}}{2\pi}e^{i\hat{h}_{r}(h_{r}-M)+i\hat{v}(v-N)-\frac{1}{2}\sum_{rs}\Psi_{rs}\hat{h}_{r}\hat{h}_{s}-\frac{T}{2}\hat{v}^{2}-\hat{v}\sum_{r}U_{r}\hat{h}_{r}}f(\{h_{r}\},v)\\
&=\int Dx\prod_{r}Dy_{r}\,f\left(\left\{M+\frac{U_{r}}{\sqrt{T}}\,x+\sum_{s}\sqrt{H}_{rs}y_{s}\right\},N+\sqrt{T}x\right)
\end{split}
\tag{51}
\end{equation}

where we have used in the last step the general identity (LABEL:eq::generic_identity). The final result is

\begin{equation}
\begin{split}
E(\gamma)&=\int\prod_{rl}Dx_{rl}\,\frac{\int\prod_{r}D\lambda_{r}\,Ds\,\tilde{\ell}\left(N^{(0)}+\sqrt{\Xi^{(0)}}s\right)\,e^{-\beta\sum_{r}\ell_{r}\left(M_{r}^{(0)}+\frac{\Omega_{r}^{(0)}}{\sqrt{\Xi^{(0)}}}s+\sum_{s}\left[\sqrt{\Lambda^{(0)}}\right]_{rs}\lambda_{s}\right)}}{\prod_{r}\int D\lambda\,e^{-\beta\ell_{r}\left(M_{r}^{(0)}+\sqrt{\Delta^{(0)}_{r}}\lambda\right)}}\\
&=\int Dx\prod_{r}Dy_{r}\,\frac{\int\prod_{r}D\lambda_{r}\,Ds\,\tilde{\ell}\left(\sqrt{T}x+\sqrt{\Xi}s\right)\,e^{-\beta\sum_{r}\ell_{r}\left(\frac{U_{r}}{\sqrt{T}}x+\sum_{s}\sqrt{H}_{rs}y_{s}+\frac{\Omega_{r}}{\sqrt{\Xi}}s+\sum_{s}\left[\sqrt{\Lambda}\right]_{rs}\lambda_{s}\right)}}{\prod_{r}\int D\lambda\,e^{-\beta\ell_{r}\left(\frac{U_{r}}{\sqrt{T}}x+\sum_{s}\sqrt{H}_{rs}y_{s}+\sqrt{\Delta_{r}}\lambda\right)}}
\end{split}
\tag{52}
\end{equation}

where we have defined the quantities

\begin{align}
H_{rs}&\equiv\Psi_{rs}-\frac{U_{r}U_{s}}{T} \tag{53a}\\
\Lambda_{rs}&\equiv\Delta_{r}\delta_{rs}-\frac{\Omega_{r}\Omega_{s}}{\Xi} \tag{53b}
\end{align}

and

\begin{align}
\Psi_{rs}&\equiv\langle\langle\varphi_{r}\rangle_{\lambda}\langle\varphi_{s}\rangle_{\lambda}\rangle_{x}-\langle\varphi_{r}\rangle_{\lambda,x}\langle\varphi_{s}\rangle_{\lambda,x} \tag{54a}\\
\Delta_{r}&\equiv\langle\varphi_{r}^{2}\rangle_{\lambda,x}-\langle\langle\varphi_{r}\rangle_{\lambda}^{2}\rangle_{x} \tag{54b}\\
T&\equiv\langle\langle\tilde{\varphi}\rangle^{2}_{\lambda}\rangle_{x}-\langle\tilde{\varphi}\rangle_{\lambda,x}^{2} \tag{54c}\\
\Xi&\equiv\langle\tilde{\varphi}^{2}\rangle_{\lambda,x}-\langle\langle\tilde{\varphi}\rangle^{2}_{\lambda}\rangle_{x} \tag{54d}\\
U_{r}&\equiv\langle\langle\varphi_{r}\rangle_{\lambda}\langle\tilde{\varphi}\rangle_{\lambda}\rangle_{x}-\langle\varphi_{r}\rangle_{\lambda,x}\langle\tilde{\varphi}\rangle_{\lambda,x} \tag{54e}\\
\Omega_{r}&\equiv\langle\langle\tilde{\varphi}\varphi_{r}\rangle_{\lambda}\rangle_{x}-\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{r}\rangle_{\lambda}\rangle_{x} \tag{54f}
\end{align}

Note that in the last step of equation (52) we have used the fact that $M=N=0$, since both are proportional to $\sum_{l}c_{l}$. We show in appendix C that all the quantities above can be written in terms of the following function, which depends on the activation function $\varphi$,

\begin{equation}
\Delta_{Q}(q)\equiv\int Dx\left[\int Dy\,\varphi\left(\sqrt{q}\,x+\sqrt{Q-q}\,y\right)\right]^{2}
\tag{55}
\end{equation}

as

\begin{align}
\Psi_{rs}&=\Delta_{Q}(t_{rs})-\Delta_{Q}(0) \tag{56a}\\
\Delta_{r}&=\Delta_{Q}(Q)-\Delta_{Q}(t_{rr}) \tag{56b}\\
T&=\Delta_{Q}\left(\frac{\gamma^{2}q_{1}+(1-\gamma)^{2}q_{2}+2\gamma(1-\gamma)p}{c_{\gamma}^{2}}\right)-\Delta_{Q}(0) \tag{56c}\\
\Xi&=\Delta_{Q}(Q)-\Delta_{Q}\left(\frac{\gamma^{2}q_{1}+(1-\gamma)^{2}q_{2}+2\gamma(1-\gamma)p}{c_{\gamma}^{2}}\right) \tag{56d}\\
U_{1}&=\Delta_{Q}\left(\frac{\gamma q_{1}+(1-\gamma)p}{c_{\gamma}}\right)-\Delta_{Q}(0) \tag{56e}\\
U_{2}&=\Delta_{Q}\left(\frac{(1-\gamma)q_{2}+\gamma p}{c_{\gamma}}\right)-\Delta_{Q}(0) \tag{56f}\\
\Omega_{1}&=\Delta_{Q}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)-\Delta_{Q}\left(\frac{\gamma q_{1}+(1-\gamma)p}{c_{\gamma}}\right) \tag{56g}\\
\Omega_{2}&=\Delta_{Q}\left(\frac{(1-\gamma)Q+\gamma p}{c_{\gamma}}\right)-\Delta_{Q}\left(\frac{(1-\gamma)q_{2}+\gamma p}{c_{\gamma}}\right) \tag{56h}
\end{align}
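Since every quantity in (56) is a difference of values of $\Delta_{Q}$, a numerical routine for (55) is all that is needed to evaluate them. The following is a minimal sketch, not the authors' implementation: it approximates the two Gaussian integrals in (55) with Gauss-Hermite quadrature. The function names (`gauss_rule`, `delta_Q`, `delta_Q_erf_closed`) and the number of nodes are illustrative assumptions. For $\varphi=\mathrm{erf}$ the double integral is the standard Gaussian average of a product of two error functions, $\Delta_{Q}(q)=\frac{2}{\pi}\arcsin\frac{2q}{1+2Q}$, which is used here only as a cross-check.

```python
import math
import numpy as np

# Vectorised erf from the standard library (avoids a SciPy dependency).
erf = np.vectorize(math.erf)

def gauss_rule(n=80):
    """Nodes and normalised weights for the measure Dx = exp(-x^2/2) dx / sqrt(2 pi)."""
    x, w = np.polynomial.hermite_e.hermegauss(n)
    return x, w / np.sqrt(2.0 * np.pi)

def delta_Q(q, Q=1.0, phi=erf, n=80):
    """Quadrature estimate of Delta_Q(q) as defined in (55)."""
    x, w = gauss_rule(n)
    # arg[i, j] = sqrt(q) * x_i + sqrt(Q - q) * y_j
    arg = np.sqrt(q) * x[:, None] + np.sqrt(max(Q - q, 0.0)) * x[None, :]
    inner = phi(arg) @ w            # integral over y for each node x_i
    return float(w @ inner**2)      # integral over x of the squared inner average

def delta_Q_erf_closed(q, Q=1.0):
    """Closed form for phi = erf, used here only as a cross-check."""
    return (2.0 / np.pi) * np.arcsin(2.0 * q / (1.0 + 2.0 * Q))
```

With such a routine, each line of (56) becomes a difference of two calls to `delta_Q`. A convenient check on any implementation is that adding (56a) at $r=s$ to (56b), or (56c) to (56d), gives $\Delta_{Q}(Q)-\Delta_{Q}(0)$ in both cases.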

We can simplify (52) by performing two-dimensional rotations of the integration over the $\lambda$ variables

\begin{equation}
\begin{split}
E(\gamma)&=\int Dx\,Dy_{1}\,Dy_{2}\,Ds\,\tilde{\ell}\left(\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\sqrt{\frac{T\det H}{\det\Psi}}y_{2}+\sqrt{\Xi}s\right)\\
&\quad\times\frac{\int D\lambda_{1}D\lambda_{2}\,e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\frac{\Omega_{1}}{\sqrt{\Xi}}s+\sqrt{\Lambda_{11}}\lambda_{1}\right)}e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\frac{\Omega_{2}}{\sqrt{\Xi}}s+\frac{\Lambda_{12}}{\sqrt{\Lambda_{11}}}\lambda_{1}+\sqrt{\frac{\det\Lambda}{\Lambda_{11}}}\lambda_{2}\right)}}{\int D\lambda\,e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\sqrt{\Delta_{1}}\lambda\right)}\int D\lambda\,e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\sqrt{\Delta_{2}}\lambda\right)}}
\end{split}
\tag{57}
\end{equation}
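The coefficients multiplying $x$, $y_{1}$ and $y_{2}$ in (57) can be read as the lower-triangular Cholesky factor of the joint covariance of the arguments of $\ell_{1}$, $\ell_{2}$ and $\tilde{\ell}$, whose entries are $\Psi_{rs}$, $U_{r}$ and $T$. The short check below is a sketch under that reading, not part of the original derivation: the function name `eq57_coefficients` and the numerical values are made up for illustration, chosen only so that the covariance is positive definite.

```python
import numpy as np

def eq57_coefficients(Psi, U, T):
    """Coefficients of (x, y1, y2) in the arguments of l_1, l_2, l-tilde, as written in (57)."""
    det_Psi = np.linalg.det(Psi)
    H = Psi - np.outer(U, U) / T          # definition (53a)
    det_H = np.linalg.det(H)
    P11, P12 = Psi[0, 0], Psi[0, 1]
    return np.array([
        [np.sqrt(P11), 0.0, 0.0],                                # argument of l_1
        [P12 / np.sqrt(P11), np.sqrt(det_Psi / P11), 0.0],       # argument of l_2
        [U[0] / np.sqrt(P11),                                    # argument of l-tilde
         (U[1] * P11 - U[0] * P12) / np.sqrt(P11 * det_Psi),
         np.sqrt(T * det_H / det_Psi)],
    ])

# Illustrative (hypothetical) values, not taken from the paper.
Psi = np.array([[1.0, 0.3], [0.3, 1.0]])
U = np.array([0.5, 0.4])
T = 0.8

# Joint covariance of the three Gaussian fields before the s and lambda parts are added.
C = np.block([[Psi, U[:, None]], [U[None, :], np.array([[T]])]])

L_closed = eq57_coefficients(Psi, U, T)
L_numpy = np.linalg.cholesky(C)           # lower-triangular factor, C = L L^T
```

The two factors should agree entry by entry, which in particular confirms that the variance carried by $y_{2}$ is $T\det H/\det\Psi = T-U^{\top}\Psi^{-1}U$.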

Notice that the variable $y_{2}$ appears only in $\tilde{\ell}$. Performing a rotation over $\lambda_{1}$ and $s$, we get

\begin{equation}
\begin{split}
\epsilon_{t}(\gamma)&=\int Dx\,Dy_{1}\,Dy_{2}\int D\lambda_{1}\,Ds\,\tilde{\ell}\left(\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\sqrt{\frac{T\det H}{\det\Psi}}y_{2}+\frac{\Omega_{1}}{\sqrt{\Delta_{1}}}\lambda_{1}+\sqrt{\frac{\Lambda_{11}\Xi}{\Delta_{1}}}s\right)\\
&\quad\times\frac{e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\sqrt{\Delta_{1}}\lambda_{1}\right)}\int D\lambda_{2}\,e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\sqrt{\frac{\Delta_{1}}{\Xi\Lambda_{11}}}\Omega_{2}s+\sqrt{\frac{\det\Lambda}{\Lambda_{11}}}\lambda_{2}\right)}}{\int D\lambda\,e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\sqrt{\Delta_{1}}\lambda\right)}\int D\lambda\,e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\sqrt{\Delta_{2}}\lambda\right)}}
\end{split}
\tag{58}
\end{equation}
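For readers who want the rotation over $\lambda_{1}$ and $s$ spelled out, one possible choice is sketched below. The primed variables are introduced here only for illustration and do not appear in the original derivation. The transformation is orthogonal because $\Lambda_{11}+\Omega_{1}^{2}/\Xi=\Delta_{1}$ by (53b).

\begin{align*}
\lambda_{1}'&=\frac{1}{\sqrt{\Delta_{1}}}\left(\sqrt{\Lambda_{11}}\,\lambda_{1}+\frac{\Omega_{1}}{\sqrt{\Xi}}\,s\right),
&
s'&=\frac{1}{\sqrt{\Delta_{1}}}\left(\sqrt{\Lambda_{11}}\,s-\frac{\Omega_{1}}{\sqrt{\Xi}}\,\lambda_{1}\right),\\
\sqrt{\Xi}\,s&=\frac{\Omega_{1}}{\sqrt{\Delta_{1}}}\,\lambda_{1}'+\sqrt{\frac{\Lambda_{11}\Xi}{\Delta_{1}}}\,s',
&
\frac{\Omega_{2}}{\sqrt{\Xi}}\,s+\frac{\Lambda_{12}}{\sqrt{\Lambda_{11}}}\,\lambda_{1}&=\sqrt{\frac{\Delta_{1}}{\Xi\Lambda_{11}}}\,\Omega_{2}\,s'.
\end{align*}

The last identity uses $\Lambda_{12}=-\Omega_{1}\Omega_{2}/\Xi$, which makes the coefficient of $\lambda_{1}'$ cancel exactly. Dropping the primes reproduces the arguments of $\tilde{\ell}$, $\ell_{1}$ and $\ell_{2}$ appearing in (58).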

Notice that $\lambda_{1}$ has disappeared from the argument of the second loss function $\ell_{2}$. Finally, (58) can be further simplified by making the same argument of $\ell_{2}$ appear in the numerator and in the denominator. This is achieved by performing a rotation over the Gaussian variables $s$ and $\lambda_{2}$. One can then integrate explicitly over $y_{2}$, obtaining

\begin{equation}
\begin{split}
E(\gamma)&=\int Dx\,Dy_{1}\int D\lambda_{1}D\lambda_{2}\,\frac{e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\sqrt{\Delta_{1}}\lambda_{1}\right)}\,e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\sqrt{\Delta_{2}}\lambda_{2}\right)}}{\int D\lambda\,e^{-\beta\ell_{1}\left(\sqrt{\Psi_{11}}x+\sqrt{\Delta_{1}}\lambda\right)}\int D\lambda\,e^{-\beta\ell_{2}\left(\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x+\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}+\sqrt{\Delta_{2}}\lambda\right)}}\\
&\quad\times\int Ds\,\tilde{\ell}\left(\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\frac{\Omega_{1}}{\sqrt{\Delta_{1}}}\lambda_{1}+\frac{\Omega_{2}}{\sqrt{\Delta_{2}}}\lambda_{2}+\sqrt{\Xi-\frac{\Omega_{1}^{2}}{\Delta_{1}}-\frac{\Omega_{2}^{2}}{\Delta_{2}}+\frac{T\det H}{\det\Psi}}\,s\right)
\end{split}
\tag{59}
\end{equation}

B.4 Analytical predictions in some particular cases

In the following we specialize equation (59) to several cases of interest, which differ in the Boltzmann distribution from which the endpoints $\gamma=0,1$ are sampled.

B.4.1 Error counting loss with a margin

Figure 10: Training error on the geodesic paths among differently sampled solutions for the erf activation function with fixed squared norm $Q=1$. Left panel: training error along the geodesic path connecting two solutions sampled from the loss function $\ell_{r}(x)\equiv\Theta(\kappa_{r}-x)$, $r=1,2$, with equal margin $\kappa_{1}=\kappa_{2}\equiv\kappa$ and $\alpha=0.1$. The barrier between typical solutions, i.e. $\kappa=0$, is strictly non-vanishing; as the margin of the sampled solutions increases, the barrier decreases and eventually vanishes for large enough $\kappa$. Right panel: case where $\ell_{r}(x)\equiv\Theta(\kappa_{r}-x)$, $r=1,2$, and $\widetilde{\ell}(x)=\Theta(-x)$, with $\alpha=0.1$. Here the first endpoint (located at $\gamma=1$) is fixed to have margin $\kappa_{1}=0$. The second endpoint ($\gamma=0$) has margin $\kappa_{2}=0,0.1,0.2,0.3,0.4$ (curves from top to bottom). As observed in the main text, the training error barrier on the geodesic decreases monotonically as the robustness $\kappa_{2}$ increases. For large enough $\kappa_{2}$ the solutions become geodesically connected.

Here we specialize (59) to the case $\ell_{1}(x)=\Theta(-x+\kappa_{1})$ and $\ell_{2}(x)=\Theta(-x+\kappa_{2})$, where the margins $\kappa_{1},\kappa_{2}\geq 0$ impose a certain degree of robustness on $\boldsymbol{w}_{1}$ and $\boldsymbol{w}_{2}$. We further consider $\tilde{\ell}(x)=\Theta(-x)$ and focus on the $\beta\to\infty$ limit for simplicity. Starting from (57) we have

$$
\begin{split}
\epsilon_{t}(\gamma) &= \int Dx\,Dy_{1}\\
&\times\frac{\int Ds\,H\left(\frac{\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\sqrt{\Xi}s}{\sqrt{\frac{T\det H}{\det\Psi}}}\right)\int_{\frac{\kappa_{1}-\sqrt{\Psi_{11}}x-\frac{\Omega_{1}}{\sqrt{\Xi}}s}{\sqrt{\Lambda_{11}}}}^{\infty}D\lambda_{1}\,H\left(\frac{\sqrt{\Lambda_{11}}\left(\kappa_{2}-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}-\frac{\Omega_{2}}{\sqrt{\Xi}}s\right)-\Lambda_{12}\lambda_{1}}{\sqrt{\det\Lambda}}\right)}{H\left(\frac{\kappa_{1}-\sqrt{\Psi_{11}}x}{\sqrt{\Delta_{1}}}\right)H\left(\frac{\kappa_{2}-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}}{\sqrt{\Delta_{2}}}\right)}
\end{split}
\tag{60}
$$

This is the expression that we have used to produce the plots in Figure 7 of the main text. We show similar plots for the erf activation function in Figure 10. If the endpoints are sampled with the same margin $\kappa_{1}=\kappa_{2}=\kappa$ then, as stated before, $q_{1}=q_{2}=p$, which implies the relation $\sqrt{\frac{T\det H}{\det\Psi}}=\sqrt{T-\frac{U^{2}}{\Psi}}$. In this case, the formula simplifies even further and reads

$$
\epsilon_{t}(\gamma)=\int Dx\,\frac{\int Ds\,H\left(\frac{\frac{U}{\sqrt{\Psi}}x+\sqrt{\Xi}s}{\sqrt{T-\frac{U^{2}}{\Psi}}}\right)\int_{\frac{\kappa-\sqrt{\Psi}x-\frac{\Omega_{1}}{\sqrt{\Xi}}s}{\sqrt{\Delta-\frac{\Omega_{1}^{2}}{\Xi}}}}^{\infty}D\lambda_{1}\,H\left(\frac{\sqrt{\Delta-\frac{\Omega_{1}^{2}}{\Xi}}\left(\kappa-\sqrt{\Psi}x-\frac{\Omega_{2}}{\sqrt{\Xi}}s\right)+\frac{\Omega_{1}\Omega_{2}}{\Xi}\lambda_{1}}{\sqrt{\det\Lambda}}\right)}{H^{2}\left(\frac{\kappa-\sqrt{\Psi}x}{\sqrt{\Delta}}\right)}\,.
\tag{61}
$$

B.4.2 Equal general loss functions at finite temperature

Here we suppose that $\ell_{1}=\ell_{2}=\tilde{\ell}\equiv\ell$ and $\beta<\infty$. Then $q_{1}=q_{2}=p\equiv q$, so that $\Psi_{rs}=\Delta_{Q}(q)-\Delta_{Q}(0)\equiv\Psi$, $U_{1}=U_{2}\equiv U=\Delta_{Q}\left(\frac{q}{c_{\gamma}}\right)-\Delta_{Q}(0)$ and $\Delta_{1}=\Delta_{2}\equiv\Delta=\Delta_{Q}(Q)-\Delta_{Q}(q)$. We obtain

$$
\begin{split}
E(\gamma) &= \int Dx\,D\lambda_{1}D\lambda_{2}\,\frac{e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda_{1}\right)}\,e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda_{2}\right)}}{\left[\int D\lambda\,e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda\right)}\right]^{2}}\\
&\int Ds\,\ell\left(\frac{U}{\sqrt{\Psi}}x+\frac{\Omega_{1}}{\sqrt{\Delta}}\lambda_{1}+\frac{\Omega_{2}}{\sqrt{\Delta}}\lambda_{2}+\sqrt{\Xi-\frac{\Omega_{1}^{2}}{\Delta}-\frac{\Omega_{2}^{2}}{\Delta}+T-\frac{U^{2}}{\Psi}}\,s\right)
\end{split}
\tag{62}
$$

The previous equation is the one that we have used for the theoretical curve presented in Figure 6 of the main text. If one is interested in measuring the training error we have

$$
E(\gamma)=\int Dx\,D\lambda_{1}D\lambda_{2}\,\frac{e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda_{1}\right)}\,e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda_{2}\right)}}{\left[\int D\lambda\,e^{-\beta\ell\left(\sqrt{\Psi}x+\sqrt{\Delta}\lambda\right)}\right]^{2}}\,H\left(\frac{\frac{U}{\sqrt{\Psi}}x+\frac{\Omega_{1}}{\sqrt{\Delta}}\lambda_{1}+\frac{\Omega_{2}}{\sqrt{\Delta}}\lambda_{2}}{\sqrt{\Xi-\frac{\Omega_{1}^{2}}{\Delta}-\frac{\Omega_{2}^{2}}{\Delta}+T-\frac{U^{2}}{\Psi}}}\right)
\tag{63}
$$

where $H(x)\equiv\frac{1}{2}\,\text{Erfc}\left(\frac{x}{\sqrt{2}}\right)$. In the previous formulas $T$ reduces to $T=\Delta_{Q}\left(\frac{q}{c_{\gamma}^{2}}\right)-\Delta_{Q}(0)$.
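The passage from (62) to (63) rests on the Gaussian identity $\int Ds\,\Theta\left(-(m+\sigma s)\right)=H(m/\sigma)$, which follows from $H(x)$ being the upper tail probability of a standard normal variable. A minimal Python sketch of a numerical check is given here; the function names, the Monte Carlo estimator and the sample size are illustrative assumptions, not part of the original computation.

```python
import math
import numpy as np


def H(x: float) -> float:
    """Gaussian tail H(x) = (1/2) erfc(x / sqrt(2)) = P(s > x) for s ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def step_loss_average(m: float, sigma: float,
                      n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of int Ds Theta(-(m + sigma * s)).

    The error-counting loss Theta(-x) equals 1 when its argument is negative,
    so the average is the fraction of samples with m + sigma * s < 0.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n_samples)
    return float(np.mean(m + sigma * s < 0.0))


if __name__ == "__main__":
    # Hypothetical values of the mean m and of the noise amplitude sigma.
    for m, sigma in [(0.3, 1.0), (-0.5, 2.0), (1.0, 0.7)]:
        print(m, sigma, step_loss_average(m, sigma), H(m / sigma))
```

In (63), $m$ plays the role of the numerator inside $H$ and $\sigma$ that of the square root multiplying $s$ in (62).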

B.4.3 Generic loss – error counting loss with a margin

Figure 11: Training loss (left panel) and training error (right panel) on the geodesic path connecting a solution extracted from the loss function $\ell_{2}(x)=\Theta(-x+\kappa)$ (located at $\gamma=0$) and a solution of the cross entropy loss at $\beta\to\infty$ (located at $\gamma=1$). Both endpoints and the configurations on the path are at a fixed squared norm $Q=1$. Both observables are plotted for the ReLU activation function and for several values of the margin $\kappa$. Although the training error presents a very small barrier near $\gamma=0$, which is appreciable for very low margin $\kappa$, the training loss is remarkably larger there. This suggests that the minimizers of the cross entropy are located deep in the bulk of the solution space manifold Baldassi et al. (2020a).

The last case we consider is $\ell_{2}(x)=\Theta(-x+\kappa)$, with $\ell_{1}(x)$ a generic convex loss function. We again consider the infinite-$\beta$ limit. This imposes a scaling on the overlap $q_{1}$ that reads

$$
q_{1}=1-\frac{\delta q_{1}}{\beta}
\tag{64}
$$

This induces a non-trivial scaling on some of the effective order parameters (56); in particular, $\Delta_{1}$ and $\Omega_{1}$ will be vanishingly small:

$$
\Delta_{1}=\Delta_{Q}(Q)-\Delta_{Q}(q_{1})\simeq\frac{\Delta_{Q}^{\prime}(Q)\,\delta q_{1}}{\beta}
\tag{65a}
$$

$$
\Omega_{1}=\Delta_{Q}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)-\Delta_{Q}\left(\frac{\gamma q_{1}+(1-\gamma)p}{c_{\gamma}}\right)\simeq\Delta_{Q}^{\prime}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)\frac{\gamma\,\delta q_{1}}{c_{\gamma}\beta}\equiv\delta\Omega_{1}\frac{\delta q_{1}}{\beta}
\tag{65b}
$$

where we have introduced the quantity

$$
\Delta_{Q}^{\prime}(q)\equiv\frac{\partial\Delta_{Q}}{\partial q}=\int Dx\left[\int Dy\,\varphi^{\prime}\left(\sqrt{q}\,x+\sqrt{Q-q}\,y\right)\right]^{2}
\tag{66}
$$

and

$$
\delta\Omega_{1}\equiv\Delta_{Q}^{\prime}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)\frac{\gamma}{c_{\gamma}}
\tag{67}
$$
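As an illustration of (66) and of the linearization (65a), the following Python sketch evaluates $\Delta_{Q}^{\prime}(q)$ by nested Gauss-Hermite quadrature and compares $\Delta_{Q}(Q)-\Delta_{Q}(q_{1})$ with $\Delta_{Q}^{\prime}(Q)\,\delta q_{1}/\beta$. Several ingredients are assumptions made only for this example: the activation is taken to be $\varphi=\text{erf}$ with $Q=1$ (as in Figure 10), $\Delta_{Q}(q)$ is assumed to be the two-point function $\int Dx\left[\int Dy\,\varphi\left(\sqrt{q}x+\sqrt{Q-q}y\right)\right]^{2}$, which is consistent with (66), and the values of $\beta$ and $\delta q_{1}$ are arbitrary. All function names are hypothetical.

```python
import math
import numpy as np

# Gauss-Hermite nodes and weights for the standard Gaussian measure Dx.
_NODES, _WEIGHTS = np.polynomial.hermite_e.hermegauss(120)
_WEIGHTS = _WEIGHTS / math.sqrt(2.0 * math.pi)  # normalise: weights sum to 1

_erf = np.vectorize(math.erf)  # assumed activation: phi = erf


def erf_prime(z):
    """Derivative of erf: phi'(z) = 2 / sqrt(pi) * exp(-z^2)."""
    return 2.0 / math.sqrt(math.pi) * np.exp(-z ** 2)


def two_point(f, q: float, Q: float = 1.0) -> float:
    """int Dx [ int Dy f(sqrt(q) x + sqrt(Q - q) y) ]^2 by nested quadrature."""
    arg = math.sqrt(q) * _NODES[:, None] + math.sqrt(Q - q) * _NODES[None, :]
    inner = f(arg) @ _WEIGHTS          # inner integral over y, one value per x
    return float(_WEIGHTS @ inner ** 2)  # outer integral over x


def delta(q: float, Q: float = 1.0) -> float:
    """Assumed form of Delta_Q(q) for the erf activation."""
    return two_point(_erf, q, Q)


def delta_prime(q: float, Q: float = 1.0) -> float:
    """Delta_Q'(q) as written in (66)."""
    return two_point(erf_prime, q, Q)


if __name__ == "__main__":
    Q, beta, dq1 = 1.0, 200.0, 1.0      # illustrative values
    q1 = Q - dq1 / beta                 # scaling (64) with Q = 1
    exact = delta(Q, Q) - delta(q1, Q)  # Delta_1 before linearization
    linear = delta_prime(Q, Q) * dq1 / beta  # right-hand side of (65a)
    print(exact, linear)
```

For the erf activation the quadrature can be cross-checked against the closed form $\Delta_{Q}^{\prime}(q)=\frac{4}{\pi}\left[(1+2Q)^{2}-4q^{2}\right]^{-1/2}$, obtained by differentiating the known arcsine expression of the erf two-point function; the two sides of (65a) then agree up to corrections of relative order $1/\beta$.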

Furthermore, we have that $\Psi_{11}=\Delta_{Q}(Q)-\Delta_{Q}(0)$, $U_{1}=\Delta_{Q}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)-\Delta_{Q}(0)$, $\frac{\Lambda_{11}}{\Delta_{1}}\to 1$ and $\Lambda_{12}\to 0$, so that $\frac{\det\Lambda}{\Lambda_{11}}\to\Lambda_{22}$. Using those relations inside (58), rescaling $\lambda_{1}\to\beta\lambda_{1}$ and using a saddle point over $\lambda_{1}$, one gets

$$
\begin{split}
\epsilon_{t}(\gamma) &= \int Dx\,Dy_{1}Dy_{2}\int Ds\,\tilde{\ell}\left(\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\sqrt{\frac{T\det H}{\det\Psi}}\,y_{2}+\frac{\delta\Omega_{1}\sqrt{\delta q_{1}}}{\sqrt{\Delta_{Q}^{\prime}(Q)}}z_{\star}(x)+\sqrt{\Xi}s\right)\\
&\times\frac{H\left(\frac{\kappa-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}-\frac{\Omega_{2}}{\sqrt{\Xi}}s}{\sqrt{\Lambda_{22}}}\right)}{H\left(\frac{\kappa-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}}{\sqrt{\Delta_{2}}}\right)}
\end{split}
\tag{68}
$$

where $z_{\star}(x)$ is the function defined in (38), which also appears in the equilibrium computation Baldassi et al. (2023). If we want to measure the training error, i.e. $\tilde{\ell}(x)=\Theta(-x)$, we have

\begin{equation}
\epsilon_{t}(\gamma) = \int Dx\,Dy_{1}\,Ds\;H\left(\frac{\frac{U_{1}}{\sqrt{\Psi_{11}}}x+\frac{U_{2}\Psi_{11}-U_{1}\Psi_{12}}{\sqrt{\Psi_{11}\det\Psi}}y_{1}+\frac{\delta\Omega_{1}\sqrt{\delta q_{1}}}{\sqrt{\Delta_{Q}^{\prime}(Q)}}z_{\star}(x)+\sqrt{\Xi}\,s}{\sqrt{\frac{T\det H}{\det\Psi}}}\right)\frac{H\left(\frac{\kappa-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}-\frac{\Omega_{2}}{\sqrt{\Xi}}s}{\sqrt{\Lambda_{22}}}\right)}{H\left(\frac{\kappa-\frac{\Psi_{12}}{\sqrt{\Psi_{11}}}x-\sqrt{\frac{\det\Psi}{\Psi_{11}}}y_{1}}{\sqrt{\Delta_{2}}}\right)}
\tag{69}
\end{equation}
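The step from Eq. (68) to Eq. (69) amounts to integrating out $y_{2}$: for $b>0$, $\int Dy_{2}\,\Theta\left(-(a+b\,y_{2})\right)=H(a/b)$. The sketch below checks this numerically. It assumes the usual convention $H(x)=\int_{x}^{\infty}Dy=\frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, which is not restated in this appendix; the function names and the values of `a` and `b` are illustrative only.

```python
import math


def H(x: float) -> float:
    """Gaussian tail H(x) = int_x^inf Dy = erfc(x / sqrt(2)) / 2 (assumed convention)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def heaviside_average(a: float, b: float, n: int = 200_000, cutoff: float = 10.0) -> float:
    """Midpoint-rule estimate of int Dy Theta(-(a + b*y)) for b > 0."""
    h = 2.0 * cutoff / n
    total = 0.0
    for i in range(n):
        y = -cutoff + (i + 0.5) * h
        # Theta(-(a + b*y)) equals 1 exactly when a + b*y < 0.
        if a + b * y < 0.0:
            total += math.exp(-0.5 * y * y)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical values standing in for the numerator and the y2 prefactor.
    for a, b in [(0.3, 1.2), (-1.0, 0.5), (2.0, 2.0)]:
        print(a, b, heaviside_average(a, b), H(a / b))
```

In Eq. (69), `a` plays the role of the remaining terms in the argument of $\tilde{\ell}$ and `b` the role of $\sqrt{T\det H/\det\Psi}$.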

We show in Figure 11 the training loss and error along the geodesics connecting solutions extracted from the error counting loss with a margin and the typical cross-entropy minimizer. Although the training error is very small along the path, the loss is much larger in the neighborhood of the endpoint corresponding to the solution extracted from the error counting loss with a margin. As the margin is increased, the training loss decreases.

Appendix C Effective order parameters

In this section we show that the effective order parameters defined in Eq. (54) reduce to the expressions given in (56).

We recall the notation

\begin{align}
\varphi_{r}(\lambda_{r},x) &\equiv \varphi\left(\sqrt{Q-q_{r}}\,\lambda_{r}-\sum_{s}\mathcal{T}_{rs}x_{s}\right) \tag{70a}\\
\tilde{\varphi}(\gamma;\lambda_{1},\lambda_{2},x_{1},x_{2}) &\equiv \varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\,\lambda_{r}-\sum_{s}\mathcal{T}_{rs}x_{s}}{c_{\gamma}}\right)\right) \tag{70b}
\end{align}

Recall also that $\mathcal{T}$ is the matrix square root of $t_{rs}=q_{r}\delta_{rs}+(1-\delta_{rs})p$, and therefore we have the following identities: $q_{1}=\mathcal{T}_{11}^{2}+\mathcal{T}_{12}^{2}$, $q_{2}=\mathcal{T}_{22}^{2}+\mathcal{T}_{12}^{2}$, $p=\mathcal{T}_{12}(\mathcal{T}_{11}+\mathcal{T}_{22})$ and $(\mathcal{T}_{11}\mathcal{T}_{22}-\mathcal{T}_{12}^{2})^{2}=q_{1}q_{2}-p^{2}$.
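These identities follow from $\mathcal{T}^{2}=t$ with $\mathcal{T}$ symmetric, and they can be checked numerically. The following is a minimal sketch, assuming that $\mathcal{T}$ is the symmetric positive square root obtained from the eigendecomposition of $t$; the values of `q1`, `q2` and `p` are hypothetical and only need to satisfy $q_{1}q_{2}>p^{2}$.

```python
import numpy as np


def sym_sqrt(t: np.ndarray) -> np.ndarray:
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    evals, evecs = np.linalg.eigh(t)
    # V diag(sqrt(w)) V^T: scale the eigenvector columns, then rotate back.
    return (evecs * np.sqrt(evals)) @ evecs.T


q1, q2, p = 0.7, 0.5, 0.3  # hypothetical overlaps with q1 * q2 > p**2
t = np.array([[q1, p], [p, q2]])
T = sym_sqrt(t)

# Each entry pairs the left-hand side built from T with the expected value.
identities = {
    "q1": (T[0, 0] ** 2 + T[0, 1] ** 2, q1),
    "q2": (T[1, 1] ** 2 + T[0, 1] ** 2, q2),
    "p": (T[0, 1] * (T[0, 0] + T[1, 1]), p),
    "det": ((T[0, 0] * T[1, 1] - T[0, 1] ** 2) ** 2, q1 * q2 - p ** 2),
}

if __name__ == "__main__":
    for name, (lhs, rhs) in identities.items():
        print(f"{name}: lhs={lhs:.12f} rhs={rhs:.12f}")
```

The last identity is simply $(\det\mathcal{T})^{2}=\det t$.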

Let’s start by analyzing the terms $\Psi_{rs}\equiv\langle\langle\varphi_{r}\rangle_{\lambda}\langle\varphi_{s}\rangle_{\lambda}\rangle_{x}-\langle\varphi_{r}\rangle_{\lambda,x}\langle\varphi_{s}\rangle_{\lambda,x}$ and $\Delta_{r}\equiv\langle\varphi_{r}^{2}\rangle_{\lambda,x}-\langle\langle\varphi_{r}\rangle_{\lambda}^{2}\rangle_{x}$, which only involve $\varphi_{r}$. First notice that

\begin{align}
\langle\varphi_{r}\rangle_{\lambda,x} &= \int Dy\,\varphi(\sqrt{Q}\,y)=\sqrt{\Delta_{Q}(0)}\,,\qquad r=1,2 \tag{71a}\\
\langle\varphi_{r}^{2}\rangle_{\lambda,x} &= \int Dy\,\varphi^{2}(\sqrt{Q}\,y)=\Delta_{Q}(Q)\,,\qquad r=1,2 \tag{71b}
\end{align}

and

\begin{equation}
\begin{split}
\langle\langle\varphi_{1}\rangle_{\lambda}\langle\varphi_{2}\rangle_{\lambda}\rangle_{x} &= \int Dx_{1}Dx_{2}D\lambda_{1}D\lambda_{2}\,\varphi\left(\sqrt{Q-q_{1}}\,\lambda_{1}+\sum_{s}\mathcal{T}_{1s}x_{s}\right)\,\varphi\left(\sqrt{Q-q_{2}}\,\lambda_{2}+\sum_{s}\mathcal{T}_{2s}x_{s}\right)\\
&= \int Dx_{1}Dx_{2}D\lambda_{1}D\lambda_{2}\,\varphi\left(\sqrt{Q-q_{1}}\,\lambda_{1}+\sqrt{q_{1}}\,x_{1}\right)\,\varphi\left(\sqrt{Q-q_{2}}\,\lambda_{2}+\frac{p}{\sqrt{q_{1}}}x_{1}+\sqrt{q_{2}-\frac{p^{2}}{q_{1}}}\,x_{2}\right)\\
&= \int Dx_{1}Dx_{2}\,\varphi\left(\sqrt{Q}\,x_{1}\right)\,\varphi\left(\frac{p}{\sqrt{Q}}\,x_{1}+\sqrt{Q-\frac{p^{2}}{Q}}\,x_{2}\right)=\Delta_{Q}(p)
\end{split}
\tag{72}
\end{equation}

Similarly, if $r=s$ we have

\begin{equation}
\langle\langle\varphi_{r}\rangle_{\lambda}\langle\varphi_{r}\rangle_{\lambda}\rangle_{x}=\langle\langle\varphi_{r}\rangle_{\lambda}^{2}\rangle_{x}=\Delta_{Q}(q_{r})
\tag{73}
\end{equation}

so that $\Psi_{rs}=\Delta_{Q}(t_{rs})-\Delta_{Q}(0)$ and $\Delta_{r}=\Delta_{Q}(Q)-\Delta_{Q}(t_{rr})$.
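The reduction in Eq. (72) can be checked numerically with Gauss-Hermite quadrature. The sketch below takes $\Delta_{Q}(c)$ to be the two-dimensional Gaussian integral in the last line of Eq. (72), with $p$ replaced by $c$, and compares it with the first line evaluated directly. The logistic activation, the number of quadrature nodes and the values of $Q$, $q_{1}$, $q_{2}$, $p$ are assumptions made only for illustration; they are not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def phi(x):
    """Illustrative activation (logistic sigmoid); an assumption, not the paper's choice."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))


# Nodes and weights for the standard Gaussian measure Dy.
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)
X1, X2 = np.meshgrid(nodes, nodes, indexing="ij")
W2 = np.outer(weights, weights)


def delta_Q(c, Q):
    """E[phi(u) phi(v)] for Gaussian u, v with variance Q and covariance c."""
    u = np.sqrt(Q) * X1
    v = c / np.sqrt(Q) * X1 + np.sqrt(max(Q - c ** 2 / Q, 0.0)) * X2
    return float(np.sum(W2 * phi(u) * phi(v)))


def mean_over_lambda(shift, Q, q):
    """int Dlambda phi(sqrt(Q - q) * lambda + shift), vectorised over shift."""
    arg = np.sqrt(Q - q) * nodes + np.asarray(shift)[..., None]
    return np.sum(weights * phi(arg), axis=-1)


def double_average(Q, q1, q2, p):
    """First line of Eq. (72), with T the symmetric square root of t."""
    evals, evecs = np.linalg.eigh(np.array([[q1, p], [p, q2]]))
    T = (evecs * np.sqrt(evals)) @ evecs.T
    m1 = mean_over_lambda(T[0, 0] * X1 + T[0, 1] * X2, Q, q1)
    m2 = mean_over_lambda(T[1, 0] * X1 + T[1, 1] * X2, Q, q2)
    return float(np.sum(W2 * m1 * m2))


if __name__ == "__main__":
    Q, q1, q2, p = 1.0, 0.7, 0.5, 0.3  # hypothetical order parameters
    lhs = double_average(Q, q1, q2, p)
    print("lhs of (72):", lhs, " Delta_Q(p):", delta_Q(p, Q))
    print("Psi_12:", lhs - delta_Q(0.0, Q))
```

The two numbers printed on the first line should agree up to quadrature error, and the last line gives $\Psi_{12}$ for this illustrative activation.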

Secondly, let’s analyze the terms which contain only $\tilde{\varphi}$, i.e. $T\equiv\langle\langle\tilde{\varphi}\rangle^{2}_{\lambda}\rangle_{x}-\langle\tilde{\varphi}\rangle_{\lambda,x}^{2}$ and $\Xi\equiv\langle\tilde{\varphi}^{2}\rangle_{\lambda,x}-\langle\langle\tilde{\varphi}\rangle^{2}_{\lambda}\rangle_{x}$. The terms $\langle\tilde{\varphi}\rangle_{\lambda,x}$ and $\langle\tilde{\varphi}^{2}\rangle_{\lambda,x}$ are easy to analyze, since the argument of $\varphi$ is a sum of four uncorrelated Gaussian variables, which is itself Gaussian. We therefore get

\begin{align}
\langle\tilde{\varphi}\rangle_{\lambda,x} &= \int Dy\,\varphi(\sqrt{Q}\,y)=\sqrt{\Delta_{Q}(0)}\,, \tag{74a}\\
\langle\tilde{\varphi}^{2}\rangle_{\lambda,x} &= \int Dy\,\varphi^{2}(\sqrt{Q}\,y)=\Delta_{Q}(Q)\,. \tag{74b}
\end{align}

Finally

\begin{equation}
\begin{split}
\langle\langle\tilde{\varphi}\rangle^{2}_{\lambda}\rangle_{x} &= \int Dx_{1}Dx_{2}\left[\int D\lambda_{1}D\lambda_{2}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\,\lambda_{r}+\sum_{s}\mathcal{T}_{rs}x_{s}}{c_{\gamma}}\right)\right)\right]^{2}\\
&= \int Dx\left[\int Dy\,\varphi\left(\frac{\sqrt{Q-2Q\gamma(1-\gamma)-\gamma^{2}q_{1}-(1-\gamma)^{2}q_{2}}}{c_{\gamma}}\,y+\frac{\sqrt{\gamma^{2}q_{1}+(1-\gamma)^{2}q_{2}+2\gamma(1-\gamma)p}}{c_{\gamma}}\,x\right)\right]^{2}\\
&= \Delta_{Q}\left(\frac{\gamma^{2}q_{1}+(1-\gamma)^{2}q_{2}+2\gamma(1-\gamma)p}{c_{\gamma}^{2}}\right)
\end{split}
\tag{75}
\end{equation}
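The second line of Eq. (75) only uses the variances of the $\lambda$-part and of the $x$-part of the argument of $\tilde{\varphi}$, so it can be checked with plain linear algebra. The sketch below assumes $\gamma_{1}=\gamma$, $\gamma_{2}=1-\gamma$ and the normalization $c_{\gamma}^{2}=\gamma^{2}+(1-\gamma)^{2}+2\gamma(1-\gamma)p/Q$, which is the value required for the argument to have total variance $Q$ as in Eqs. (74); $c_{\gamma}$ itself is defined outside this appendix, and all numerical values are hypothetical.

```python
import numpy as np


def coefficient_vectors(gamma, Q, q1, q2, p):
    """Coefficients of (lambda1, lambda2, x1, x2) in the arguments of phi_1, phi_2, phi_tilde."""
    evals, evecs = np.linalg.eigh(np.array([[q1, p], [p, q2]]))
    T = (evecs * np.sqrt(evals)) @ evecs.T  # symmetric square root of t
    # Assumed normalization: makes the argument of phi_tilde have variance Q.
    c_gamma = np.sqrt(gamma ** 2 + (1 - gamma) ** 2 + 2 * gamma * (1 - gamma) * p / Q)
    b1 = np.array([np.sqrt(Q - q1), 0.0, T[0, 0], T[0, 1]])
    b2 = np.array([0.0, np.sqrt(Q - q2), T[1, 0], T[1, 1]])
    a = (gamma * b1 + (1 - gamma) * b2) / c_gamma
    return a, b1, b2, c_gamma


def check_eq75(gamma, Q, q1, q2, p):
    """Return (computed, expected) pairs for the variances entering Eq. (75)."""
    a, _, _, c = coefficient_vectors(gamma, Q, q1, q2, p)
    g = gamma
    lam_expected = (Q - 2 * Q * g * (1 - g) - g ** 2 * q1 - (1 - g) ** 2 * q2) / c ** 2
    x_expected = (g ** 2 * q1 + (1 - g) ** 2 * q2 + 2 * g * (1 - g) * p) / c ** 2
    return {
        "total": (float(a @ a), Q),
        "lambda_part": (float(a[:2] @ a[:2]), lam_expected),
        "x_part": (float(a[2:] @ a[2:]), x_expected),
    }


if __name__ == "__main__":
    for name, (got, want) in check_eq75(0.25, 1.0, 0.7, 0.5, 0.3).items():
        print(f"{name}: computed={got:.12f} expected={want:.12f}")
```

The `x_part` entry is exactly the argument of $\Delta_{Q}$ in the last line of Eq. (75).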

The last computation concerns the correlations between the functions $\varphi_{r}$ and $\tilde{\varphi}$, which appear in the variables $U_{r}\equiv\langle\langle\varphi_{r}\rangle_{\lambda}\langle\tilde{\varphi}\rangle_{\lambda}\rangle_{x}-\langle\varphi_{r}\rangle_{\lambda,x}\langle\tilde{\varphi}\rangle_{\lambda,x}$ and $\Omega_{r}\equiv\langle\langle\tilde{\varphi}\varphi_{r}\rangle_{\lambda}\rangle_{x}-\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{r}\rangle_{\lambda}\rangle_{x}$. We therefore need to evaluate the two quantities $\langle\langle\tilde{\varphi}\varphi_{r}\rangle_{\lambda}\rangle_{x}$ and $\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{r}\rangle_{\lambda}\rangle_{x}$, for $r=1,2$. We analyze the case $r=1$; the other can be obtained by symmetry. We have

\begin{equation}
\begin{split}
\langle\langle\tilde{\varphi}\varphi_{1}\rangle_{\lambda}\rangle_{x} &= \int Dx_{1}Dx_{2}D\lambda_{1}D\lambda_{2}\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\,\lambda_{r}+\sum_{s}\mathcal{T}_{rs}x_{s}}{c_{\gamma}}\right)\right)\varphi\left(\sqrt{Q-q_{1}}\,\lambda_{1}+\sum_{s}\mathcal{T}_{1s}x_{s}\right)\\
&= \int Dx_{1}Dx_{2}D\lambda_{1}D\lambda_{2}\,\varphi(\sqrt{Q}\,x_{1})\\
&\times\varphi\left(\frac{\gamma Q+(1-\gamma)p}{\sqrt{Q}\,c_{\gamma}}x_{1}-\frac{(1-\gamma)p\sqrt{Q-q_{1}}}{\sqrt{q_{1}Q}\,c_{\gamma}}\lambda_{1}+\frac{(1-\gamma)\sqrt{Q-q_{2}}}{c_{\gamma}}\lambda_{2}+\frac{1-\gamma}{c_{\gamma}}\sqrt{q_{2}-\frac{p^{2}}{q_{1}}}\,x_{2}\right)\\
&= \int Dx_{1}Dx_{2}\,\varphi(\sqrt{Q}\,x_{1})\,\varphi\left(\frac{\gamma Q+(1-\gamma)p}{\sqrt{Q}\,c_{\gamma}}x_{1}+\frac{1-\gamma}{c_{\gamma}}\sqrt{Q-\frac{p^{2}}{Q}}\,x_{2}\right)=\Delta_{Q}\left(\frac{\gamma Q+(1-\gamma)p}{c_{\gamma}}\right)
\end{split}
\tag{76}
\end{equation}
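The argument of $\Delta_{Q}$ in Eq. (76) is the covariance between the Gaussian arguments of $\tilde{\varphi}$ and $\varphi_{1}$, so it can be computed directly from the coefficients in the first line. The sketch below does this for a few hypothetical choices of $q_{1}$ and $q_{2}$ at fixed $p$. It assumes $\gamma_{1}=\gamma$, $\gamma_{2}=1-\gamma$ and $c_{\gamma}^{2}=\gamma^{2}+(1-\gamma)^{2}+2\gamma(1-\gamma)p/Q$, the value that keeps the argument of $\tilde{\varphi}$ at variance $Q$; these are stated here as assumptions since $c_{\gamma}$ is defined outside this appendix.

```python
import numpy as np


def cross_covariance(gamma, Q, q1, q2, p):
    """Covariance between the arguments of phi_tilde and phi_1 in the first line of Eq. (76)."""
    evals, evecs = np.linalg.eigh(np.array([[q1, p], [p, q2]]))
    T = (evecs * np.sqrt(evals)) @ evecs.T  # symmetric square root of t
    # Assumed normalization of the interpolated pre-activation.
    c_gamma = np.sqrt(gamma ** 2 + (1 - gamma) ** 2 + 2 * gamma * (1 - gamma) * p / Q)
    # Coefficients of (lambda1, lambda2, x1, x2).
    b1 = np.array([np.sqrt(Q - q1), 0.0, T[0, 0], T[0, 1]])
    b2 = np.array([0.0, np.sqrt(Q - q2), T[1, 0], T[1, 1]])
    a = (gamma * b1 + (1 - gamma) * b2) / c_gamma
    return float(a @ b1), float(c_gamma)


if __name__ == "__main__":
    gamma, Q, p = 0.25, 1.0, 0.3  # hypothetical values
    for q1, q2 in [(0.7, 0.5), (0.4, 0.9), (0.6, 0.6)]:
        cov, c_gamma = cross_covariance(gamma, Q, q1, q2, p)
        expected = (gamma * Q + (1 - gamma) * p) / c_gamma
        print(f"q1={q1} q2={q2}: covariance={cov:.12f} expected={expected:.12f}")
```

The printed covariance is the same for every pair $(q_{1},q_{2})$, since only $Q$, $p$ and $\gamma$ enter.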

Notice that this depends on neither $q_1$ nor $q_2$. The case $r=2$ can be obtained by sending $\gamma\to 1-\gamma$ and $q_1,q_2\to q_2,q_1$. Let us now analyze the term $\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{r}\rangle_{\lambda}\rangle_{x}$ in the $r=1$ case:

\begin{equation}
\begin{split}
\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{1}\rangle_{\lambda}\rangle_{x}
&=\int Dx_{1}Dx_{2}D\lambda_{1}D\lambda_{2}D\lambda_{3}\,\varphi\left(\sqrt{Q-q_{1}}\,\lambda_{3}+\sum_{s}\mathcal{T}_{1s}x_{s}\right)\,\varphi\left(\sum_{r}\gamma_{r}\left(\frac{\sqrt{Q-q_{r}}\,\lambda_{r}+\sum_{s}\mathcal{T}_{rs}x_{s}}{c_{\gamma}}\right)\right)\\
&=\int Dx_{1}Dx_{2}\,\varphi\left(\sqrt{Q}\,x_{1}\right)\varphi\left(\frac{\gamma q_{1}+(1-\gamma)p}{\sqrt{Q}\,c_{\gamma}}\,x_{1}+\sqrt{Q-\frac{(\gamma q_{1}+(1-\gamma)p)^{2}}{Q\,c_{\gamma}^{2}}}\,x_{2}\right)\\
&=\Delta_{Q}\left(\frac{\gamma q_{1}+(1-\gamma)p}{c_{\gamma}}\right)
\end{split}
\tag{77}
\end{equation}

The case $r=2$ can be obtained by sending $\gamma\to 1-\gamma$ and $q_1\to q_2$, i.e.

\begin{equation}
\langle\langle\tilde{\varphi}\rangle_{\lambda}\langle\varphi_{2}\rangle_{\lambda}\rangle_{x}=\Delta_{Q}\left(\frac{(1-\gamma)q_{2}+\gamma p}{c_{\gamma}}\right)
\tag{78}
\end{equation}
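The two-variable Gaussian integral in the second line of (77) can be evaluated numerically. The sketch below is only an illustration: it assumes that $\Delta_Q(q)$ is exactly that integral, with $q$ in place of $(\gamma q_1+(1-\gamma)p)/c_\gamma$, and it approximates the Gaussian measure with a truncated trapezoidal rule. The function names `gauss_grid` and `delta_Q` and the grid parameters are hypothetical choices, not taken from the paper.

```python
import math

def gauss_grid(n=201, cutoff=8.0):
    """Trapezoidal nodes and weights approximating the Gaussian measure Dx
    on the truncated interval [-cutoff, cutoff]."""
    h = 2.0 * cutoff / (n - 1)
    xs = [-cutoff + i * h for i in range(n)]
    ws = [h * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in xs]
    ws[0] *= 0.5
    ws[-1] *= 0.5
    return xs, ws

def delta_Q(phi, Q, q, n=201):
    """Estimate int Dx1 Dx2 phi(sqrt(Q) x1) phi(a x1 + b x2),
    with a = q / sqrt(Q) and b = sqrt(Q - q^2 / Q), cf. the second line of (77)."""
    a = q / math.sqrt(Q)
    b = math.sqrt(max(Q - q * q / Q, 0.0))
    xs, ws = gauss_grid(n)
    total = 0.0
    for x1, w1 in zip(xs, ws):
        outer = phi(math.sqrt(Q) * x1)
        # Inner integral over x2 at fixed x1.
        inner = sum(w2 * phi(a * x1 + b * x2) for x2, w2 in zip(xs, ws))
        total += w1 * outer * inner
    return total

# Example call with the identity activation (arbitrary illustrative values).
value = delta_Q(lambda x: x, Q=1.5, q=0.4)
```

The two arguments of $\varphi$ are Gaussian variables of variance $Q$ and covariance $q$, so for a polynomial activation the result can be compared with the corresponding Gaussian moment. For a non-smooth activation a finer grid may be needed.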

Appendix D Computing the overlap between differently sampled solutions

The aim of this section is to find the typical overlap between two configurations $\boldsymbol{w}_1$ and $\boldsymbol{w}_2$ that are sampled from two (in principle different) distributions $p_1(\bullet\,;\mathcal{D})$ and $p_2(\bullet\,;\mathcal{D})$; see the definition in equation (22). A way of computing this overlap has been sketched in Annesi et al. (2023). Here we adopt a different approach, based on the Franz-Parisi entropy Franz and Parisi (1995). The Franz-Parisi entropy is defined as the average logarithm of the number of configurations $\boldsymbol{w}_2\sim p_2(\bullet\,;\mathcal{D})$ that are at a fixed overlap $S$ from a reference configuration $\boldsymbol{w}_1\sim p_1(\bullet\,;\mathcal{D})$. In formulas,

\begin{align}
\phi_{FP}(S) &\equiv \mathbb{E}_{\mathcal{D}}\int d\boldsymbol{w}_{1}\,p_{1}(\boldsymbol{w}_{1};\mathcal{D})\ln\mathcal{N}_{\mathcal{D}}(\boldsymbol{w}_{1};S) \tag{79a}\\
\mathcal{N}_{\mathcal{D}}(\boldsymbol{w}_{1};S) &\equiv \int d\boldsymbol{w}_{2}\,p_{2}(\boldsymbol{w}_{2};\mathcal{D})\,\delta\left(NS-\boldsymbol{w}_{1}\cdot\boldsymbol{w}_{2}\right) \tag{79b}
\end{align}

In the following we call $\boldsymbol{w}_1$ the “reference” weight and $\boldsymbol{w}_2$ the “slaved” weight, since it is constrained to stay at a fixed distance from the reference $\boldsymbol{w}_1$. We also suppose (as done in the main text) that the configuration $\boldsymbol{w}_2$ sampled from $p_2$ has the same squared norm $Q$ as the reference $\boldsymbol{w}_1$; this can be achieved by properly choosing the Lagrange multiplier $\lambda_2$.
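As a concrete illustration of the quantity being constrained, the overlap appearing in the delta function of (79b) is $S=\boldsymbol{w}_1\cdot\boldsymbol{w}_2/N$. The sketch below computes it for two random vectors. It assumes that the squared norm is normalized as $\boldsymbol{w}\cdot\boldsymbol{w}/N=Q$, and it fixes the norm by a simple rescaling rather than through a Lagrange multiplier; the vectors are random placeholders, not sampled solutions, and the helper names are hypothetical.

```python
import math
import random

def overlap(w1, w2):
    """Empirical overlap S = (w1 . w2) / N between two weight vectors."""
    return sum(a * b for a, b in zip(w1, w2)) / len(w1)

def rescale_to_norm(w, Q):
    """Rescale w so that its squared norm per weight, (w . w) / N, equals Q.
    This is an illustrative shortcut, not the Lagrange-multiplier constraint."""
    factor = math.sqrt(Q / overlap(w, w))
    return [factor * x for x in w]

random.seed(0)
N, Q = 1000, 2.0
w1 = rescale_to_norm([random.gauss(0.0, 1.0) for _ in range(N)], Q)
w2 = rescale_to_norm([random.gauss(0.0, 1.0) for _ in range(N)], Q)
S = overlap(w1, w2)  # by Cauchy-Schwarz, |S| <= Q when both norms equal Q
```

With both norms fixed to $Q$, the overlap is bounded by $|S|\le Q$, and $S=Q$ only when the two configurations coincide.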

The typical (i.e. the most probable) overlap is the one that maximizes the Franz-Parisi entropy

\begin{equation}
p=\operatorname*{argmax}_{S}\,\phi_{FP}(S)
\tag{80}
\end{equation}

The Franz-Parisi entropy can be computed with standard methods using a double replica trick; we refer to Baldassi et al. (2023, 2021) for the derivation in the case of the perceptron. For the tree committee machine in the large-width limit one gets

\begin{equation}
\phi_{FP}(S)=\underset{q_{2},\,t}{\text{extr}}\left[\mathcal{G}_{S}(q_{2},t,S)+\alpha\,\mathcal{G}_{E}(q_{2},t,S)\right]
\tag{81}
\end{equation}

where

\begin{equation}
\mathcal{G}_{S}=\frac{(Q^{2}-S^{2})(Q-2q_{1})+Qq_{1}^{2}-2q_{1}St+Qt^{2}}{2(Q-q_{2})(Q-q_{1})^{2}}+\frac{1}{2}\ln(2\pi)+\frac{1}{2}\ln\left(Q-q_{2}\right)
\tag{82}
\end{equation}

and

\begin{align}
\mathcal{G}_{E} &= \int Dz_{0}\,\frac{\int Dz_{1}Dz_{2}\,e^{-\beta\ell_{1}\left(\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}\,z_{0}+\frac{\Delta_{Q}(S)-\Delta_{Q}(t)}{\sqrt{\Gamma}}z_{1}+\sqrt{\eta}\,z_{2}\right)}}{\int Dz_{1}\,e^{-\beta\ell_{1}\left(\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}\,z_{0}+\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{1})}\,z_{1}\right)}} \tag{83a}\\
&\quad\times\ln\int Dz_{3}\,e^{-\beta\ell_{2}\left(\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{2})}\,z_{3}+\frac{\Delta_{Q}(t)-\Delta_{Q}(0)}{\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}}z_{0}+\sqrt{\Gamma}\,z_{1}\right)} \tag{83b}\\
\eta &\equiv \Delta_{Q}(Q)-\Delta_{Q}(q_{1})-\frac{(\Delta_{Q}(S)-\Delta_{Q}(t))^{2}}{\Gamma} \tag{83c}\\
\Gamma &= \Delta_{Q}(q_{2})-\Delta_{Q}(0)-\frac{(\Delta_{Q}(t)-\Delta_{Q}(0))^{2}}{\Delta_{Q}(q_{1})-\Delta_{Q}(0)} \tag{83d}
\end{align}
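One way to read the definitions (83c) and (83d), offered here as a consistency check rather than as part of the original derivation, is through the second moments of the Gaussian fields. Denote by $u_1$ the argument of $\ell_1$ in the numerator of (83a) and by $u_2$ the argument of $\ell_2$ in (83b); these two names are introduced only for this check. Treating $z_0,z_1,z_2,z_3$ as independent standard Gaussians and ignoring the Boltzmann weights, one finds

```latex
\begin{align*}
\langle u_1^2\rangle &= \left[\Delta_Q(q_1)-\Delta_Q(0)\right]
  + \frac{(\Delta_Q(S)-\Delta_Q(t))^2}{\Gamma} + \eta
  = \Delta_Q(Q)-\Delta_Q(0), \\
\langle u_2^2\rangle &= \left[\Delta_Q(Q)-\Delta_Q(q_2)\right]
  + \frac{(\Delta_Q(t)-\Delta_Q(0))^2}{\Delta_Q(q_1)-\Delta_Q(0)} + \Gamma
  = \Delta_Q(Q)-\Delta_Q(0), \\
\langle u_1 u_2\rangle &= \left[\Delta_Q(t)-\Delta_Q(0)\right]
  + \left[\Delta_Q(S)-\Delta_Q(t)\right]
  = \Delta_Q(S)-\Delta_Q(0).
\end{align*}
```

Under this reading, $\eta$ and $\Gamma$ are the residual variances that give both fields the same total variance $\Delta_Q(Q)-\Delta_Q(0)$, while their covariance is set by the imposed overlap $S$.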

Notice that $q_1$ represents the typical overlap between reference configurations $\boldsymbol{w}_1\sim p_1(\boldsymbol{w}_1;\mathcal{D})$. Imposing that the Franz-Parisi entropy has a maximum at the typical distance, we have

\begin{equation}
\frac{\partial\phi_{FP}}{\partial S}=0\implies\frac{QS-2q_{1}S+q_{1}t}{(Q-q_{2})(Q-q_{1})^{2}}\simeq\frac{S}{(Q-q_{1})(Q-q_{2})}=\alpha\frac{\partial\mathcal{G}_{E}}{\partial S}
\tag{84}
\end{equation}

The approximate equality ($\simeq$) follows from the fact that, at the maximum of the Franz-Parisi entropy, one can verify that the saddle-point equations impose $t=S$. One can then compute the right-hand side explicitly, expanding the expression for $t\to S$. Finally, the typical overlap $p$ is obtained by solving the following implicit equation for $p$:

\begin{equation}
\begin{split}
\frac{p}{\Delta_{Q}^{\prime}(p)(Q-q_{1})(Q-q_{2})}
&=\frac{\alpha}{\Delta_{Q}(p)-\Delta_{Q}(0)}\int Dz_{0}\,\left[\frac{\partial}{\partial z_{0}}\,\ln\int Dz_{1}\,e^{-\beta\ell_{1}\left(\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}\,z_{0}+\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{1})}\,z_{1}\right)}\right]\\
&\quad\times\left[\int Dz_{1}\,\frac{\partial}{\partial z_{0}}\ln\int Dz_{3}\,e^{-\beta\ell_{2}\left(\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{2})}\,z_{3}+\frac{\Delta_{Q}(p)-\Delta_{Q}(0)}{\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}}z_{0}+\sqrt{\Gamma}\,z_{1}\right)}\right]
\end{split}
\tag{85}
\end{equation}

where we have introduced the quantity $\Delta_{Q}^{\prime}(q)$ as in equation (36) and we have redefined $\Gamma$ to be $\Gamma=\Delta_{Q}(q_{2})-\Delta_{Q}(0)-\frac{(\Delta_{Q}(p)-\Delta_{Q}(0))^{2}}{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}$.
Notice that equation (85) depends non-trivially on $q_1$ and $q_2$, which represent respectively the typical overlap between configurations $\boldsymbol{w}_1\sim p_1(\bullet\,;\mathcal{D})$ and the typical overlap between configurations $\boldsymbol{w}_2\sim p_2(\bullet\,;\mathcal{D})$, and which can be obtained by a standard equilibrium computation of the partition function in equation (8). In particular, when the solutions are sampled from the same distribution, $q_1=q_2$, and one can verify that equation (85) is trivially satisfied by $p=q_1=q_2$.
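For reference, the entropic side of equation (84) can be recovered from (82) in two lines. This is a sketch that differentiates $\mathcal{G}_S$ at fixed $q_2$ and $t$, which is justified when both are evaluated at the extremum in (81). Only the numerator of the first term of $\mathcal{G}_S$ depends on $S$, and stationarity of $\phi_{FP}$ gives $-\partial_S\mathcal{G}_S=\alpha\,\partial_S\mathcal{G}_E$:

```latex
\begin{align*}
-\frac{\partial \mathcal{G}_S}{\partial S}
 &= \frac{2S\,(Q-2q_1) + 2 q_1 t}{2\,(Q-q_2)(Q-q_1)^2}
  = \frac{QS - 2 q_1 S + q_1 t}{(Q-q_2)(Q-q_1)^2} \\
 &\overset{t=S}{=} \frac{S\,(Q-q_1)}{(Q-q_2)(Q-q_1)^2}
  = \frac{S}{(Q-q_1)(Q-q_2)} .
\end{align*}
```

Setting $S=p$ in the last expression gives the combination $p/[(Q-q_1)(Q-q_2)]$ that appears on the left-hand side of (85).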

In the following subsections we specialize equation (85) to several interesting sub-cases in the large $\beta$ limit.

D.1 The error counting loss with a margin

When one is interested in the theta loss, i.e. $\ell_1(x)=\Theta(\kappa_1-x)$ and $\ell_2(x)=\Theta(\kappa_2-x)$, the integrals inside the logarithms in (85) can be performed, and in the infinite $\beta$ limit one gets

\begin{equation}
\frac{p}{\Delta_{Q}^{\prime}(p)(Q-q_{1})(Q-q_{2})}
=\alpha\int Dz_{0}Dz_{1}\,\frac{GH\left(\frac{\kappa_{1}+\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}\,z_{0}}{\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{1})}}\right)GH\left(\frac{\kappa_{2}+\frac{\Delta_{Q}(p)-\Delta_{Q}(0)}{\sqrt{\Delta_{Q}(q_{1})-\Delta_{Q}(0)}}z_{0}+\sqrt{\Gamma}\,z_{1}}{\sqrt{\Delta_{Q}(Q)-\Delta_{Q}(q_{2})}}\right)}{\sqrt{(\Delta_{Q}(Q)-\Delta_{Q}(q_{1}))(\Delta_{Q}(Q)-\Delta_{Q}(q_{2}))}}\,.
\tag{86}
\end{equation}

This expression reduces to that of the perceptron case computed in Annesi et al. (2023) if one specializes it to the identity activation function $\varphi(x)=x$, for which $\Delta_{Q}(q)=q$.
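To make equation (86) concrete, the sketch below evaluates its residual (left-hand side minus right-hand side) in the simplest setting: identity activation with $Q=1$, so that $\Delta_Q(q)=q$, $\Delta_Q'(p)=1$ and $\Delta_Q(0)=0$. Several ingredients are assumptions and not taken from this section: $GH(x)$ is taken to be the ratio $G(x)/H(x)$ of the standard Gaussian density to its tail $H(x)=\tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, the Gaussian integrals are approximated with a truncated trapezoidal rule, and all function names are hypothetical.

```python
import math

def gauss_grid(n=161, cutoff=8.0):
    """Trapezoidal nodes and weights approximating the Gaussian measure Dz."""
    h = 2.0 * cutoff / (n - 1)
    zs = [-cutoff + i * h for i in range(n)]
    ws = [h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in zs]
    ws[0] *= 0.5
    ws[-1] *= 0.5
    return zs, ws

def GH(x):
    """Assumed definition: GH(x) = G(x) / H(x), H(x) = erfc(x / sqrt(2)) / 2."""
    if x > 25.0:  # large-x asymptotics, avoids a ratio of underflowing numbers
        return x + 1.0 / x
    g = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return g / (0.5 * math.erfc(x / math.sqrt(2.0)))

def residual_86(p, q1, q2, alpha, kappa1, kappa2):
    """LHS minus RHS of (86) for phi(x) = x and Q = 1, i.e. Delta_Q(q) = q."""
    big_gamma = max(q2 - p * p / q1, 0.0)  # redefined Gamma with Delta_Q(0) = 0
    s1, s2 = math.sqrt(1.0 - q1), math.sqrt(1.0 - q2)
    zs, ws = gauss_grid()
    rhs = 0.0
    for z0, w0 in zip(zs, ws):
        gh1 = GH((kappa1 + math.sqrt(q1) * z0) / s1)
        shift = kappa2 + p / math.sqrt(q1) * z0
        inner = sum(w1 * GH((shift + math.sqrt(big_gamma) * z1) / s2)
                    for z1, w1 in zip(zs, ws))
        rhs += w0 * gh1 * inner
    rhs *= alpha / (s1 * s2)
    return p / ((1.0 - q1) * (1.0 - q2)) - rhs

def alpha_at_equal_overlaps(q, kappa):
    """alpha for which p = q1 = q2 = q solves (86) with equal margins:
    at that point Gamma = 0 and (86) reads q / (1 - q) = alpha * int Dz GH^2."""
    zs, ws = gauss_grid()
    integral = sum(w * GH((kappa + math.sqrt(q) * z) / math.sqrt(1.0 - q)) ** 2
                   for z, w in zip(zs, ws))
    return q / ((1.0 - q) * integral)

# Usage: choose q and kappa, deduce alpha, then check that p = q is a root.
q, kappa = 0.5, 0.3
alpha = alpha_at_equal_overlaps(q, kappa)
res = residual_86(q, q, q, alpha, kappa, kappa)
```

For $q_1\neq q_2$ the root in $p$ would have to be searched numerically, for instance by bisection on the interval $0<p<\sqrt{q_1 q_2}$ where $\Gamma\geq 0$; whether a root is bracketed there depends on the parameters and is not guaranteed by this sketch.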

D.2 Large $\beta$ limit: generic loss – error counting loss with a margin

We consider here the case in which the reference solution is sampled from a generic convex loss function, whereas $\ell_2(x)=\Theta(\kappa_2-x)$. In the large $\beta$ limit, $q_1$ scales as $q_1=Q-\frac{\delta q_1}{\beta}$ and correspondingly $\Delta_{Q}(Q)-\Delta_{Q}(q_{1})\simeq\frac{\Delta_{Q}^{\prime}(Q)\,\delta q_{1}}{\beta}$. Scaling $z_1\to\beta z_1$ and using the saddle-point method, we therefore have

\begin{equation}
\begin{split}
\frac{p}{\Delta_Q'(p)\,(Q-q_2)} &= \frac{\alpha\sqrt{\delta q_1}}{\sqrt{(\Delta_Q(Q)-\Delta_Q(q_2))\,\Delta_Q'(Q)}} \int Dz_0\, z_\star(z_0) \\
&\quad\times \int Dz_1\, GH\left(\frac{\kappa_2-\frac{\Delta_Q(p)-\Delta_Q(0)}{\sqrt{\Delta_Q(Q)-\Delta_Q(0)}}\,z_0-\sqrt{\Gamma}\,z_1}{\sqrt{\Delta_Q(Q)-\Delta_Q(q_2)}}\right)
\end{split}
\tag{87}
\end{equation}

where $z_\star(z_0)$ is the same function defined in (38).
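For completeness, the first-order expansion of $\Delta_Q$ used in the large $\beta$ scaling of $q_1$ can be written out explicitly. This is a sketch that assumes only that $\Delta_Q$ is twice differentiable at $q=Q$; the $O(\beta^{-2})$ remainder is not stated in the text and is included here for orientation.

```latex
\begin{align*}
\Delta_Q(q_1) &= \Delta_Q\!\left(Q-\frac{\delta q_1}{\beta}\right)
  = \Delta_Q(Q) - \Delta_Q'(Q)\,\frac{\delta q_1}{\beta} + O(\beta^{-2}), \\
\Delta_Q(Q) - \Delta_Q(q_1) &= \frac{\Delta_Q'(Q)\,\delta q_1}{\beta} + O(\beta^{-2}).
\end{align*}
```

To leading order the gap $\Delta_Q(Q)-\Delta_Q(q_1)$ therefore vanishes as $1/\beta$, with a coefficient set by $\Delta_Q'(Q)$, the same derivative that appears in the prefactor of (87).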