Fast and Scalable Evaluation of Unbiased Atomic Forces in ab initio Variational Monte Carlo via the Lagrangian Technique

Kousuke Nakano [email protected] Center for Basic Research on Materials (CBRM), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan    Stefano Battaglia [email protected] Department of Chemistry, University of Zurich (UZH), Winterthurerstrasse 190, 8057, Zurich, Switzerland    Jürg Hutter Department of Chemistry, University of Zurich (UZH), Winterthurerstrasse 190, 8057, Zurich, Switzerland
(November 7, 2025)
Abstract

Ab initio quantum Monte Carlo (QMC) methods are state-of-the-art electronic structure calculations based on highly parallelizable stochastic frameworks for accurate solutions of the many-body Schrödinger equation, suitable for modern many-core supercomputer architectures. Despite its potential, one of the major drawbacks that still hinders QMC applications, especially when targeting dynamical properties of large systems or large amounts of configurations, is the lack of an affordable method to compute atomic forces that are consistent with the corresponding potential energy surfaces (PESs), also known as unbiased atomic forces. Recently, one of the authors in the present paper proposed a way to obtain unbiased forces with the Jastrow-correlated Slater determinant ansatz, where the determinant part is frozen to the values obtained by a mean-field method, such as Density Functional Theory [Phys. Rev. B 109, 205151 (2024)]. However, the proposed method has a significant drawback for its applications: for a system with NN nuclei, one requires 3N3N additional DFT calculations to get unbiased forces, which is not negligible as the system size increases. This paper presents a way to replace the 3N3N DFT calculations with a single coupled-perturbed Kohn-Sham calculation, following the so-called Lagrangian technique established in quantum chemistry. This improves the computational cost and scalability of the method. We also demonstrate that the developed unbiased VMC force calculation improves not only the consistency with PESs, but also its accuracy, by investigating three molecules from the rMD17 benchmark set, and comparing the corrected VMC forces with those obtained by the Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] calculations. We found that the bare VMC forces are significantly biased from the CCSD(T) ones, while the unbiased ones give values much closer to those of the CCSD(T) ones. Our finding paves the way to generate machine learning interatomic potentials based on VMC forces more efficiently and accurately.

I Introduction

Ab initio quantum Monte Carlo (ab initio QMC) methods have been increasingly studied as highly scalable and parallelizable electronic structure methods that can fully leverage the capabilities of current and near-future exascale supercomputers [shinde2025shifting]. While density functional theory (DFT) remains the most widely adopted ab initio method, it is known to face significant challenges in accurately describing certain classes of systems, such as strongly correlated materials and multireference electronic states in molecules. QMC methods are attracting attention as promising alternatives capable of addressing these limitations. Despite the maturity of energy calculations within QMC, the computation of atomic forces — i.e., first derivatives of the total energy with respect to atomic positions — has long remained an open challenge, limiting the broader applicability of QMC. There are two main real-space QMC frameworks: variational Monte Carlo (VMC) and fixed-node diffusion Monte Carlo (FN-DMC) [2001FOU]. In this study, we focus on VMC because the calculation of forces within the FN-DMC framework [Flaviano2025_DMC_comparison] is much more complicated, and thus, its force calculation is challenging [1986REY, 2005CHI_Chiesa_scheme, 2008BAD2, 2011ASS, 2014MOR, 2021VAN]. The methodological development for forces evaluation in VMC has a long history, dating back to the 1980s. An intrinsic difficulty in computing atomic forces within VMC is the so-called infinite-variance problem: when the force is evaluated by Monte Carlo sampling, the variance of the force estimator diverges [2000ASS_VMC_infinite_variance]. Over the past decades, this difficulty has been addressed by several complementary strategies. One of the earliest attempts was the space-warp coordinate transformation (SWCT) originally introduced by Umrigar [Umrigar1989_SWCT]. This was generalized by Filippi and Umrigar within a correlated sampling framework [Filippi2000CorrelatedSamplingForces] and then differential formulations were developed by Assaraf and Caffarel [Assaraf2003_ZVZB_force] and Sorella and Capriotti [Sorella2010_AAD]. Although these advances cure the divergence associated with the electron–ion Coulomb singularity, the other divergence arising from the nodal surface also had to be controlled [Attaccalite2008_AS]. Successful remedies include reweighting of the sampling distribution [Attaccalite2008_AS, Pathak2008_PW_reweight] and the tail regression estimator [Trail2008HeavyTails, Trail2008AltSampling, LopezRios2019TailRegression]. Related to variance control, it was later clarified that one must consider linear dependence issues (i.e., ill-conditioned overlap matrices) in periodic boundary condition calculations with localized Gaussian basis sets  [Nakano2021PRB_finite_variance_of_forces_for_PBC]. From an implementation point of view, the introduction of adjoint algorithmic differentiation for atomic force evaluation [Sorella2010_AAD] was a major milestone: it enables atomic force calculations without hand-coded analytic derivatives and dramatically reduces computational cost [Sorella2010_AAD]. Subsequently, a fast scheme for derivatives of multi-determinant wave functions was formulated [Filippi2016_fastderivatives_multideterminant], which has facilitated the generation of high-quality training data for machine learning force fields [slootman2024accurate]. These developments have collectively brought VMC force evaluations closer to practical utility.

Nevertheless, force calculations in VMC have been limited to small molecules [2005CHI_Chiesa_scheme, 2013ZEN_water, 2014MOR, 2014LUO_QMC_MD, 2019LIU_qmc_force_onstant, 2021VAN, Tiihonen2021JCP, Nakano2022JCP, slootman2024accurate], clusters [2017MOU, mouhat2023thermal], and crystals with light elements such as solid/liquid hydrogen [2018MAZ_Hydrogen, 2022TIR_MLP_hydrogen, 2022LY_qmc_hydrogen_phonon, 2023NIU_qmc_hydrogen, 2024CEP_qmc_hydrogen, 2025Giacomo, Kevin2025_hydrogen_RQMC, Goswami2025_hydrogen_MLP] and diamond [Nakano2021PRB_finite_variance_of_forces_for_PBC]. This restriction likely stems from a practical issue underlying VMC-based force evaluations, which we will describe hereafter: Let 𝐑α{\bf R}_{\alpha} be the atomic position of the nucleus α\alpha. The atomic force acting on α\alpha is defined as the negative gradient of the energy with respect to 𝐑α{\bf R}_{\alpha}:

𝐅αVMC=dEd𝐑α=\displaystyle{\bf F}_{\alpha}^{\text{VMC}}=-\frac{{\rm d}E}{{\rm d}{\bf R}_{\alpha}}= 𝐑αEL\displaystyle-\Braket{\frac{\partial}{\partial{\bf R}_{\alpha}}E_{\rm L}} (1a)
2(ELE)logΨ𝐑α\displaystyle-2\Braket{(E_{\rm L}-E)\frac{\partial\log\Psi}{\partial{\bf R}_{\alpha}}} (1b)
i=1NpEpipi𝐑α,\displaystyle-\sum_{i=1}^{N_{p}}{\frac{\partial E}{\partial p_{i}}}{\frac{\partial p_{i}}{\partial{\bf R}_{\alpha}}}, (1c)

where Ψ\Psi is an optimized wavefunction, A\braket{A} indicates the quantum average of the local operator AA over the sampling of |Ψ|2|\Psi|^{2}, ELE_{\rm L} is the so-called local energy (ELH^Ψ/ΨE_{\rm L}\equiv\hat{H}\Psi/\Psi), with EELE\equiv\braket{E_{L}}, and {p1,,pNp}\{p_{1},\cdots,p_{N_{p}}\} is the set of NpN_{p} variational parameters included in the Ψ\Psi ansatz. Eqs. (1a), (1b), and (1c) are called the Hellmann–Feynman (HF), Pulay, and non-variational (NV) terms, respectively. When the wavefunction includes all relevant variational parameters and is fully optimized, the NV term vanishes [Tiihonen2021JCP, Nakano2024]. However, in practice — especially for large systems or high-throughput datasets — complete optimization of all wavefunction parameters is often infeasible. Therefore, the so-called Jastrow-Slater determinant approach is typically employed, where the Slater determinant obtained from mean-field methods such as DFT is kept fixed, and only the Jastrow factor is variationally optimized. Historically, even in this case, the NV term was often neglected on the assumption that its contribution was small, and forces were computed using only the HF and Pulay terms. However, recent studies have demonstrated that ignoring the NV term can lead to significant biases [Tiihonen2021JCP, Nakano2022JCP, Nakano2024]. The bias arising from the NV term is referred to as the self-consistency error (SCE) [Tiihonen2021JCP]. In 2021, Tihonen et al. [Tiihonen2021JCP] reported that the SCE becomes more severe for heavier elements. Follow-up studies confirmed that, for example, the SCE for liquid hydrogen is on the order of  1 GPa [2024GIA_qmc_hugonioit], while for compounds such as cubic boron nitride (c-BN), it can reach up to 5 GPa [Nakano2024]. This implies that applying VMC to practical, realistic systems beyond model compounds necessitates addressing the SCE — otherwise, it becomes a critical bottleneck in VMC applications.

Nakano et al. (including one of the authors of the present paper) recently formulated a method to eliminate the SCE [Nakano2024]. The idea is conceptually straightforward: direct evaluation of the NV term using a hybrid VMC-DFT approach. Specifically, the partial derivative E/pi\partial E/\partial p_{i} is computed via standard VMC, while the wavefunction response term dpi/d𝐑αdp_{i}/d{\bf R}_{\alpha} is evaluated using DFT with the finite difference method (FDM). Using this scheme, consistent forces have been successfully obtained for various systems including H2, Cl2, c-BN [Nakano2024], and liquid hydrogen [2025Giacomo]. Although this methodology successfully demonstrated a proof-of-concept for SCE elimination [Nakano2024, 2025Giacomo], a significant practical limitation remains: to compute corrected forces on all NN atoms, one requires 3NN additional DFT calculations [Nakano2024]. While this is manageable for single-point force evaluations on small systems, it becomes computationally prohibitive when dealing with large systems or generating extensive datasets for machine-learning potential (MLP) training.

This paper presents a method to eliminate the 3NN additional DFT calculations when correcting the NV term to get unbiased VMC forces. The key idea is the following. The bias arising from the NV term is not a specific problem in VMC calculations, but also in wavefunction theory (WFT) calculations when non-variational wavefunctions — such as those used in perturbative methods — are used to compute forces. One well-established approach in WFT is the so-called Lagrangian technique [helgaker1989configuration, 2017Jensen_textbook]. In this work, we extend the Lagrangian technique to the VMC context, enabling the calculation of atomic forces and pressure in both molecular and periodic systems using just a single VMC and a single coupled-perturbed Hartree-Fock (CPHF) or coupled-perturbed Kohn-Sham (CPKS) calculation. This eliminates the need for 3NN DFT evaluations and improves the scalability of the method. Furthermore, we demonstrate that the developed consistent force evaluation improves not only the consistency, but also the accuracy of the VMC forces with the Jastrow Slater determinant anstaz. We test our implementation on three molecules, ethanol, malonaldehyde and benzene, from rMD17 benchmark set [Christensen2020_MLST, Christensen2020_rMD17_Figshare], and compare biased and unbiased VMC forces to forces obtained with the Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] [Raghavachari1989_CCSDT] method, known as the gold-standard of quantum chemistry.

This paper is organized as follows. In Sec. II we describe the details of the Lagrangian technique in the context of VMC. In Sec. III, we show the validation of the formalism and our implementation in CP2K [Kuehne2020_CP2K_JCP] and TurboRVB [2020NAK2] software suites. In Sec. IV, we report a benchmark result of the unbiased VMC force calculations for the three molecules taken from rMD17 benchmark set. In Sec. V, we conclude our work and discuss the current limitation and future works of the developed method.

II Methods: Lagrangian technique extended to VMC

The key idea of the Lagrangian technique [helgaker1989configuration, 2017Jensen_textbook] is to construct a function, the Lagrangian, that has the same energy as the VMC one, but that incorporates all the constraints imposed during the optimization of the wavefunction. By making the Lagrangian stationary with respect to all its parameters, we then obtain a fully variational expression for which we can compute the nuclear gradient directly. The VMC energy depends on a set of Jastrow variational parameters {pi}i=1NJ\{p_{i}\}_{i=1}^{N_{J}}, collected in the vector 𝐩\mathbf{p}, and the occupied molecular orbital (MO) coefficients of the Slater determinant (SD) denoted by the tensor 𝐂\mathbf{C}, with elements CμiσC_{\mu i}^{\sigma}. The indices μ,ν,λ\mu,\nu,\lambda and δ\delta label atomic orbitals (AOs), the indices i,j,ki,j,k and ll label occupied MOs, and σ,τ\sigma,\tau label the spin. Sums over these indices automatically imply complete sums over the total number of occupied orbitals NMON_{\text{MO}}, atomic orbitals NAON_{\text{AO}}, and the number of spin channels NspinN_{\text{spin}}, respectively. The overlap matrix between AOs or MOs (depending on its subscripts) is denoted by SS. In the following, we always label the atomic and molecular orbitals as subscripts, and the spin index as a superscript, where possible.

Let us start by introducing the VMC Lagrangian:

LVMC(𝐩,𝐂,𝐙,𝐖)=EVMC(𝐩,𝐂)+λkτZλkτLSCF(𝐂)CλkτklτWklτ(Sklτδkl).\displaystyle L_{\rm VMC}({\bf p,C,Z,W})=E_{\rm VMC}(\mathbf{p},\mathbf{C})+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\frac{\partial L_{\rm SCF}(\mathbf{C})}{\partial{C_{\lambda k}^{\tau}}}-\sum_{kl\tau}W_{kl}^{\tau}(S_{kl}^{\tau}-\delta_{kl})\;. (2)

The first term is the VMC energy, the second term corresponds to the stationary conditions of the underlying SCF reference calculation, and the third term enforces the orthonormality of the occupied MOs. The tensors 𝐙{\bf Z} and 𝐖{\bf W} contain the Lagrange multipliers. We do not give the explicit form of LSCFL_{\rm SCF} here because only its derivative (defined later) is needed. In the most general case, all boldfaced quantities represent tensors of rank 3, with dimension NAO×NMO×NspinN_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}} for 𝐂\mathbf{C} and 𝐙\mathbf{Z}, and dimension NMO×NMO×NspinN_{\text{MO}}\times N_{\text{MO}}\times N_{\text{spin}} for 𝐖\mathbf{W}. In case of restricted calculations, where the MO coefficients are the same for both spin channels, Nspin=1N_{\text{spin}}=1 and the tensors reduce to matrices. Several equivalent formulations of the SCF term of the Lagrangian exist. In this work, we closely follow the atomic orbital formulation used for the implementation of density-corrected DFT gradients in CP2K [2023BEL]. This results in the Roothaan-Hall equations for the occupied MOs:

LSCFCλkτ=νFλντCνkτSλνCνkτεkτ,\frac{\partial L_{\rm SCF}}{\partial C_{\lambda k}^{\tau}}=\sum_{\nu}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau}, (3)

where FλντF_{\lambda\nu}^{\tau} is the Fock/Kohn-Sham matrix and εkτ\varepsilon_{k}^{\tau} is the Kohn-Sham eigenvalue of kk-th MO with the spin τ\tau. The explicit form of the Fock/Kohn-Sham matrix is introduced later. When the Lagrangian is stationary with respect to all its parameters, we can rewrite 𝐅αVMC\mathbf{F}^{\text{VMC}}_{\alpha} with LVMCL_{\rm VMC} as:

𝐅αVMC=dEVMCdRα=dLVMCdRα=LVMCRα.\displaystyle\mathbf{F}^{\rm{VMC}}_{\alpha}=-\cfrac{{\rm d}E_{\rm VMC}}{{\rm d}R_{\alpha}}=-\cfrac{{\rm d}L_{\rm VMC}}{{\rm d}R_{\alpha}}=-\cfrac{\partial L_{\rm VMC}}{\partial R_{\alpha}}\;. (4)

To make the Lagrangian stationary with respect to all its parameters, we have to ensure that the following set of equations is satisfied:

LVMCpi\displaystyle\frac{\partial L_{\rm VMC}}{\partial p_{i}} =EVMCpi=0,\displaystyle=\frac{\partial E_{\rm VMC}}{\partial p_{i}}=0\;, (5a)
LVMCCμiσ\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}} =EVMCCμiσ+λkτZλkτ2LSCFCμiσCλkτklτWklτSklτCμiσ=0,\displaystyle=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\frac{\partial^{2}L_{\rm SCF}}{\partial{C_{\mu i}^{\sigma}}\partial{C_{\lambda k}^{\tau}}}-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}=0\;, (5b)
LVMCZλkτ\displaystyle\frac{\partial L_{\rm VMC}}{\partial Z_{\lambda k}^{\tau}} =νFλντCνkτSλνCνkτεkτ=0,\displaystyle=\sum_{\nu}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau}=0\;, (5c)
LVMCWklτ\displaystyle\frac{\partial L_{\rm VMC}}{\partial W_{kl}^{\tau}} =Sklτδkl=0.\displaystyle=S_{kl}^{\tau}-\delta_{kl}=0\;. (5d)

Eq. 5a is the variational condition on the VMC energy with respect to the Jastrow parameters. For an optimized Jastrow wave function, this condition is automatically fulfilled. Eq. 5b is zero by virtue of the Lagrange multipliers; that is, we choose the elements ZλkτZ_{\lambda k}^{\tau} and WklτW_{kl}^{\tau} such that LVMCCμiσ=0\tfrac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=0 for μ,i,σ\forall\mu,i,\sigma. Eqs. 5c and 5d are automatically fulfilled if we use canonical orbitals from the underlying SCF calculation.

At first sight, it seems that Eq. 5b is not sufficient to determine both 𝐙\mathbf{Z} and 𝐖\mathbf{W}. However, only certain subblocks are different from zero, such that by projecting the stationary conditions onto occupied and virtual orbital spaces, we are able to obtain enough equations for their determination. To this end, let us introduce the projection operator 𝐐\mathbf{Q}, with elements:

Qμνσ=δμνλkCμkσCλkσSλν.Q_{\mu\nu}^{\sigma}=\delta_{\mu\nu}-\sum_{\lambda k}C_{\mu k}^{\sigma}C_{\lambda k}^{\sigma}S_{\lambda\nu}\;. (6)

The Lagrange multipliers ZλkτZ_{\lambda k}^{\tau} can be best understood as expansion coefficients of the orbital response due to the correlation energy introduced by the Jastrow factor. Hence, only the occupied-virtual blocks contribute to the response, while the occupied-occupied block is assumed to be zero:

Zijσ=μνZμiσSμνCνjσ=0,Z_{ij}^{\sigma}=\sum_{\mu\nu}Z_{\mu i}^{\sigma}S_{\mu\nu}C_{\nu j}^{\sigma}=0\;, (7)

as unitary rotations among occupied orbitals leave the energy unchanged. It is convenient at this stage to introduce the explicit form of the Fock/Kohn-Sham matrix:

Fμνσ[𝐏]=hμν+λδτPλδτ[(μν|λδ)cxδστ(μλ|νδ)]+Vμνxc,σ,F_{\mu\nu}^{\sigma}[\mathbf{P}]=h_{\mu\nu}+\sum_{\lambda\delta\tau}P_{\lambda\delta}^{\tau}[(\mu\nu|\lambda\delta)-c_{x}\delta_{\sigma\tau}(\mu\lambda|\nu\delta)]+V_{\mu\nu}^{\text{xc},\sigma}\;, (8)

where (μν|λδ)(\mu\nu|\lambda\delta) are two-electron integrals over primitive atomic orbitals

(μν|λδ)=ψμ(𝐫)ψν(𝐫)1|𝐫𝐫|ψλ(𝐫)ψδ(𝐫)d𝐫d𝐫;(\mu\nu|\lambda\delta)=\int\int\psi_{\mu}(\mathbf{r})\psi_{\nu}(\mathbf{r})\frac{1}{|\mathbf{r}-\mathbf{r}^{\prime}|}\psi_{\lambda}(\mathbf{r}^{\prime})\psi_{\delta}(\mathbf{r}^{\prime})\mathrm{d}\mathbf{r}\mathrm{d}\mathbf{r}^{\prime}\;; (9)

and hμνh_{\mu\nu} are the kinetic energy and nuclear electron attraction integrals. This form is completely general and accommodates Hartree-Fock, pure semilocal functionals, and hybrid functionals using the exact exchange parameter cxc_{x} and the exchange-correlation potential Vμνxc,σV^{\text{xc},\sigma}_{\mu\nu}. The tensor 𝐏\mathbf{P} contains the spin-density matrices in AO basis, with elements:

Pμνσ=iCμiσCνiσ.P_{\mu\nu}^{\sigma}=\sum_{i}C_{\mu i}^{\sigma}C_{\nu i}^{\sigma}\;. (10)

The stationary conditions for the MO coefficients are given by equating to zero the derivatives of LVMCL_{\rm VMC} with respect to CμiσC_{\mu i}^{\sigma}:

LVMCCμiσ=EVMCCμiσ+λkτZλkτνCμiσ(FλντCνkτSλνCνkτεkτ)klτWklτSklτCμiσ=0.\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}=0\;. (11)

To arrive at working equations to determine the multipliers, we project Eq. 11 onto the occupied orbital space using 𝐂\mathbf{C}, and onto the unoccupied orbital space using 𝐐\mathbf{Q}:

μLVMCCμiσQμνσ\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma} =0Zνiσν,i,σ,\displaystyle=0\to Z_{\nu i}^{\sigma}\quad\forall\,\nu,i,\sigma\;, (12a)
μLVMCCμiσCμjσ\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}C_{\mu j}^{\sigma} =0Wijσi,j,σ.\displaystyle=0\to W_{ij}^{\sigma}\quad\forall\,i,j,\sigma\;. (12b)

In the following, we report only the final expressions obtained from Eqs. 12a and 12b; their explicit derivation is given in Appendix B. The projection onto the virtual space, Eq. 12a, results in a linear system of equations 𝐀𝐙=𝐁\mathbf{A}\mathbf{Z}=-\mathbf{B} known as the Z-vector equations, from which the Lagrange multipliers 𝐙\mathbf{Z} can be determined:

λZλiσ(FλμσSλμεiσ)+μHμiσ[𝐏¯]Qμνσ=μBμiσQμνσ.\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})+\sum_{\mu}H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}=-\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}\;. (13)

The elements of the tensor 𝐁\mathbf{B} are the derivatives of the VMC energy with respect to the MO coefficients

Bμiσ=EVMCCμiσ,B_{\mu i}^{\sigma}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}\;, (14)

while the tensor 𝐀\mathbf{A} contains quantities related to the SCF stationary conditions and the MOs orthonormalization constraints; hence it does not depend on the VMC calculation. The elements Hμiσ[𝐏¯]H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big] are derivatives of the Fock matrix with respect to the MO coefficients,

Hμiσ[𝐏¯]=λντP¯λντ(2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)])+2fλντ,μiσxc;H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big]=\sum_{\lambda\nu\tau}\bar{P}_{\lambda\nu}^{\tau}(2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)])+2f_{\lambda\nu\tau,\mu i\sigma}^{\text{xc}}\;; (15)

where fλντ,μiσxcf^{\text{xc}}_{\lambda\nu\tau,\mu i\sigma} is the exchange-correlation kernel with an index contracted with a molecular orbital, and the tensor 𝐏¯\bar{\mathbf{P}} is a symmetrized form of the response density matrix (for increased numerical stability), given by

P¯μνσ=12i(ZμiσCνiσ+CμiσZνiσ).\bar{P}_{\mu\nu}^{\sigma}=\frac{1}{2}\sum_{i}(Z_{\mu i}^{\sigma}C_{\nu i}^{\sigma}+C_{\mu i}^{\sigma}Z_{\nu i}^{\sigma})\;. (16)

Once the response tensor 𝐙\mathbf{Z} is determined by solving Eq. 13, we can also determine the second set of Lagrange multipliers 𝐖\mathbf{W}, which are given by

Wijσ=12Hijσ[𝐏¯]+12μBμiσCμjσ.W_{ij}^{\sigma}=\frac{1}{2}H_{ij}^{\sigma}\big[\bar{\mathbf{P}}\big]+\frac{1}{2}\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}. (17)

Finally, the atomic force is given by the negative gradient of the Lagrangian with respect to the nuclear positions, reading

LVMC𝐑α=EVMC𝐑αμνσΛμνσSμν𝐑α+μνjσZμjσFμνσ𝐑αCνjσ,\displaystyle\frac{\partial L_{\rm VMC}}{\partial\mathbf{R}_{\alpha}}=\frac{\partial E_{\rm VMC}}{\partial\mathbf{R}_{\alpha}}-\sum_{\mu\nu\sigma}\Lambda_{\mu\nu}^{\sigma}\frac{\partial S_{\mu\nu}}{\partial\mathbf{R}_{\alpha}}+\sum_{\mu\nu j\sigma}Z_{\mu j}^{\sigma}\frac{\partial F_{\mu\nu}^{\sigma}}{\partial\mathbf{R}_{\alpha}}C_{\nu j}^{\sigma}\;, (18)

with the intermediate quantity Λμνσ\Lambda_{\mu\nu}^{\sigma} defined as

Λμνσ=ijCμiσWijσCνjσ+12iεiσ(ZμiσCνiσ+CμiσZνiσ),\Lambda_{\mu\nu}^{\sigma}=\sum_{ij}C_{\mu i}^{\sigma}W_{ij}^{\sigma}C_{\nu j}^{\sigma}+\frac{1}{2}\sum_{i}\varepsilon_{i}^{\sigma}(Z_{\mu i}^{\sigma}C_{\nu i}^{\sigma}+C_{\mu i}^{\sigma}Z_{\nu i}^{\sigma})\;, (19)

embodying the conditions of orthonormality between the response MOs, and the so-called energy-weighted response density matrix.

An important aspect of the above Lagrangian approach in the context of VMC is that the elements of the right-hand side tensor 𝐁\mathbf{B} of Eq. 13 (i.e., the energy derivative with respect to the MO coefficients) come with statistical noise. It is important to know how this noise propagates from the VMC energy derivative to the response vectors, and ultimately to the forces. There are several ways to estimate the errors: the easiest way is to use a resampling method such as the Jackknife approach. As an outcome of a VMC calculation, we can reblock samples of BμiσB^{\sigma}_{\mu i} and estimate the error bars of the corrections by iteratively solving the Z-vector equations. Another way to estimate the errors is based on the analytical error propagation of BμiσB^{\sigma}_{\mu i}, for which the variance-covariance tensor of BμiσB^{\sigma}_{\mu i} is required. In this paper, we employed the former approach, i.e., the error bar estimations were done using the Jackknife method. Further details about the VMC calculation of energy and gradients, as well as the derivation of the analytical error propagation is shown in Appendix A.

III Validation of the method

In this section, we validate the Lagrangian technique described above, which we have implemented by interfacing the linear response module of CP2K [Kuehne2020_CP2K_JCP] with TurboRVB [2020NAK2] and exchanging the data through the TREX-IO library [2023POS]. By validation, we mean demonstrating two points: (i) forces and pressures obtained with the Lagrangian technique agree with numerical derivatives of the potential energy surface (i.e., the results are unbiased), and (ii) these forces and pressures are consistent with those computed by the finite-difference method (FDM) developed in Ref. Nakano2024. The validation was performed for a Cl2 molecule and crystalline c-BN. For Cl2, we evaluated forces; for c-BN, we evaluated both forces and pressure. For Cl2, we used the ccECP pseudopotentials [2018BEN] together with their accompanied cc-pVDZ basis sets. For c-BN we used the ccECP [2017BEN] pseudopotentials combined with DZVP-MOLOPT-SR-GTH basis sets [Goedecker1996_GTH, VandeVondele2007_MOLOPT] to avoid spuriously large force error bars by GTOs with large exponents [Nakano2021PRB_finite_variance_of_forces_for_PBC]. The cBN calculations employed a conventional 1 ×\times 1 ×\times 1 supercell with Γ\Gamma-point. To estimate the statistical error of the linear-response (LR) correction, we used the jackknife approach [2017BEC]: we repeatedly applied the LR procedure to samples of E/Pμν\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow} drawn from TurboRVB and computed the jackknife mean and variance. Figures 1 and 2 summarize the results for the Cl2 forces and for the c-BN forces and pressure, addressing points (i) and (ii). As shown, the forces and pressures from the Lagrangian technique coincide with the numerical derivatives of the potential energy surface (i.e., they are unbiased) and match the values obtained with the FDM [Nakano2024]. Taken together, these results confirm that the Lagrangian technique behaves as expected. Notice that we do not compare the obtained PESs and EOS with the experimental bond lengths and volume, since the used basis sets, the supercell size, and the kk-point (Γ\Gamma only) are suitable only for numerically checking (i) and (ii).

Refer to caption
Figure 1: Validation of the Lagrangian technique for Cl2. (a) Potential energy surface (PES) as a function of bond length (green circles) and its fit (green dotted line). The reference force on the right-hand Cl atom (Cl2 oriented along the xx axis), obtained from the derivative of the fitted PES, is shown as a blue dashed line. Red diamonds, purple squares, and cyan triangles denote forces evaluated with the HF + Pulay terms, HF + Pulay terms + NV correction (via FDM), and HF + Pulay terms + NV correction (via LR), respectively. (b) Comparison between the PES-derived (reference) forces and those obtained from the estimators. Inset: direct comparison between HF + Pulay terms + NV correction (via FDM) and HF + Pulay terms + NV correction (via LR).
Refer to caption
Figure 2: Validation of the Lagrangian technique for cBN. (a) Equation of state (EOS) as a function of volume (green circles) and its fit (green dotted line). The reference pressure obtained from the derivative of the fitted EOS, is shown as a blue dashed line. Red diamonds, purple squares, and cyan triangles denote pressures evaluated with the HF + Pulay terms, HF + Pulay terms + NV correction (via FDM), and HF + Pulay terms + NV correction (via LR), respectively. (b) Comparison between the EOS-derived (reference) pressures and those obtained from the estimators. Inset: direct comparison between HF + Pulay terms + NV correction (via FDM) and HF + Pulay terms + NV correction (via LR). (c) PES as a function of a displacement of the B atom located at the origin (0,0,0) of the unit cell in the xx direction (green circles) and its fit (green dotted line). The reference force on the B atom, obtained from the derivative of the fitted PES, is shown as a blue dashed line. Red diamonds, purple squares, and cyan triangles denote forces evaluated with the HF + Pulay terms, HF + Pulay terms + NV correction (via FDM), and HF + Pulay terms + NV correction (via LR), respectively. (d) Comparison between the PES-derived (reference) forces and those obtained from the estimators. Inset: direct comparison between HF + Pulay terms + NV correction (via FDM) and HF + Pulay terms + NV correction (via LR).

We next discuss computational cost. The primary advantage of the Lagrangian technique is efficiency: unlike the FDM approach [Nakano2024], which requires 3N3N separate DFT calculations to obtain the forces, the Lagrangian technique needs only a single solution of the CPHF/CPKS equations via the LR module. The complexity of DFT is typically O(N3)O(N^{3}), and together with the 3N3N required calculations for the FDM approach, results in a O(N4)O(N^{4}) asymptotic scaling. On the other hand, solving the CPHF/CPKS equations has the same asymptotic scaling as a single HF/KS-SCF calculation. Therefore, the asymptotic complexity of the Lagrangian approach remains O(N3)O(N^{3}). As the system size increases, the Lagrangian approach is the more affordable and likely the only practical solution for larger systems.

From a technical point of view, the exchange of wavefunctions between CP2K [Kuehne2020_CP2K_JCP] and TurboRVB [2020NAK2] required careful attention to differences in conventions, which is a common challenge across electronic structure packages. Examples include the use of spherical-harmonic versus Cartesian Gaussian Type Orbitals (GTOs), normalization constants and their application stage (pre- vs post-contraction, shell-wise vs AO-level), and related notational choices. In this work, we adopted the TREX-IO [2023POS] format, a flexible library that standardizes conventions for exchanging wavefunctions, structures, and pseudopotentials among electronic structure packages. TREX-IO effectively absorbed inter-code convention mismatches and played a key role in robust data exchange between CP2k and TurboRVB in this work.

IV Application to organic molecules in rMD17 dataset

Now that we have established an unbiased VMC force evaluation based on the Lagrangian technique, a systematic assessment of VMC force accuracy is feasible across a wide range of compounds. VMC and DMC results are increasingly being used as training dataset for MLPs [2022TIR_MLP_hydrogen, 2023NIU_qmc_hydrogen, 2024CEP_qmc_hydrogen, 2024GIA_qmc_hugonioit, slootman2024accurate, 2025Giacomo, Goswami2025_hydrogen_MLP, Kevin2025_hydrogen_RQMC]. In this study, we selected ethanol, benzene, and malonaldehyde from the rMD17 dataset [Christensen2020_MLST, Christensen2020_rMD17_Figshare], and computed (i) unbiased VMC forces established in this work, (ii) DFT forces, and (iii) CCSD(T) forces as references. Specifically, we took 100 structures from molecular dynamics trajectories distributed as rMD17 dataset [Christensen2020_MLST, Christensen2020_rMD17_Figshare]. In a previous study that constructed MLPs with CCSD(T) references [2018Stefan], the Dunning cc-pVTZ [1989DUN] was used for ethanol, and cc-pVDZ [1989DUN] for benzene and malonaldehyde to generate a training dataset for MLPs. Here, we used cc-pVQZ for the three molecules for the CCSD(T) calculations (1ss shells were kept frozen) to obtain results closer to the complete basis-set limit. For comparison, we also carried out WFT calculations — HF, second-order Møller–Plesset perturbation theory (MP2) [1934MOL_MP2], and CCSD [Purvis1982_CCSD] with the cc-pVQZ basis — and DFT calculations with the def2-QZVPPD [Weigend2005_def2] basis set using several exchange–correlation functionals with the D3BJ dispersion correction [Grimme2010_D3, Grimme2011_D3BJ]. Specifically, we chose SVWN [Vosko1980_VWN], PBE [1996PER], PBE+D3BJ, PBE0 [Adamo1999_PBE0]+D3BJ, ω\omegaB97X+D3BJ [Mardirossian2014_wB97X_V, Najibi2018_wB97X_wB97M_D3BJ], and ω\omegaB97M+D3BJ [Mardirossian2016_wB97M_V, Najibi2018_wB97X_wB97M_D3BJ]. Also in the case of MP2 and CCSD calculations, the 1ss shells were kept frozen. All of these calculations were performed with Psi4 software suite [Smith2020_PSI4_1_4]. SVWN and PBE were selected as computationally inexpensive exchange–correlation functionals. PBE0 was included to allow comparison with a previous VMC/DMC benchmark study [slootman2024accurate], which combined Tkatchenko-Scheffler (TS) [2009ALEX-TS] and the many-body dispersion (MBD) [2012ALEX-MBD] corrections with PBE and PBE0, respectively. Since the Psi4 software suite does not implement TS or MBD, we used D3BJ dispersion correction in this work. The ω\omegaB97 family of functionals was chosen due to its widespread use in popular MLP datasets (e.g., SPICE [Eastman2023_SPICE_SciData]) and its reported good performance. For the VMC calculations, we employed ccECP pseudopotentials [2017BEN] together with the accompanied cc-pVTZ basis set. Following the previous VMC/DMC benchmark study by Slootman et al. [slootman2024accurate], we also benchmarked ccECP-based CCSD against all-electron CCSD using PySCF [2018SUN_pyscf, 2020SUN_pyscf] and confirmed that the force errors for the three molecules are << 1 kcal/mol/Å in RMSEs. In the VMC calculations, the Slater determinant part was fixed to orbitals obtained from DFT (B3LYP), and only the Jastrow part was variationally optimized using stochastic reconfiguration [1998SOR]. The Jastrow factor comprised a two-body term enforcing the electron–electron cusp [2020NAK2] and a three-body term parametrized using pairings of atomic orbitals [2020NAK2]. For the three-body Jastrow basis, we used uncontracted primitive GTOs with [3s] for H and [3s1p] for C, N, and O; the corresponding Gaussian exponents were optimized for each configuration during the optimization procedure. Although matrix elements coupling different atomic pairs (often referred to as four-body Jastrow terms [2020NAK2]) are frequently set to zero in practice [2020NAK2], we retained them here because they were observed to have a significant impact on the forces. The total numbers of optimized variational parameters were 715, 958, and 1548 for ethanol, malonaldehyde, and benzene, respectively. Trial wavefunctions were generated with CP2K [Kuehne2020_CP2K_JCP]; VMC simulations were performed with TurboRVB [2020NAK2]; and the force-bias correction based on the Lagrangian technique was applied via CP2K [Kuehne2020_CP2K_JCP].

Table 1: The RMSE of atomic forces of all the configurations for the selected methods relative to CCSD(T). The unit is kcal/mol/Å.
 Method ethanol benzene malonaldehyde
 VMC(biased) 5.29 4.78 14.16
VMC(unbiased) 3.64 2.69 3.81
 HF 9.45 5.57 23.44
MP2 1.39 1.64 2.13
CCSD 1.62 2.00 5.01
SVWN 7.37 8.32 8.86
PBE 4.63 3.09 4.72
PBE-D3BJ 4.60 3.06 4.68
PBE0-D3BJ 2.21 3.75 7.80
ω\omegaB97X-D3BJ 0.98 3.13 5.40
ω\omegaB97M-D3BJ 1.24 3.08 6.31
 
Refer to caption
Figure 3: Violin plots showing, for each configuration of (a) Ethanol, (b) Benzene, and (c) Malonaldehyde, the RMSE of atomic forces for the selected methods relative to CCSD(T). The unit is kcal/mol/Å.

Table 1 reports, for each molecule, the root-mean-square error (RMSE, in kcal mol-1 Å-1) of all force components — i.e., Natoms×3N_{\mathrm{atoms}}\times 3 directions ×\times 100 structures — relative to CCSD(T). Figure 3 reports, for each molecule, the violin plot of the distribution of RMSE in 100 configurations for selected methods. We first emphasize that the benefit of the Lagrangian technique is evident from Table 1. Comparing the CCSD(T) reference with biased VMC forces (i.e., without the Lagrangian-technique correction), the RMSEs are 5.29, 4.78, and 14.16 kcal mol-1 Å-1 for ethanol, benzene, malonaldehyde, respectively — better than HF and SVWN, yet substantially worse than PBE for all systems. In contrast, the unbiased VMC forces developed in this study yield RMSEs of 3.64, 2.69, and 3.81 kcal mol-1 Å-1 for ethanol, benzene, and malonaldehyde, respectively, against CCSD(T). Thus, when accurate forces are sought with only partially optimized wavefunctions, our correction scheme is essential.

For ethanol, Slootman et al. [slootman2024accurate] performed an extensive benchmark for a set of 200 configurations and reported an RMSE of 3.055 kcal mol-1 Å-1 for VMC forces based on a fully optimized Jastrow Slater wavefunction. Although our ethanol configurations are not identical to theirs and a strict one-to-one comparison is not possible, our ethanol RMSE, 3.64 kcal mol-1 Å-1, is close to their value. This demonstrates that, even without optimizing the determinant part, correcting the non-variational term can achieve force accuracies comparable to those obtained with fully optimized wavefunctions, thereby reducing computational cost. Benzene and malonaldehyde similarly show RMSEs in the order of 3 kcal mol-1 Å-1. Slootman et al. [slootman2024accurate] found that achieving force accuracy of \sim 1 kcal mol-1 Å-1 relative to CCSD(T) generally requires DMC. While the present Lagrangian technique is, in principle, applicable to DMC, a full DMC implementation lies beyond the scope of this paper and is left for future work.

Table 1 also lists the HF, MP2, CCSD, and DFT results. For ethanol, ω\omegaB97X-D3BJ and ω\omegaB97M-D3BJ perform exceptionally well, yielding << 1 kcal mol-1 Å-1 errors relative to CCSD(T). Interestingly, this trend is not universal across molecules. For benzene, the VMC RMSE is 2.69 kcal mol-1 Å-1, whereas ω\omegaB97X-D3BJ and ω\omegaB97M-D3BJ exceed 3 kcal mol-1 Å-1. For malonaldehyde, VMC yields 3.81 kcal mol-1 Å-1, while the ω\omegaB97 functional family gives \sim 6 kcal mol-1 Å-1. For ethanol, Slootman et al. [slootman2024accurate] compared VMC with dispersion-corrected PBE-TS and PBE0-MB, finding RMSE(PBE0-MBD) << RMSE(VMC) << RMSE(PBE-TS); that is, VMC performed between a GGA and a hybrid GGA. Our results reproduce this trend. For all the three compounds, HF shows the worst RMSE among the tested method, while MP2 shows a very good performance.

These results suggest that our unbiased VMC appears advantageous over DFT when one must treat chemically diverse sets of compounds (e.g., for generating general purpose machine-learning datasets) because VMC accuracy shows less system-dependence. However, in terms of closeness to the CCSD(T) reference values, VMC is outperformed by MP2 for the three compounds. Although VMC offers a more favorable scaling with the number of electrons (typically, O(N3)O(N^{3}) [esler2008quantum] and O(N45)O(N^{4-5}) for VMC and MP2, respectively) and is naturally suitable for parallelization, the large prefactor in practical VMC implementations complicates any conclusion about superior wall time efficiency. Consequently, the question of which method can deliver forces close to CCSD(T) accuracy at a cost lower than the CCSD(T) reference remains unresolved in the present study due to the limited number of test cases. A broader benchmark — covering more chemically varied systems and also including DMC benchmark — is a promising future work.

V Concluding remarks and future perspectives

In this paper, we introduce an application of the Lagrangian technique in Variational Monte Carlo (VMC) calculations to get atomic forces and pressures consistent with potential energy surfaces (also known as unbiased forces and pressures). The previous study to get unbiased forces and pressures with the Jastrow Slater determinant ansatz requires 3NN times additional Density Functional Theory (DFT) calculations (NN refers to the number of nuclei in a system) [Nakano2024], which hinders the applications of unbiased VMC force/pressure calculations in large molecules/crystal or in generations of training data for machine-learning potentials (MLPs). The formulation we report in this paper replaces the additional 3NN-times DFT calculations with a single coupled-perturbed Hartree-Fock or coupled-perturbed Kohn-Sham calculation. This improves the scalability of the VMC force calculations with the conventional Jastrow Slater determinant ansatz. As an application of our developed method, we demonstrate that the unbiased force evaluation improves not only the consistency but also the accuracy of the VMC forces. We selected three molecules from the rMD17 benchmark set [Christensen2020_MLST, Christensen2020_rMD17_Figshare], ethanol, malonaldehyde and benzene, and compared biased VMC forces, unbiased VMC forces, and forces obtained by Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] [Raghavachari1989_CCSDT] calculations. We found that the biased VMC forces are significantly biased from the CCSD(T) ones, while the unbiased ones give values close to those of the CCSD(T) ones. The unbiased force evaluation described here, for instance, plays a crucial role in generating high-quality training data for MLPs.

Code and Data availability

The ab initio DFT and QMC packages used in this work, CP2K and TurboRVB, are available from their GitHub repositories [https://github.com/cp2k/cp2k] and [https://github.com/sissaschool/turborvb], respectively.

acknowledgments

K.N. is grateful for computational resources from the Numerical Materials Simulator at National Institute for Materials Science (NIMS), from Research Institute for Information Technology at Kyushu University under the category of General Projects on the supercomputer GENKAI, and from supercomputer Fugaku provided by RIKEN through the HPCI System Research Projects (Project ID: hp250031). K.N. acknowledges financial support from MEXT Leading Initiative for Excellent Young Researchers (Grant No. JPMXS0320220025), from Murata Science and Education Foundation (Grant No. M24AN006), and from Japan Science and Technology Agency (JST), PRESTO (Grant No. JPMJPR24J9). This research was also supported by the NCCR MARVEL, a National Center of Competence in Research, funded by the Swiss National Science Foundation (Grant No. 205602). K.N. acknowledges Michele Casula (CNRS) for fruitful discussion about VMC force calculations. K.N. acknowledges the valuable comments on the method the author developed before from Dr. Emmanuel Giner (CNRS) and Dr. David Ceperley (UIUC) in a conference. We thank Guglielmo Mazzola (UZH) for organizing a seminar delivered by one of the authors K.N. at UZH. The seminar and subsequent discussions led to this collaboration.

Appendix A Details of the Variational Monte Carlo calculations

A.1 Variational Monte Carlo Methods

The expectation value of the energy for a given wavefunction, Ψ\Psi, is evaluated as:

E=𝑑𝐱Ψ2(𝐱)^Ψ(𝐱)/Ψ(𝐱)𝑑𝐱Ψ2(𝐱)=𝑑𝐱eL(𝐱)π(𝐱),\left\langle E\right\rangle=\frac{{\int{d{\mathbf{x}}{\Psi^{2}}\left({\mathbf{x}}\right)\cdot\hat{\mathcal{H}}\Psi\left({\mathbf{x}}\right)/\Psi\left({\mathbf{x}}\right)}}}{{\int{d{\mathbf{x}}{\Psi^{2}}\left({\mathbf{x}}\right)}}}=\int{d{\mathbf{x}}{e_{L}}\left({\mathbf{x}}\right)\pi\left({\mathbf{x}}\right)}, (20)

where 𝐱=(𝐫1σ1,𝐫2σ2,𝐫NσN){\mathbf{x}}=\left({{{\mathbf{r}}_{1}}{\sigma_{1}},{{\mathbf{r}}_{2}}{\sigma_{2}},\ldots{{\mathbf{r}}_{N}}{\sigma_{N}}}\right) is a shorthand notation for the NN electron coordinates and their spins, and

eL(𝐱)^Ψ(𝐱)Ψ(𝐱)andπ(𝐱)Ψ2(𝐱)𝑑𝐱Ψ2(𝐱),{e_{L}}\left({\mathbf{x}}\right)\equiv{\hat{\mathcal{H}}\Psi\left({\mathbf{x}}\right)\over\Psi\left({\mathbf{x}}\right)}\quad\text{and}\quad\pi\left({\mathbf{x}}\right)\equiv{{\Psi^{2}}\left({\mathbf{x}}\right)\over\int{d{\mathbf{x^{\prime}}}{\Psi^{2}}\left({\mathbf{x^{\prime}}}\right)}},

are the local energy and the probability distribution, respectively. The 3NN-dimensional integration can be stochastically evaluated by generating a set {𝐱i}\left\{{{{\mathbf{x}}_{i}}}\right\} according to the probability distribution π(𝐱)\pi\left({\mathbf{x}}\right) generated using the Markov-Chain Monte Carlo (MCMC) technique and by averaging the obtained local energies eL(𝐱i){e_{L}}\left({{{\mathbf{x}}_{i}}}\right):

EMCMC=eL(𝐱)π(𝐱)1Mi=1MeL(𝐱i).{E_{{\text{MCMC}}}}={\left\langle{{e_{L}}\left({\mathbf{x}}\right)}\right\rangle_{\pi\left({\mathbf{x}}\right)}}\approx\frac{1}{M}\sum\limits_{i=1}^{M}{{e_{L}}\left({{{\mathbf{x}}_{i}}}\right)}. (21)

EMCMC{E_{{\text{MCMC}}}} is always associated with an error since it is evaluated by a stochastic sampling. We note that the error bar is inversely proportional to the square root of the number of samplings.

Suppose E0E_{0} is the ground state energy, the variational theorem guarantees that the expectation value of EE never becomes smaller than E0E_{0}. In other words, to approach the ground state, one can optimize a given WF by introducing a set of parameters 𝐩(,pi,)\mathbf{p}\equiv\left(\cdots,p_{i},\cdots\right) to the WF Ψ(𝐱,𝐩){\Psi\left({{\mathbf{x}},\mathbf{p}}\right)}:

EVMC(𝐩)=𝑑𝐱eL(𝐱,𝐩)π(𝐱,𝐩)E0.{E_{{\text{VMC}}}}\left(\mathbf{p}\right)=\int{d{\mathbf{x}}{e_{L}}\left({{\mathbf{x}},\mathbf{p}}\right)\pi\left({{\mathbf{x}},\mathbf{p}}\right)}\geqslant E_{0}. (22)

This framework is called Variational Monte Carlo (VMC). VMC is conceptually very simple, but the optimization of a many-body WF is a complex numerical task due to several reasons such as the presence of several local minima in the energy landscape and statistical errors associated with MCMC evaluation. Great improvements in this field have been achieved in the QMC community. For instance, in TurboRVB, the automatic differentiation [Sorella2010_AAD] allows the stable and efficient calculation of the energy derivatives with variational parameters and offer an efficient optimization method, the so-called stochastic reconfiguration method (also known as natural gradient optimization) [1998SOR, 2007SOR].

A.2 Wavefunction Ansatz

TurboRVB employs the Jastrow antisymmetrized geminal power (JAGP) [2003CAS] ansatz. The ansatz is composed of a Jastrow and an antisymmetric part (Ψ=JΨAGP\Psi=J\cdot{\Psi_{{\text{AGP}}}}). The antisymmetric part reads:

ΨAGP(𝐫1,,𝐫N)=det[g(𝐫1,𝐫1)g(𝐫2,𝐫2)g(𝐫N/2,𝐫N/2)],\displaystyle{\Psi_{{\text{AGP}}}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)={\det}\left[{g\left({{\mathbf{r}}_{1}^{\uparrow},{\mathbf{r}}_{1}^{\downarrow}}\right)g\left({{\mathbf{r}}_{2}^{\uparrow},{\mathbf{r}}_{2}^{\downarrow}}\right)\cdots g\left({{\mathbf{r}}_{N/2}^{\uparrow},{\mathbf{r}}_{N/2}^{\downarrow}}\right)}\right], (23)

where g(𝐫,𝐫)g\left({{\mathbf{r}}^{\uparrow},{\mathbf{r}}^{\downarrow}}\right) is called the pairing function. Although, for the sake of clarity, an unpolarized system with an even number NN of electrons is considered here, it can be generalized to describe a polarized system with unpaired electrons [2020NAK2]. The pairing function, g(𝐫i,𝐫j)g\left({{{\mathbf{r}}_{i}^{\uparrow}},{{\mathbf{r}}_{j}^{\downarrow}}}\right), is expanded over Gaussian-type atomic orbitals (GTOs):

g(𝐫i,𝐫j)=μνPμνψμ(𝐫i)ψν(𝐫j),g\left({{{\mathbf{r}}_{i}^{\uparrow}},{{\mathbf{r}}_{j}^{\downarrow}}}\right)=\sum\limits_{\mu\nu}{P_{\mu\nu}^{\uparrow\downarrow}{\psi_{\mu}}\left({{{\mathbf{r}}_{i}^{\uparrow}}}\right){\psi_{\nu}}\left({{{\mathbf{r}}_{j}^{\downarrow}}}\right)}, (24)

where ψ(𝐫)\psi(\mathbf{r}) are primitive or contracted GTOs. Notice that in Eq. 24, the expansion coefficients of the pairing functions are labeled with PμνστP_{\mu\nu}^{\sigma\tau}, to highlight their similarity with the density matrix of Eq. 10. Depending on the symmetry of Pμν{P_{\mu\nu}^{\uparrow\downarrow}}, the pairing function is dubbed as AGP with spin-singlet pairs (AGPs), AGP with spin-singlet pairs for triplet correlations (AGPu), and the combination of AGPs and AGPu [2020NAK2], which correspond to a symmetric, skew-symmetric, and general Pμν{P_{\mu\nu}^{\uparrow\downarrow}}, respectively 111Any square matrix can be uniquely decomposed into a sum of symmetric and skew-symmetric matrices. The basis sets employed in this study are described in the main text. When the JAGP is expanded over pp = Nel/2N_{\rm el}/2 molecular orbitals (i.e., only the occupied molecular orbitals are considered), the JAGP coincides with the Jastrow-Slater determinant (JSD) ansatz [2020NAK2]. In this way, we composed the JSD anstaz, where the SD part was taken from a DFT calculation. Indeed, the coefficients in the pairing functions (i.e., the variational parameters in the SD part) are obtained by CP2K and fixed during the VMC optimization step. Only the coefficients in the Jastrow factor are optimized at the VMC level.

The Jastrow term in this study is composed of two-body and three-body factors (J=J2J3J={J_{2}}{J_{3}}[2020NAK2]. We employed a spin-independent Jastrow parametrization in this study. The two-body factor is used to satisfy the electron–electron cusp conditions, and the three-/four-body factor is adopted to take the further electron–electron correlation into consideration. The one-body Jastrow factor, which is added to satisfy the electron–nucleus cusp condition, is not used in this study because effective core potentials remove the Coulomb singularity. The two-body Jastrow factor is defined by a long-range function as:

J2(𝐫1,,𝐫N)=exp(i<jv(|𝐫i𝐫j|)),{J_{2}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)=\exp\left({\sum\limits_{i<j}{v\left({\left|{{{\mathbf{r}}_{i}}-{{\mathbf{r}}_{j}}}\right|}\right)}}\right), (25)

where v(r)v\left(r\right) is a pade-type function:

v(r)=12r(1+Fr)1v\left(r\right)=\frac{1}{2}r\cdot{\left({1+F\cdot r}\right)^{-1}} (26)

and FF is a variational parameter. The three-body Jastrow factors are:

J3(𝐫1,,𝐫N)=exp(i<jΞJas(𝐫i,𝐫j)),{J_{3}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)=\exp\left({\sum\limits_{i<j}{{\Xi_{{\text{Jas}}}}\left({{{\mathbf{r}}_{i}},{{\mathbf{r}}_{j}}}\right)}}\right), (27)

and

ΞJas(𝐫i,𝐫j)=μ,νξμ,νψμJas(𝐫i)ψνJas(𝐫j),{\Xi_{{\text{Jas}}}}\left({{{\mathbf{r}}_{i}},{{\mathbf{r}}_{j}}}\right)=\sum\limits_{\mu,\nu}{\xi_{\mu,\nu}\psi_{\mu}^{{\text{Jas}}}\left({{{\mathbf{r}}_{i}}}\right)\psi_{\nu}^{{\text{Jas}}}\left({{{\mathbf{r}}_{j}}}\right)}, (28)

where ξμ,ν\xi_{\mu,\nu} are the variational parameters, and the indices μ\mu and ν\nu refer to different orbitals. The basis sets employed in this study are described in the main text. The Jastrow factors are optimized using the stochastic reconfiguration method.

A.3 Computing derivatives

The elements of the right-hand side tensor 𝐁\mathbf{B} of Eq. 13 can be efficiently computed from a VMC sampling:

Bμiσ=2EL(𝐱)[(Oμiσ(𝐱)O¯μiσ)],\displaystyle B_{\mu i}^{\sigma}=-2\left\langle{E_{\rm L}(\mathbf{x})\left[(O_{\mu i}^{\sigma}(\mathbf{x})-\bar{O}_{\mu i}^{\sigma})\right]}\right\rangle\;, (29)

where we made apparent the dependence of the local operators on the total electronic coordinate 𝐱\mathbf{x}, sampled by VMC, to distinguish them from constant values. In Eq. 29, Oμiσ(𝐱)=lnΨ(𝐱)/CμiσO_{\mu i}^{\sigma}(\mathbf{x})=\partial\ln\Psi(\mathbf{x})/\partial C_{\mu i}^{\sigma}, and O¯μiσOμiσ(𝐱)\bar{O}_{\mu i}^{\sigma}\sim\braket{O_{\mu i}^{\sigma}(\mathbf{x})}. We note that Oμiσ(𝐱){O}_{\mu i}^{\sigma}(\mathbf{x}) can be efficiently computed in a VMC calculation using the adjoint algorithmic differentiation [Sorella2010_AAD], and the divergences of the generalized forces can be cured by reweighting methods [Attaccalite2008_AS, Pathak2008_PW_reweight]. It is extremely important that the variational force is evaluated in a covariance form of random variables to reduce its fluctuations [2005SOR, 2005UMR]. In practice, TurboRVB always computes the derivatives of EE with respect to the AGP matrix elements PμνP_{\mu\nu}^{\uparrow\downarrow} in Eq. 24, E/Pμν\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}. In other words, this is the AO representation of the pairing function. Thus, the chain rule should be applied to obtain E/Cμiσ\partial E/\partial C_{\mu i}^{\sigma} from E/Pμν\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}. Molecular spin orbitals are defined as

ϕkσ(𝐫)=μCμkσψμ(𝐫).\phi_{k}^{\sigma}(\mathbf{r})=\sum_{\mu}C_{\mu k}^{\sigma}\,\psi_{\mu}(\mathbf{r})\;. (30)

Enforcing JSD \leftrightarrow JAGP equivalence (idempotent geminal) via the occupied-orbital expansion,

g(𝐫i,𝐫j)=koccϕk(𝐫i)ϕk(𝐫j),g(\mathbf{r}_{i}^{\uparrow},\mathbf{r}_{j}^{\downarrow})=\sum_{k\in\mathrm{occ}}\phi_{k}^{\uparrow}(\mathbf{r}_{i}^{\uparrow})\,\phi_{k}^{\downarrow}(\mathbf{r}_{j}^{\downarrow})\;, (31)

implies the geminal coefficients

Pμν=koccCμkCνk.P_{\mu\nu}^{\uparrow\downarrow}=\sum_{k\in\mathrm{occ}}C_{\mu k}^{\uparrow}\,C_{\nu k}^{\downarrow}\;. (32)

Define B~μν:=E/Pμν\tilde{B}_{\mu\nu}:=\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow} (note: B~μν=B~νμ\tilde{B}_{\mu\nu}=\tilde{B}_{\nu\mu} for AGPs). By the chain rule,

Bμiσ:=ECμiσ=λνEPλνPλνCμiσ=2νB~μνCνiσ.B_{\mu i}^{\sigma}:=\frac{\partial E}{\partial C_{\mu i}^{\sigma}}=\sum_{\lambda\nu}\frac{\partial E}{\partial P_{\lambda\nu}^{{\uparrow\downarrow}}}\frac{\partial P_{\lambda\nu}^{{\uparrow\downarrow}}}{\partial C_{\mu i}^{\sigma}}=2\sum_{\nu}\tilde{B}_{\mu\nu}C_{\nu i}^{\sigma}. (33)

The derivatives of the energy with respect to the atomic positions, i.e., the nuclear forces, with the so-called space warp coordinate transformation (SWCT) [Umrigar1989_SWCT] are computed as [Sorella2010_AAD] :

𝐅αVMC=dEd𝐑α=\displaystyle{\bf F}_{\alpha}^{\rm VMC}=-\frac{\text{d}E}{\text{d}{\bf R}_{\alpha}}= \displaystyle- EL𝐑α+iω(𝐑α,𝐫i)EL𝐫i\displaystyle\Braket{\frac{\partial E_{\rm L}}{\partial{\bf R}_{\alpha}}+\sum\limits_{i}{{\omega}\left({\mathbf{R}}_{\alpha},{{{\mathbf{r}}_{i}}}\right)\frac{\partial E_{\rm L}}{{\partial{{\mathbf{r}}_{i}}}}}} (34)
\displaystyle- 2(ELE)(logΨ𝐑α+i[ω(𝐑α,𝐫i)logΨ𝐫i+12ω(𝐑α,𝐫i)𝐫i]),\displaystyle 2\Braket{(E_{\rm L}-E)\left(\frac{\partial\log\Psi}{\partial{\bf R}_{\alpha}}+\sum\limits_{i}{\left[{{\omega}\left({\mathbf{R}}_{\alpha},{{{\mathbf{r}}_{i}}}\right)\frac{\partial\log\Psi}{{\partial{{\mathbf{r}}_{i}}}}+\frac{1}{2}\frac{\partial{\omega}\left({{{\mathbf{R}}_{\alpha},{\mathbf{r}}_{i}}}\right)}{{\partial{{\mathbf{r}}_{i}}}}}\right]}\right)}, (35)

where

ωα(𝐫)=κ(|𝐫𝐑α|)β=1Mκ(|𝐫𝐑β|),{\omega_{\alpha}}\left({\mathbf{r}}\right)=\frac{{\kappa\left({\left|{{\mathbf{r}}-{{\mathbf{R}}_{\alpha}}}\right|}\right)}}{{\sum\nolimits_{{\beta}=1}^{M}{\kappa\left({\left|{{\mathbf{r}}-{{\mathbf{R}}_{\beta}}}\right|}\right)}}}, (36)

with κ(r)=1/r4\kappa\left(r\right)=1/{r^{4}} [Umrigar1989_SWCT]. In order to evaluate these differential expressions, 6NeN_{e} + 6NaN_{\rm a} components have to be evaluated, namely, EL𝐫i\left\langle{\cfrac{\partial{E_{L}}}{{\partial{{\mathbf{r}}_{i}}}}}\right\rangle, logΨ𝐫i\left\langle{\cfrac{\partial\log\Psi}{{\partial{{\mathbf{r}}_{i}}}}}\right\rangle, EL𝐑α\left\langle{\cfrac{\partial{E_{L}}}{{\partial{{\mathbf{R}}_{\alpha}}}}}\right\rangle, logΨ𝐑α\left\langle{\cfrac{\partial\log\Psi}{{\partial{{\mathbf{R}}_{\alpha}}}}}\right\rangle [Sorella2010_AAD]. These values are efficiently computed in TurboRVB,by using the automatic differentiation technique.

When computing the above derivatives, the so-called reweighting method is essential to mitigate the heavy-tailed distribution problem mentioned in the introduction of the main text. In the MCMC, the probability distribution can also differ from π(𝐱)\pi({\mathbf{x}}). Indeed, one can use an arbitrary probability distribution function π(𝐱)=ΨG2(𝐱)/𝑑𝐱ΨG2(𝐱)\pi^{\prime}({\mathbf{x}})=\Psi_{\text{G}}^{2}({\mathbf{x}})/\int{d{\mathbf{x}}\Psi_{\text{G}}^{2}({\mathbf{x}})}, and estimate a generic local observable O(𝐱)O({\mathbf{x}}) either by using O¯MCMC=O(𝐱)π(𝐱){{\bar{O}}_{{\text{MCMC}}}}={\left\langle{O\left({\mathbf{x}}\right)}\right\rangle_{\pi\left({\mathbf{x}}\right)}} or:

O¯MCMC=O(𝐱)𝒲(𝐱)π(𝐱)𝒲(𝐱)π(𝐱)i=1MO(𝐱i)𝒲(𝐱i)i=1M𝒲(𝐱i),{{\bar{O}}_{{\text{MCMC}}}}=\frac{{{{\left\langle{O\left({{\mathbf{x^{\prime}}}}\right)\mathcal{W}\left({{\mathbf{x^{\prime}}}}\right)}\right\rangle}_{\pi^{\prime}\left({{\mathbf{x^{\prime}}}}\right)}}}}{{{{\left\langle{\mathcal{W}\left({{\mathbf{x^{\prime}}}}\right)}\right\rangle}_{\pi^{\prime}\left({\mathbf{x}}\right)}}}}\approx\frac{{\sum\nolimits_{i=1}^{M^{\prime}}{O\left({{{{\mathbf{x^{\prime}}}}_{i}}}\right)\mathcal{W}\left({{{{\mathbf{x}}}^{\prime}_{i}}}\right)}}}{{\sum\nolimits_{i=1}^{M^{\prime}}{\mathcal{W}\left({{{{\mathbf{x}}}^{\prime}_{i}}}\right)}}}, (37)

where 𝒲(𝐱)=|Ψ(𝐱)/ΨG(𝐱)|2\mathcal{W}\left({{{{\mathbf{x}}}}}\right)=|{\Psi}\left({\mathbf{x}}\right)/\Psi_{\text{G}}\left({\mathbf{x}}\right)|^{2}, and the points 𝐱i\mathbf{x}^{\prime}_{i} are distributed according to π\pi^{\prime}. This reweighting scheme is very important when evaluating energy gradients with finite variances. TurboRVB implements the so-called Attaccalite and Sorella (AS) reweighting scheme [Attaccalite2008_AS] to suppress the divergences of derivatives in the vicinity of nodal surface.

A.4 Error propagation

Important for error propagation is the fact that the elements BμiσB_{\mu i}^{\sigma} come with an error bar δBμiσ\delta B_{\mu i}^{\sigma}, whereby these elements are also correlated with each other. In the following, it is most convenient to consider the tensor 𝐁\mathbf{B} as a vector with a single compound index Bμiσ=B(μiσ)=BIB_{\mu i}^{\sigma}=B_{(\mu i\sigma)}=B_{I}, and the elements of 𝐀\mathbf{A} in Eq. 13, Aμi,νjστ=A(μiσ),(νjτ)=AI,JA_{\mu i,\nu j}^{\sigma\tau}=A_{(\mu i\sigma),(\nu j\tau)}=A_{I,J}, as forming a matrix of dimension (NAO×NMO×Nspin)×(NAO×NMO×Nspin)(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}}). Let’s denote the covariance matrix (tensor) of XX as Cov(𝐗)\operatorname{Cov}(\mathbf{X}). With the element-wise notation, Eq. 13 is written as:

νjτAμi,νjστZνjτ\displaystyle\sum_{\nu j\tau}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau} =Bμiσ\displaystyle=-B_{\mu i}^{\sigma} (38)
μiσ(A1)λk,μiησνjτAμi,νjστZνjτ\displaystyle\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}\sum_{\nu j\tau}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau} =μiσ(A1)λk,μiησBμiσ\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma} (39)
μiσνjτ(A1)λk,μiησAμi,νjστZνjτ\displaystyle\sum_{\mu i\sigma}\sum_{\nu j\tau}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau} =μiσ(A1)λk,μiησBμiσ\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma} (40)
νjτδνλδjkδητZνjτ\displaystyle\sum_{\nu j\tau}\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau}Z_{\nu j}^{\tau} =μiσ(A1)λk,μiησBμiσ\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma} (41)
Zλkη\displaystyle Z_{\lambda k}^{\eta} =μiσ(A1)λk,μiησBμiσ.\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma}\;. (42)

In other words, we have

Zμiσ\displaystyle Z_{\mu i}^{\sigma} =νjτGμi,νjστBνjτ\displaystyle=-\sum_{\nu j\tau}G_{\mu i,\nu j}^{\sigma\tau}B_{\nu j}^{\tau} (43)

where GG is an analogue of the matrix inverse:

μiσGλk,μiησAμi,νjστ=δνλδjkδητ,μiσAλk,μiησGμi,νjστ=δνλδjkδητ\displaystyle\sum_{\mu i\sigma}G_{\lambda k,\mu i}^{\eta\sigma}A_{\mu i,\nu j}^{\sigma\tau}=\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau}\;,\quad\sum_{\mu i\sigma}A_{\lambda k,\mu i}^{\eta\sigma}G_{\mu i,\nu j}^{\sigma\tau}=\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau} (44)

Therefore, Cov(𝐙)\operatorname{Cov}(\mathbf{Z}) reads:

Cov(𝐙)μi,νjστ=λkη,δlζGμi,λkσηCov(𝐁)λk,δlηζGδl,νjζτ.\mathrm{Cov}(\mathbf{Z})_{\mu i,\nu j}^{\sigma\tau}=\sum_{\lambda k\eta,\delta l\zeta}G_{\mu i,\lambda k}^{\sigma\eta}\mathrm{Cov}(\mathbf{B})_{\lambda k,\delta l}^{\eta\zeta}G_{\delta l,\nu j}^{\zeta\tau}. (45)

These Lagrange multipliers enter the formula for the force only linearly and they get contracted with deterministic quantities only; see Eq. 18. So, once we obtain Cov(𝐙)\operatorname{Cov}(\mathbf{Z}), it is trivial to propagate the error to the forces. Indeed, in Eq. 18, the force correction term, FcF^{\rm{c}}, can be written as

Fc=const.+μiσRμiσZμiσ,F^{\rm{c}}=const.+\sum_{\mu i\sigma}R_{\mu i}^{\sigma}Z_{\mu i}^{\sigma}\;, (46)

where const.const. aggregates all terms that are not multiplied by ZμiσZ_{\mu i}^{\sigma}, and

Rμiσ=εiσνSμνRCνiσ+νFμνσRCνiσjkλντIμi,jkστCλjτSλνRCνkτ,R_{\mu i}^{\sigma}=\varepsilon_{i\sigma}\sum_{\nu}\frac{\partial S_{\mu\nu}}{\partial{R}}C_{\nu i\sigma}+\sum_{\nu}\frac{\partial F_{\mu\nu\sigma}}{\partial{R}}C_{\nu i\sigma}-\sum_{jk}\sum_{\lambda\nu\tau}I_{\mu i,jk}^{\sigma\tau}C_{\lambda j}^{\tau}\frac{\partial S_{\lambda\nu}}{\partial{R}}C_{\nu k}^{\tau}\;, (47)

with Iμi,jkστI_{\mu i,jk}^{\sigma\tau} containing the two-electron integrals entering in the definition of WijσW_{ij}^{\sigma} through the Fock matrix derivative with respect to the MO coefficients, i.e. Eq. 15. Thus, the variance of FcF^{\rm{c}} is

Var(Fc)=μiσνjτRμiσCov(𝐙)μi,νjστRνjτ.\mathrm{Var}(F^{\rm{c}})=\sum_{\mu i\sigma}\sum_{\nu j\tau}R_{\mu i}^{\sigma}\mathrm{Cov}(\mathbf{Z})_{\mu i,\nu j}^{\sigma\tau}R_{\nu j}^{\tau}\;. (48)

By plugging Cov(𝐁)\operatorname{Cov}(\mathbf{B}) into the above equation, we obtain:

Var(Fc)=μiσλkηδlζνjτRμiσGμi,λkσηCov(𝐁)λk,δlηζGδl,νjζτRνjτ.\mathrm{Var}(F^{\rm{c}})=\sum_{\mu i\sigma}\sum_{\lambda k\eta}\sum_{\delta l\zeta}\sum_{\nu j\tau}R_{\mu i}^{\sigma}G_{\mu i,\lambda k}^{\sigma\eta}\mathrm{Cov}(\mathbf{B})_{\lambda k,\delta l}^{\eta\zeta}G_{\delta l,\nu j}^{\zeta\tau}R_{\nu j}^{\tau}\;. (49)

Dropping the element-wise notation, we get

Var(Fc)=RT𝐆Cov(𝐁)𝐆TR.\mathrm{Var}(F^{\rm{c}})=\vec{R}^{\text{T}}\mathbf{G}\mathrm{Cov}(\mathbf{B})\mathbf{G}^{\text{T}}\vec{R}\;. (50)

The computation of Eq. 50 involves the contraction of matrices of dimension (NAO×NMO×Nspin)×(NAO×NMO×Nspin)(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}}), which results in an overall sixth power with respect to system size. This will limit the maximum system size for which we will be able to compute the error estimation on the forces. The computation of the 𝐆\mathbf{G} tensor itself also scales with the sixth power in system size. A possible way to reduce the computational cost is to perform a low-rank decomposition of the 𝐁\mathbf{B} tensor:

𝐁=𝐛𝐛T\mathbf{B}=\mathbf{b}\mathbf{b}^{\text{T}} (51)

where 𝐛(NAO×NMO×Nspin)×Nb\mathbf{b}\in\mathcal{R}^{(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times N_{\text{b}}}, where NbN_{\text{b}} is very small. Then, for each of the NbN_{\text{b}} vectors, 𝐛k\mathbf{b}_{k}, we have that 𝐛kT𝐆\mathbf{b}_{k}^{\text{T}}\mathbf{G} is the same as solving 𝐀𝐙=𝐁\mathbf{A}\mathbf{Z}=\mathbf{B}, and we can then use a slightly modified version of the force evaluation subroutine to compute 𝐙R\mathbf{Z}\vec{R}.

A.5 Computational details

The DFT calculations (including the LR calculations) and the VMC calculations in this work were performed on cluster machines operated by the National Institute for Materials Science (NIMS), on the Genkai supercomputer at Kyushu University, and on the Fugaku supercomputer at RIKEN. For the VMC optimization for Cl2 dimer and cBN test cases reported in Section III, we adjusted the number of independent MCMC chains and the per-chain MCMC sample count so that the error bar of the per-iteration VMC energy estimate was 5.0 mHa. The wavefunction optimization was then run for 300 iterations. As described in the main text, only the Jastrow factor was optimized variationally. Denoting by f\vec{f} the vector of the parameter gradients {,Epi,}\equiv\{\cdots,-\cfrac{\partial E}{\partial p_{i}}\cdots,\}, where pip_{i} are the variational parameters, we monitored devmax maxi(|fi|/|stdfi|)\equiv\max_{i}\,\bigl(|f_{i}|/\lvert\mathrm{std}\,f_{i}\rvert\bigr) and verified for all parameters converged in a statistical sense (devmax 4\leq 4). The learning-rate and regularization parameters were set to tpar=0.35 and parr=0.001, respectively. See Ref. 2020NAK2 for the implementation detail in TurboRVB. After optimization, we performed fresh MCMC simulations to compute the total energy, atomic forces, and the parameter gradients. The energies/forces and parameter gradients were evaluated in separate MCMC runs. For the energy and forces runs, the number of MCMC steps was adjusted so that the error bar of the energy estimate was 0.1 mHa; the resulting error bars on the force components were of comparable magnitude. For the parameter-gradients runs, after equilibration, the total pre-blocking MCMC sample count was set to Nsample=nchain×nsamples_per_chain=120×30,000=3,600,000.N_{\mathrm{sample}}=n_{\mathrm{chain}}\times n_{\mathrm{samples\_per\_chain}}=120\times 30{,}000=3{,}600{,}000. For the error bar estimations of the force corrections, we set the number of blocks to 50 per chain. For ethanol, malonaldehyde, and benzene calculations reported in Section IV, we adjusted the MCMC sample count such that the error bar of the per-iteration VMC energy estimate was 5.0 mHa and performed 100 iterations of VMC optimization. As in Section III, only the Jastrow factor was optimized and devmax was confirmed for all parameters. We used tpar=0.35 and parr=0.001. After optimization, energies, forces, and the parameter gradients were recomputed by MCMC in separate runs. For the energy and force runs, the MCMC steps was tuned so that the error bar of the energy estimate was 0.1 mHa, leading to force uncertainties of a similar magnitude. For the parameter-gradient runs, the total pre-blocking MCMC steps was set to Nsample=nchain×nsamples_per_chain=128×50,000=6,400,000.N_{\mathrm{sample}}=n_{\mathrm{chain}}\times n_{\mathrm{samples\_per\_chain}}=128\times 50{,}000=6{,}400{,}000. For the calculations in Section IV, we did not evaluate the error bar of the force-correction term, because the reported RMSE values were computed from the mean values only.

Appendix B Derivation of Lagrange multipliers equations

To arrive at the algebraic equations that determine the Lagrange multipliers, we need to explicitly evaluate the derivatives of the VMC Lagrangian with respect to the MO coefficients and project them onto the occupied and virtual orbital manifolds. We start by computing the derivatives appearing in Eq. 11, which we repeat here for convenience:

LVMCCμiσ=EVMCCμiσ+λkτZλkτνCμiσ(FλντCνkτSλνCνkτεkτ)klτWklτSklτCμiσ.\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}\,. (52)

The first term on the right-hand side is the derivative of the VMC energy with respect to the MO coefficients, which we simply lasbel as BμiσB_{\mu i}^{\sigma} as in the main text. How these derivatives are obtained from the VMC calculation is discussed in Appendix A.3.

The second term in Eq. 52 corresponds to the derivative of the Roothaan-Hall equations, reading

λkτZλkτνCμiσ(FλντCνkτSλνCνkτεkτ).\displaystyle\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})\,. (53)

It is more convenient to work out the action of the derivative operator on the Fock and overlap components separately. By the chain rule, the first part is given by

CμiσFλντCνkτ\displaystyle\frac{\partial}{\partial C_{\mu i}^{\sigma}}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau} =FλντCμiσCνkτ+FλντCνkτCμiσ,\displaystyle=\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}C_{\nu k}^{\tau}+F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\,, (54)

where the derivative of the Fock matrix with respect to the MO coefficients reads

FλντCμiσ\displaystyle\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}} =Cμiσ(hλν+γδηPγδη[(λν|γδ)cxδτη(λγ|νδ)]+Vλνxc,τ)\displaystyle=\frac{\partial}{\partial C_{\mu i}^{\sigma}}\left(h_{\lambda\nu}+\sum_{\gamma\delta\eta}P_{\gamma\delta}^{\eta}[(\lambda\nu|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma|\nu\delta)]+V_{\lambda\nu}^{\text{xc},\tau}\right) (55)
=γδηPγδηCμiσ[(λν|γδ)cxδτη(λγ|νδ)]+Vλνxc,τCμiσ\displaystyle=\sum_{\gamma\delta\eta}\frac{\partial P_{\gamma\delta}^{\eta}}{\partial C_{\mu i}^{\sigma}}[(\lambda\nu|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma|\nu\delta)]+\frac{\partial V_{\lambda\nu}^{\text{xc},\tau}}{\partial C_{\mu i}^{\sigma}}
=γδη(δση[δμγCδiη+δμδCγiη])[(λν|γδ)cxδτη(λγ|νδ)]]+2fλντ,μiσxc\displaystyle=\sum_{\gamma\delta\eta}\left(\delta_{\sigma\eta}[\delta_{\mu\gamma}C_{\delta i}^{\eta}+\delta_{\mu\delta}C_{\gamma i}^{\eta}]\right)\big[(\lambda\nu|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma|\nu\delta)]\big]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}
=γδηδσηδμγCδiη(λν|γδ)+γδηδσηδμδCγiη(λν|γδ)\displaystyle=\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\nu|\gamma\delta)+\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\nu|\gamma\delta)
γδηcxδσηδτηδμγCδiη(λγ|νδ)γδηcxδσηδτηδμδCγiη(λγ|νδ)+2fλντ,μiσxc\displaystyle-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\gamma|\nu\delta)-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\gamma|\nu\delta)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}
=δCδiσ(λν|μδ)+γCγiσ(λν|γμ)\displaystyle=\sum_{\delta}C_{\delta i}^{\sigma}(\lambda\nu|\mu\delta)+\sum_{\gamma}C_{\gamma i}^{\sigma}(\lambda\nu|\gamma\mu)
δcxδτσCδiσ(λμ|νδ)γcxδτσCγiσ(λγ|νμ)+2fλντ,μiσxc\displaystyle-\sum_{\delta}c_{x}\delta_{\tau\sigma}C_{\delta i}^{\sigma}(\lambda\mu|\nu\delta)-\sum_{\gamma}c_{x}\delta_{\tau\sigma}C_{\gamma i}^{\sigma}(\lambda\gamma|\nu\mu)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}
=2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)]+2fλντ,μiσxc.\displaystyle=2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\;.

Here, we have introduced the exchange-correlation kernel fλντ,μiσxcf^{\text{xc}}_{\lambda\nu\tau,\mu i\sigma}, defined as

fλντ,μiσxc=δ2Excδρτ(𝐫)δρσ(𝐫)Ωλν(𝐫)Ωμiσ(𝐫)d𝐫d𝐫,\displaystyle f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}=\int\int\frac{\delta^{2}E^{\text{xc}}}{\delta\rho_{\tau}(\mathbf{r})\delta\rho_{\sigma}(\mathbf{r}^{\prime})}\Omega_{\lambda\nu}(\mathbf{r})\Omega_{\mu i_{\sigma}}(\mathbf{r}^{\prime})\text{d}\mathbf{r}\text{d}\mathbf{r}^{\prime}\;, (56)

where Ω\Omega denotes an orbital-pair function, Ωλν(𝐫)=ψλ(𝐫)ψν(𝐫)\Omega_{\lambda\nu}(\mathbf{r})=\psi_{\lambda}(\mathbf{r})\psi_{\nu}(\mathbf{r}) (for AOs) and Ωμiσ(𝐫)=ϕμσ(𝐫)ϕiσ(𝐫)\Omega_{\mu i_{\sigma}}(\mathbf{r})=\phi_{\mu\sigma}(\mathbf{r})\phi_{i\sigma}(\mathbf{r}) (for MOs). The second term of Eq. 54 is trivial:

FλντCνkτCμiσ=Fλντδτσδνμδik.F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}=F_{\lambda\nu}^{\tau}\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}\;. (57)

Inserting Eqs. 55 and 57 into Eq. 54 results in

FλντCμiσCνkτ+FλντCνkτCμiσ\displaystyle\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}C_{\nu k}^{\tau}+F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}} =[2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)]+2fλντ,μiσxc]Cνkτ\displaystyle=\Bigg[2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\Bigg]C_{\nu k}^{\tau} (58)
+Fλντδτσδνμδik.\displaystyle+F_{\lambda\nu}^{\tau}\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}\;.

The second term of Eq. 53 is given by:

CμiσSλνCνkτεkτ\displaystyle\frac{\partial}{\partial C_{\mu i}^{\sigma}}S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau} =SλνCνkτCμiσεkτ+SλνCνkτεkτCμiσ\displaystyle=S_{\lambda\nu}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\varepsilon_{k}^{\tau}+S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}} (59)
=δτσδμνδikSλνεkτ+SλνCνkτεkτCμiσ.\displaystyle=\delta_{\tau\sigma}\delta_{\mu\nu}\delta_{ik}S_{\lambda\nu}\varepsilon_{k}^{\tau}+S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\;.

Plugging Eqs. 58 and 59 into Eq. 53 finally results in:

λkτZλkτν(FλντCνkτSλνCνkτεkτ)Cμiσ\displaystyle\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})}{\partial C_{\mu i}^{\sigma}} =λkτZλkτν[Cνkτ(2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)]+2fλντ,μiσxc)\displaystyle=\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\Big[C_{\nu k}^{\tau}\left(2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right) (60)
+δτσδνμδikFλντδτσδμνδikSλνεkτSλνCνkτεkτCμiσ]\displaystyle+\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}F_{\lambda\nu}^{\tau}-\delta_{\tau\sigma}\delta_{\mu\nu}\delta_{ik}S_{\lambda\nu}\varepsilon_{k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\Big]
=λνkτZλkτCνkτ(2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)]+2fλντ,μiσxc)\displaystyle=\sum_{\lambda\nu k\tau}Z_{\lambda k}^{\tau}C_{\nu k}^{\tau}\left(2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right)
+λZλiσ(FλμσSλμεiσ)kτλνZλkτSλνCνkτ=0εkτCμiσ\displaystyle+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})-\sum_{k\tau}\underbrace{\sum_{\lambda\nu}Z_{\lambda k}^{\tau}S_{\lambda\nu}C_{\nu k}^{\tau}}_{=0}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}
=λντP~λντ(2(λν|μiσ)cxδτσ[(λμ|νiσ)+(λiσ|νμ)]+2fλντ,μiσxc)\displaystyle=\sum_{\lambda\nu\tau}\tilde{P}_{\lambda\nu}^{\tau}\left(2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right)
+λZλiσ(FλμσSλμεiσ)\displaystyle+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})
=Hμiσ[𝐏~]+λZλiσ(FλμσSλμεiσ),\displaystyle=H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\;,

where we have identified the intermediate tensor 𝐇[𝐏~]\mathbf{H}\big[\tilde{\mathbf{P}}\big] as the derivative of the Fock matrix with respect to the MO coefficients (see Eq. 15 in the main text), traced with the asymmetric response spin-density tensor 𝐏~\tilde{\mathbf{P}}. Notice that contrary to the main text, here we derive all the equations with the asymmetric response spin-density tensor, though, in practice, we use a symmetrized version as it is numerically more stable.

At last, the third term in Eq. 52 is

klτWklτSklτCμiσ\displaystyle\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}} =klτWklτ(CμiσλνCλkτSλνCνlτ)\displaystyle=\sum_{kl\tau}W_{kl}^{\tau}\left(\frac{\partial}{\partial C_{\mu i}^{\sigma}}\sum_{\lambda\nu}C_{\lambda k}^{\tau}S_{\lambda\nu}C_{\nu l}^{\tau}\right) (61)
=klτWklτ(λνδλμδikδτσSλνCνlτ+λνδμνδilδτσSλνCλkτ)\displaystyle=\sum_{kl\tau}W_{kl}^{\tau}\left(\sum_{\lambda\nu}\delta_{\lambda\mu}\delta_{ik}\delta_{\tau\sigma}S_{\lambda\nu}C_{\nu l}^{\tau}+\sum_{\lambda\nu}\delta_{\mu\nu}\delta_{il}\delta_{\tau\sigma}S_{\lambda\nu}C_{\lambda k}^{\tau}\right)
=klτδikδτσWklτνSμνCνlτ+klτδilδτσWklτλSλμCλkτ\displaystyle=\sum_{kl\tau}\delta_{ik}\delta_{\tau\sigma}W_{kl}^{\tau}\sum_{\nu}S_{\mu\nu}C_{\nu l}^{\tau}+\sum_{kl\tau}\delta_{il}\delta_{\tau\sigma}W_{kl}^{\tau}\sum_{\lambda}S_{\lambda\mu}C_{\lambda k}^{\tau}
=lνWilσSμνCνlσ+kλWkiσSλνCλkσ\displaystyle=\sum_{l\nu}W_{il}^{\sigma}S_{\mu\nu}C_{\nu l}^{\sigma}+\sum_{k\lambda}W_{ki}^{\sigma}S_{\lambda\nu}C_{\lambda k}^{\sigma}
=2jνWijσSμνCνjσ,\displaystyle=2\sum_{j\nu}W_{ij}^{\sigma}S_{\mu\nu}C_{\nu j}^{\sigma}\;,

where in the last step we simply relabeled the indices and used the fact that both 𝐖\mathbf{W} and 𝐒\mathbf{S} are symmetric.

To determine the Lagrange multipliers, we equate Eq. 52 to zero and project it onto the occupied and virtual MO manifolds (see Eqs. 12b and 12a of the main text), solving for 𝐖\mathbf{W} and 𝐙\mathbf{Z}, respectively. Projection onto the virtual manifold results in a linear system of equations, known as the Z-vector equations, while 𝐖\mathbf{W} drops out. Let us start by explicitly writing out Eq. 12a:

μLVMCCμiσQμνσ=μBμiσQμνσterm I+μ[Hμiσ[𝐏~]+λZλiσ(FλμσSλμεiσ)]Qμνσterm IIμ2jλWijσSμλCλjσQμνσterm III=0,\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma}=\underbrace{\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}}_{\text{term I}}+\underbrace{\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]Q_{\mu\nu}^{\sigma}}_{\text{term II}}-\underbrace{\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}Q_{\mu\nu}^{\sigma}}_{\text{term III}}=0\;, (62)

where we recall here the definition of QμνσQ_{\mu\nu}^{\sigma}:

Qμνσ=δμνλkCμkσCλkσSλν.Q_{\mu\nu}^{\sigma}=\delta_{\mu\nu}-\sum_{\lambda k}C_{\mu k}^{\sigma}C_{\lambda k}^{\sigma}S_{\lambda\nu}\;. (63)

Term I in Eq. 52 remains untouched. Term III vanishes:

μ2jλWijσSμλCλjσQμνσ\displaystyle\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}Q_{\mu\nu}^{\sigma} =2μjλWijσSμλCλjσδμν2μjλδkWijσSμλCλjσCμkσCδkσSδν\displaystyle=2\sum_{\mu j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}\delta_{\mu\nu}-2\sum_{\mu j\lambda}\sum_{\delta k}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu} (64)
=2jλWijσSνλCλjσ2jkδWijσSkjσ=δkjCδkσSδν\displaystyle=2\sum_{j\lambda}W_{ij}^{\sigma}S_{\nu\lambda}C_{\lambda j}^{\sigma}-2\sum_{jk\delta}W_{ij}^{\sigma}\underbrace{S_{kj}^{\sigma}}_{=\delta_{kj}}C_{\delta k}^{\sigma}S_{\delta\nu}
=2jλWijσSνλCλjσ2jδWijσSδνCδjσ\displaystyle=2\sum_{j\lambda}W_{ij}^{\sigma}S_{\nu\lambda}C_{\lambda j}^{\sigma}-2\sum_{j\delta}W_{ij}^{\sigma}S_{\delta\nu}C_{\delta j}^{\sigma}
=0,\displaystyle=0\;,

where we have used the fact that the overlap matrix is symmetric. Finally, we look at term II, which is the most complicated one:

μ[Hμiσ[𝐏~]+λZλiσ(FλμσSλμεiσ)]Qμνσ\displaystyle\sum_{\mu}\left[H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]Q_{\mu\nu}^{\sigma} =μHμiσ[𝐏~]Qμνσterm II.A+μλZλiσ(FλμσSλμεiσ)Qμνσterm II.B.\displaystyle=\underbrace{\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}}_{\text{term II.A}}+\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma}}_{\text{term II.B}}\;. (65)

Term II.A does not simplify, while term II.B simplifies owing to the Roothaan-Hall equation, which vanishes when the Lagrangian is stationary with respect to the MO coefficients:

μλZλiσ(FλμσSλμεiσ)Qμνσ\displaystyle\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma} =μλZλiσ(FλμσSλμεiσ)δμνμλZλiσ(FλμσSλμεiσ)δkCμkσCδkσSδν\displaystyle=\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\delta_{\mu\nu}-\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\sum_{\delta k}C_{\mu k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu} (66)
=λZλiσ(FλνσSλνεiσ)λδkZλiσμ(FλμσCμkσSλμCμkσεiσ)CδkσSδν\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\lambda\delta k}Z_{\lambda i}^{\sigma}\sum_{\mu}(F_{\lambda\mu}^{\sigma}C_{\mu k}^{\sigma}-S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{i}^{\sigma})C_{\delta k}^{\sigma}S_{\delta\nu}
=λZλiσ(FλνσSλνεiσ)δkλμZλiσ(SλμCμkσεkσSλμCμkσεiσ)CδkσSδν\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\delta k}\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}(S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{k}^{\sigma}-S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{i}^{\sigma})C_{\delta k}^{\sigma}S_{\delta\nu}
=λZλiσ(FλνσSλνεiσ)δk[λμZλiσSλμCμkσ=0]εkσCδkσSδν\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\delta k}\bigg[\underbrace{\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu k}^{\sigma}}_{=0}\bigg]\varepsilon_{k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}
δk[λμZλiσSλμCμkσ=0]εiσCδkσSδν\displaystyle-\sum_{\delta k}\bigg[\underbrace{\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu k}^{\sigma}}_{=0}\bigg]\varepsilon_{i}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}
=λZλiσ(FλνσSλνεiσ).\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})\;.

Putting together Eq. 65 and Eq. 66, results in

μHμiσ[𝐏~]Qμνσ+μλZλiσ(FλμσSλμεiσ)Qμνσ=μHμiσ[𝐏~]Qμνσ+λZλiσ(FλνσSλνεiσ).\displaystyle\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma}=\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})\;. (67)

Finally, inserting Eqs. 64 and 67 into Eq. 52 yields:

μLVMCCμiσQμνσ=μBμiσQμνσ+μ\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma}=\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}+\sum_{\mu} Hμiσ[𝐏~]Qμνσ+λZλiσ(FλνσSλνεiσ)=0\displaystyle H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})=0 (68)
\displaystyle\Longleftrightarrow
λZλiσ(FλνσSλνεiσ)+μ\displaystyle\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})+\sum_{\mu} Hμiσ[𝐏~]Qμνσ=μBμiσQμνσ,\displaystyle H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}=-\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}\;,

which corresponds to Eq. 13 of the main text.

To determine the 𝐖\mathbf{W} Lagrange multipliers, we project the derivative of the Lagrangian onto the occupied orbitals and equate to zero, cf. Eq. 12b. This equation is identical to Eq. 62, the only difference is the projector:

μLVMCCμiσCμjσ=μBμiσCμjσ+μ[Hμiσ[𝐏~]+λZλiσ(FλμσSλμεiσ)]Cμjσμ2jλWijσSμλCλjσCμjσ=0.\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}C_{\mu j}^{\sigma}=\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}+\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]C_{\mu j}^{\sigma}-\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu j}^{\sigma}=0\;. (69)

No simplifications are possible for the first term, while the third term is simply

μ2jλWijσSμλCλjσCμjσ=2kWikσSkjσ=2Wijσ.\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu j}^{\sigma}=2\sum_{k}W_{ik}^{\sigma}S_{kj}^{\sigma}=2W_{ij}^{\sigma}\;. (70)

The middle term simplifies to

μ[Hμiσ[𝐏~]+λZλiσ(FλμσSλμεiσ)]Cμjσ\displaystyle\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]C_{\mu j}^{\sigma} =Hjiσ[𝐏~]+μλZλiσFλμσCμjσμλZλiσSλμCμjσ=0εiσ\displaystyle=H_{ji}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}F_{\lambda\mu}^{\sigma}C_{\mu j}^{\sigma}-\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu j}^{\sigma}}_{=0}\varepsilon_{i}^{\sigma} (71)
=Hijσ[𝐏~]+μλZλiσSλμCμjσ=0εjσ\displaystyle=H_{ij}^{\sigma}[\tilde{\mathbf{P}}]+\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu j}^{\sigma}}_{=0}\varepsilon_{j}^{\sigma} (72)
=Hijσ[𝐏~],\displaystyle=H_{ij}^{\sigma}[\tilde{\mathbf{P}}]\;, (73)

using the symmetry of 𝐇\mathbf{H} and Eq. 7. With these results, we can rearrange Eq. 69 to solve for 𝐙\mathbf{Z}, yielding:

Wijσ=12Hijσ[𝐏~]+12μBμiσCμjσ,W_{ij}^{\sigma}=\frac{1}{2}H_{ij}^{\sigma}[\tilde{\mathbf{P}}]+\frac{1}{2}\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}\;, (74)

that is reported as Eq. 17 in the main text.