Fast and Scalable Evaluation of Unbiased Atomic Forces in ab initio Variational Monte Carlo via the Lagrangian Technique

Kousuke Nakano [email protected] Center for Basic Research on Materials (CBRM), National Institute for Materials Science (NIMS), 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan Stefano Battaglia [email protected] Department of Chemistry, University of Zurich (UZH), Winterthurerstrasse 190, 8057, Zurich, Switzerland Jürg Hutter Department of Chemistry, University of Zurich (UZH), Winterthurerstrasse 190, 8057, Zurich, Switzerland

(November 7, 2025)

Abstract

Ab initio quantum Monte Carlo (QMC) methods are state-of-the-art electronic structure calculations based on highly parallelizable stochastic frameworks for accurate solutions of the many-body Schrödinger equation, suitable for modern many-core supercomputer architectures. Despite its potential, one of the major drawbacks that still hinders QMC applications, especially when targeting dynamical properties of large systems or large amounts of configurations, is the lack of an affordable method to compute atomic forces that are consistent with the corresponding potential energy surfaces (PESs), also known as unbiased atomic forces. Recently, one of the authors in the present paper proposed a way to obtain unbiased forces with the Jastrow-correlated Slater determinant ansatz, where the determinant part is frozen to the values obtained by a mean-field method, such as Density Functional Theory [Phys. Rev. B 109, 205151 (2024)]. However, the proposed method has a significant drawback for its applications: for a system with $N$ nuclei, one requires $3N$ additional DFT calculations to get unbiased forces, which is not negligible as the system size increases. This paper presents a way to replace the $3N$ DFT calculations with a single coupled-perturbed Kohn-Sham calculation, following the so-called Lagrangian technique established in quantum chemistry. This improves the computational cost and scalability of the method. We also demonstrate that the developed unbiased VMC force calculation improves not only the consistency with PESs, but also its accuracy, by investigating three molecules from the rMD17 benchmark set, and comparing the corrected VMC forces with those obtained by the Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] calculations. We found that the bare VMC forces are significantly biased from the CCSD(T) ones, while the unbiased ones give values much closer to those of the CCSD(T) ones. Our finding paves the way to generate machine learning interatomic potentials based on VMC forces more efficiently and accurately.

I Introduction

Ab initio quantum Monte Carlo (ab initio QMC) methods have been increasingly studied as highly scalable and parallelizable electronic structure methods that can fully leverage the capabilities of current and near-future exascale supercomputers [shinde2025shifting]. While density functional theory (DFT) remains the most widely adopted ab initio method, it is known to face significant challenges in accurately describing certain classes of systems, such as strongly correlated materials and multireference electronic states in molecules. QMC methods are attracting attention as promising alternatives capable of addressing these limitations. Despite the maturity of energy calculations within QMC, the computation of atomic forces — i.e., first derivatives of the total energy with respect to atomic positions — has long remained an open challenge, limiting the broader applicability of QMC. There are two main real-space QMC frameworks: variational Monte Carlo (VMC) and fixed-node diffusion Monte Carlo (FN-DMC) [2001FOU]. In this study, we focus on VMC because the calculation of forces within the FN-DMC framework [Flaviano2025_DMC_comparison] is much more complicated, and thus, its force calculation is challenging [1986REY, 2005CHI_Chiesa_scheme, 2008BAD2, 2011ASS, 2014MOR, 2021VAN]. The methodological development for forces evaluation in VMC has a long history, dating back to the 1980s. An intrinsic difficulty in computing atomic forces within VMC is the so-called infinite-variance problem: when the force is evaluated by Monte Carlo sampling, the variance of the force estimator diverges [2000ASS_VMC_infinite_variance]. Over the past decades, this difficulty has been addressed by several complementary strategies. One of the earliest attempts was the space-warp coordinate transformation (SWCT) originally introduced by Umrigar [Umrigar1989_SWCT]. This was generalized by Filippi and Umrigar within a correlated sampling framework [Filippi2000CorrelatedSamplingForces] and then differential formulations were developed by Assaraf and Caffarel [Assaraf2003_ZVZB_force] and Sorella and Capriotti [Sorella2010_AAD]. Although these advances cure the divergence associated with the electron–ion Coulomb singularity, the other divergence arising from the nodal surface also had to be controlled [Attaccalite2008_AS]. Successful remedies include reweighting of the sampling distribution [Attaccalite2008_AS, Pathak2008_PW_reweight] and the tail regression estimator [Trail2008HeavyTails, Trail2008AltSampling, LopezRios2019TailRegression]. Related to variance control, it was later clarified that one must consider linear dependence issues (i.e., ill-conditioned overlap matrices) in periodic boundary condition calculations with localized Gaussian basis sets [Nakano2021PRB_finite_variance_of_forces_for_PBC]. From an implementation point of view, the introduction of adjoint algorithmic differentiation for atomic force evaluation [Sorella2010_AAD] was a major milestone: it enables atomic force calculations without hand-coded analytic derivatives and dramatically reduces computational cost [Sorella2010_AAD]. Subsequently, a fast scheme for derivatives of multi-determinant wave functions was formulated [Filippi2016_fastderivatives_multideterminant], which has facilitated the generation of high-quality training data for machine learning force fields [slootman2024accurate]. These developments have collectively brought VMC force evaluations closer to practical utility.

Nevertheless, force calculations in VMC have been limited to small molecules [2005CHI_Chiesa_scheme, 2013ZEN_water, 2014MOR, 2014LUO_QMC_MD, 2019LIU_qmc_force_onstant, 2021VAN, Tiihonen2021JCP, Nakano2022JCP, slootman2024accurate], clusters [2017MOU, mouhat2023thermal], and crystals with light elements such as solid/liquid hydrogen [2018MAZ_Hydrogen, 2022TIR_MLP_hydrogen, 2022LY_qmc_hydrogen_phonon, 2023NIU_qmc_hydrogen, 2024CEP_qmc_hydrogen, 2025Giacomo, Kevin2025_hydrogen_RQMC, Goswami2025_hydrogen_MLP] and diamond [Nakano2021PRB_finite_variance_of_forces_for_PBC]. This restriction likely stems from a practical issue underlying VMC-based force evaluations, which we will describe hereafter: Let ${\bf R}_{\alpha}$ be the atomic position of the nucleus $\alpha$ . The atomic force acting on $\alpha$ is defined as the negative gradient of the energy with respect to ${\bf R}_{\alpha}$ :


$\displaystyle{\bf F}_{\alpha}^{\text{VMC}}=-\frac{{\rm d}E}{{\rm d}{\bf R}_{\alpha}}=$	$\displaystyle-\Braket{\frac{\partial}{\partial{\bf R}_{\alpha}}E_{\rm L}}$	(1a)
	$\displaystyle-2\Braket{(E_{\rm L}-E)\frac{\partial\log\Psi}{\partial{\bf R}_{\alpha}}}$	(1b)
	$\displaystyle-\sum_{i=1}^{N_{p}}{\frac{\partial E}{\partial p_{i}}}{\frac{\partial p_{i}}{\partial{\bf R}_{\alpha}}},$	(1c)

where $\Psi$ is an optimized wavefunction, $\braket{A}$ indicates the quantum average of the local operator $A$ over the sampling of $|\Psi|^{2}$ , $E_{\rm L}$ is the so-called local energy ( $E_{\rm L}\equiv\hat{H}\Psi/\Psi$ ), with $E\equiv\braket{E_{L}}$ , and $\{p_{1},\cdots,p_{N_{p}}\}$ is the set of $N_{p}$ variational parameters included in the $\Psi$ ansatz. Eqs. (1a), (1b), and (1c) are called the Hellmann–Feynman (HF), Pulay, and non-variational (NV) terms, respectively. When the wavefunction includes all relevant variational parameters and is fully optimized, the NV term vanishes [Tiihonen2021JCP, Nakano2024]. However, in practice — especially for large systems or high-throughput datasets — complete optimization of all wavefunction parameters is often infeasible. Therefore, the so-called Jastrow-Slater determinant approach is typically employed, where the Slater determinant obtained from mean-field methods such as DFT is kept fixed, and only the Jastrow factor is variationally optimized. Historically, even in this case, the NV term was often neglected on the assumption that its contribution was small, and forces were computed using only the HF and Pulay terms. However, recent studies have demonstrated that ignoring the NV term can lead to significant biases [Tiihonen2021JCP, Nakano2022JCP, Nakano2024]. The bias arising from the NV term is referred to as the self-consistency error (SCE) [Tiihonen2021JCP]. In 2021, Tihonen et al. [Tiihonen2021JCP] reported that the SCE becomes more severe for heavier elements. Follow-up studies confirmed that, for example, the SCE for liquid hydrogen is on the order of 1 GPa [2024GIA_qmc_hugonioit], while for compounds such as cubic boron nitride (c-BN), it can reach up to 5 GPa [Nakano2024]. This implies that applying VMC to practical, realistic systems beyond model compounds necessitates addressing the SCE — otherwise, it becomes a critical bottleneck in VMC applications.

Nakano et al. (including one of the authors of the present paper) recently formulated a method to eliminate the SCE [Nakano2024]. The idea is conceptually straightforward: direct evaluation of the NV term using a hybrid VMC-DFT approach. Specifically, the partial derivative $\partial E/\partial p_{i}$ is computed via standard VMC, while the wavefunction response term $dp_{i}/d{\bf R}_{\alpha}$ is evaluated using DFT with the finite difference method (FDM). Using this scheme, consistent forces have been successfully obtained for various systems including H₂, Cl₂, c-BN [Nakano2024], and liquid hydrogen [2025Giacomo]. Although this methodology successfully demonstrated a proof-of-concept for SCE elimination [Nakano2024, 2025Giacomo], a significant practical limitation remains: to compute corrected forces on all $N$ atoms, one requires 3 $N$ additional DFT calculations [Nakano2024]. While this is manageable for single-point force evaluations on small systems, it becomes computationally prohibitive when dealing with large systems or generating extensive datasets for machine-learning potential (MLP) training.

This paper presents a method to eliminate the 3 $N$ additional DFT calculations when correcting the NV term to get unbiased VMC forces. The key idea is the following. The bias arising from the NV term is not a specific problem in VMC calculations, but also in wavefunction theory (WFT) calculations when non-variational wavefunctions — such as those used in perturbative methods — are used to compute forces. One well-established approach in WFT is the so-called Lagrangian technique [helgaker1989configuration, 2017Jensen_textbook]. In this work, we extend the Lagrangian technique to the VMC context, enabling the calculation of atomic forces and pressure in both molecular and periodic systems using just a single VMC and a single coupled-perturbed Hartree-Fock (CPHF) or coupled-perturbed Kohn-Sham (CPKS) calculation. This eliminates the need for 3 $N$ DFT evaluations and improves the scalability of the method. Furthermore, we demonstrate that the developed consistent force evaluation improves not only the consistency, but also the accuracy of the VMC forces with the Jastrow Slater determinant anstaz. We test our implementation on three molecules, ethanol, malonaldehyde and benzene, from rMD17 benchmark set [Christensen2020_MLST, Christensen2020_rMD17_Figshare], and compare biased and unbiased VMC forces to forces obtained with the Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] [Raghavachari1989_CCSDT] method, known as the gold-standard of quantum chemistry.

This paper is organized as follows. In Sec. II we describe the details of the Lagrangian technique in the context of VMC. In Sec. III, we show the validation of the formalism and our implementation in CP2K [Kuehne2020_CP2K_JCP] and TurboRVB [2020NAK2] software suites. In Sec. IV, we report a benchmark result of the unbiased VMC force calculations for the three molecules taken from rMD17 benchmark set. In Sec. V, we conclude our work and discuss the current limitation and future works of the developed method.

II Methods: Lagrangian technique extended to VMC

The key idea of the Lagrangian technique [helgaker1989configuration, 2017Jensen_textbook] is to construct a function, the Lagrangian, that has the same energy as the VMC one, but that incorporates all the constraints imposed during the optimization of the wavefunction. By making the Lagrangian stationary with respect to all its parameters, we then obtain a fully variational expression for which we can compute the nuclear gradient directly. The VMC energy depends on a set of Jastrow variational parameters $\{p_{i}\}_{i=1}^{N_{J}}$ , collected in the vector $\mathbf{p}$ , and the occupied molecular orbital (MO) coefficients of the Slater determinant (SD) denoted by the tensor $\mathbf{C}$ , with elements $C_{\mu i}^{\sigma}$ . The indices $\mu,\nu,\lambda$ and $\delta$ label atomic orbitals (AOs), the indices $i,j,k$ and $l$ label occupied MOs, and $\sigma,\tau$ label the spin. Sums over these indices automatically imply complete sums over the total number of occupied orbitals $N_{\text{MO}}$ , atomic orbitals $N_{\text{AO}}$ , and the number of spin channels $N_{\text{spin}}$ , respectively. The overlap matrix between AOs or MOs (depending on its subscripts) is denoted by $S$ . In the following, we always label the atomic and molecular orbitals as subscripts, and the spin index as a superscript, where possible.

Let us start by introducing the VMC Lagrangian:

\displaystyle L_{\rm VMC}({\bf p,C,Z,W})=E_{\rm VMC}(\mathbf{p},\mathbf{C})+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\frac{\partial L_{\rm SCF}(\mathbf{C})}{\partial{C_{\lambda k}^{\tau}}}-\sum_{kl\tau}W_{kl}^{\tau}(S_{kl}^{\tau}-\delta_{kl})\;.

(2)

The first term is the VMC energy, the second term corresponds to the stationary conditions of the underlying SCF reference calculation, and the third term enforces the orthonormality of the occupied MOs. The tensors ${\bf Z}$ and ${\bf W}$ contain the Lagrange multipliers. We do not give the explicit form of $L_{\rm SCF}$ here because only its derivative (defined later) is needed. In the most general case, all boldfaced quantities represent tensors of rank 3, with dimension $N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}}$ for $\mathbf{C}$ and $\mathbf{Z}$ , and dimension $N_{\text{MO}}\times N_{\text{MO}}\times N_{\text{spin}}$ for $\mathbf{W}$ . In case of restricted calculations, where the MO coefficients are the same for both spin channels, $N_{\text{spin}}=1$ and the tensors reduce to matrices. Several equivalent formulations of the SCF term of the Lagrangian exist. In this work, we closely follow the atomic orbital formulation used for the implementation of density-corrected DFT gradients in CP2K [2023BEL]. This results in the Roothaan-Hall equations for the occupied MOs:

\frac{\partial L_{\rm SCF}}{\partial C_{\lambda k}^{\tau}}=\sum_{\nu}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau},

(3)

where $F_{\lambda\nu}^{\tau}$ is the Fock/Kohn-Sham matrix and $\varepsilon_{k}^{\tau}$ is the Kohn-Sham eigenvalue of $k$ -th MO with the spin $\tau$ . The explicit form of the Fock/Kohn-Sham matrix is introduced later. When the Lagrangian is stationary with respect to all its parameters, we can rewrite $\mathbf{F}^{\text{VMC}}_{\alpha}$ with $L_{\rm VMC}$ as:

\displaystyle\mathbf{F}^{\rm{VMC}}_{\alpha}=-\cfrac{{\rm d}E_{\rm VMC}}{{\rm d}R_{\alpha}}=-\cfrac{{\rm d}L_{\rm VMC}}{{\rm d}R_{\alpha}}=-\cfrac{\partial L_{\rm VMC}}{\partial R_{\alpha}}\;.

(4)

To make the Lagrangian stationary with respect to all its parameters, we have to ensure that the following set of equations is satisfied:


$\displaystyle\frac{\partial L_{\rm VMC}}{\partial p_{i}}$	$\displaystyle=\frac{\partial E_{\rm VMC}}{\partial p_{i}}=0\;,$	(5a)
$\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\frac{\partial^{2}L_{\rm SCF}}{\partial{C_{\mu i}^{\sigma}}\partial{C_{\lambda k}^{\tau}}}-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}=0\;,$	(5b)
$\displaystyle\frac{\partial L_{\rm VMC}}{\partial Z_{\lambda k}^{\tau}}$	$\displaystyle=\sum_{\nu}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau}=0\;,$	(5c)
$\displaystyle\frac{\partial L_{\rm VMC}}{\partial W_{kl}^{\tau}}$	$\displaystyle=S_{kl}^{\tau}-\delta_{kl}=0\;.$	(5d)

Eq. 5a is the variational condition on the VMC energy with respect to the Jastrow parameters. For an optimized Jastrow wave function, this condition is automatically fulfilled. Eq. 5b is zero by virtue of the Lagrange multipliers; that is, we choose the elements $Z_{\lambda k}^{\tau}$ and $W_{kl}^{\tau}$ such that $\tfrac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=0$ for $\forall\mu,i,\sigma$ . Eqs. 5c and 5d are automatically fulfilled if we use canonical orbitals from the underlying SCF calculation.

At first sight, it seems that Eq. 5b is not sufficient to determine both $\mathbf{Z}$ and $\mathbf{W}$ . However, only certain subblocks are different from zero, such that by projecting the stationary conditions onto occupied and virtual orbital spaces, we are able to obtain enough equations for their determination. To this end, let us introduce the projection operator $\mathbf{Q}$ , with elements:

Q_{\mu\nu}^{\sigma}=\delta_{\mu\nu}-\sum_{\lambda k}C_{\mu k}^{\sigma}C_{\lambda k}^{\sigma}S_{\lambda\nu}\;.

(6)

The Lagrange multipliers $Z_{\lambda k}^{\tau}$ can be best understood as expansion coefficients of the orbital response due to the correlation energy introduced by the Jastrow factor. Hence, only the occupied-virtual blocks contribute to the response, while the occupied-occupied block is assumed to be zero:

Z_{ij}^{\sigma}=\sum_{\mu\nu}Z_{\mu i}^{\sigma}S_{\mu\nu}C_{\nu j}^{\sigma}=0\;,

(7)

as unitary rotations among occupied orbitals leave the energy unchanged. It is convenient at this stage to introduce the explicit form of the Fock/Kohn-Sham matrix:

F_{\mu\nu}^{\sigma}[\mathbf{P}]=h_{\mu\nu}+\sum_{\lambda\delta\tau}P_{\lambda\delta}^{\tau}[(\mu\nu|\lambda\delta)-c_{x}\delta_{\sigma\tau}(\mu\lambda|\nu\delta)]+V_{\mu\nu}^{\text{xc},\sigma}\;,

(8)

where $(\mu\nu|\lambda\delta)$ are two-electron integrals over primitive atomic orbitals

(\mu\nu|\lambda\delta)=\int\int\psi_{\mu}(\mathbf{r})\psi_{\nu}(\mathbf{r})\frac{1}{|\mathbf{r}-\mathbf{r}^{\prime}|}\psi_{\lambda}(\mathbf{r}^{\prime})\psi_{\delta}(\mathbf{r}^{\prime})\mathrm{d}\mathbf{r}\mathrm{d}\mathbf{r}^{\prime}\;;

(9)

and $h_{\mu\nu}$ are the kinetic energy and nuclear electron attraction integrals. This form is completely general and accommodates Hartree-Fock, pure semilocal functionals, and hybrid functionals using the exact exchange parameter $c_{x}$ and the exchange-correlation potential $V^{\text{xc},\sigma}_{\mu\nu}$ . The tensor $\mathbf{P}$ contains the spin-density matrices in AO basis, with elements:

P_{\mu\nu}^{\sigma}=\sum_{i}C_{\mu i}^{\sigma}C_{\nu i}^{\sigma}\;.

(10)

The stationary conditions for the MO coefficients are given by equating to zero the derivatives of $L_{\rm VMC}$ with respect to $C_{\mu i}^{\sigma}$ :

\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}=0\;.

(11)

To arrive at working equations to determine the multipliers, we project Eq. 11 onto the occupied orbital space using $\mathbf{C}$ , and onto the unoccupied orbital space using $\mathbf{Q}$ :


$\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma}$	$\displaystyle=0\to Z_{\nu i}^{\sigma}\quad\forall\,\nu,i,\sigma\;,$	(12a)
$\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}C_{\mu j}^{\sigma}$	$\displaystyle=0\to W_{ij}^{\sigma}\quad\forall\,i,j,\sigma\;.$	(12b)

In the following, we report only the final expressions obtained from Eqs. 12a and 12b; their explicit derivation is given in Appendix B. The projection onto the virtual space, Eq. 12a, results in a linear system of equations $\mathbf{A}\mathbf{Z}=-\mathbf{B}$ known as the Z-vector equations, from which the Lagrange multipliers $\mathbf{Z}$ can be determined:

\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})+\sum_{\mu}H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}=-\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}\;.

(13)

The elements of the tensor $\mathbf{B}$ are the derivatives of the VMC energy with respect to the MO coefficients

B_{\mu i}^{\sigma}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}\;,

(14)

while the tensor $\mathbf{A}$ contains quantities related to the SCF stationary conditions and the MOs orthonormalization constraints; hence it does not depend on the VMC calculation. The elements $H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big]$ are derivatives of the Fock matrix with respect to the MO coefficients,

H_{\mu i}^{\sigma}\big[\bar{\mathbf{P}}\big]=\sum_{\lambda\nu\tau}\bar{P}_{\lambda\nu}^{\tau}(2(\lambda\nu|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu|\nu i_{\sigma})+(\lambda i_{\sigma}|\nu\mu)])+2f_{\lambda\nu\tau,\mu i\sigma}^{\text{xc}}\;;

(15)

where $f^{\text{xc}}_{\lambda\nu\tau,\mu i\sigma}$ is the exchange-correlation kernel with an index contracted with a molecular orbital, and the tensor $\bar{\mathbf{P}}$ is a symmetrized form of the response density matrix (for increased numerical stability), given by

\bar{P}_{\mu\nu}^{\sigma}=\frac{1}{2}\sum_{i}(Z_{\mu i}^{\sigma}C_{\nu i}^{\sigma}+C_{\mu i}^{\sigma}Z_{\nu i}^{\sigma})\;.

(16)

Once the response tensor $\mathbf{Z}$ is determined by solving Eq. 13, we can also determine the second set of Lagrange multipliers $\mathbf{W}$ , which are given by

W_{ij}^{\sigma}=\frac{1}{2}H_{ij}^{\sigma}\big[\bar{\mathbf{P}}\big]+\frac{1}{2}\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}.

(17)

Finally, the atomic force is given by the negative gradient of the Lagrangian with respect to the nuclear positions, reading

\displaystyle\frac{\partial L_{\rm VMC}}{\partial\mathbf{R}_{\alpha}}=\frac{\partial E_{\rm VMC}}{\partial\mathbf{R}_{\alpha}}-\sum_{\mu\nu\sigma}\Lambda_{\mu\nu}^{\sigma}\frac{\partial S_{\mu\nu}}{\partial\mathbf{R}_{\alpha}}+\sum_{\mu\nu j\sigma}Z_{\mu j}^{\sigma}\frac{\partial F_{\mu\nu}^{\sigma}}{\partial\mathbf{R}_{\alpha}}C_{\nu j}^{\sigma}\;,

(18)

with the intermediate quantity $\Lambda_{\mu\nu}^{\sigma}$ defined as

\Lambda_{\mu\nu}^{\sigma}=\sum_{ij}C_{\mu i}^{\sigma}W_{ij}^{\sigma}C_{\nu j}^{\sigma}+\frac{1}{2}\sum_{i}\varepsilon_{i}^{\sigma}(Z_{\mu i}^{\sigma}C_{\nu i}^{\sigma}+C_{\mu i}^{\sigma}Z_{\nu i}^{\sigma})\;,

(19)

embodying the conditions of orthonormality between the response MOs, and the so-called energy-weighted response density matrix.

An important aspect of the above Lagrangian approach in the context of VMC is that the elements of the right-hand side tensor $\mathbf{B}$ of Eq. 13 (i.e., the energy derivative with respect to the MO coefficients) come with statistical noise. It is important to know how this noise propagates from the VMC energy derivative to the response vectors, and ultimately to the forces. There are several ways to estimate the errors: the easiest way is to use a resampling method such as the Jackknife approach. As an outcome of a VMC calculation, we can reblock samples of $B^{\sigma}_{\mu i}$ and estimate the error bars of the corrections by iteratively solving the Z-vector equations. Another way to estimate the errors is based on the analytical error propagation of $B^{\sigma}_{\mu i}$ , for which the variance-covariance tensor of $B^{\sigma}_{\mu i}$ is required. In this paper, we employed the former approach, i.e., the error bar estimations were done using the Jackknife method. Further details about the VMC calculation of energy and gradients, as well as the derivation of the analytical error propagation is shown in Appendix A.

III Validation of the method

In this section, we validate the Lagrangian technique described above, which we have implemented by interfacing the linear response module of CP2K [Kuehne2020_CP2K_JCP] with TurboRVB [2020NAK2] and exchanging the data through the TREX-IO library [2023POS]. By validation, we mean demonstrating two points: (i) forces and pressures obtained with the Lagrangian technique agree with numerical derivatives of the potential energy surface (i.e., the results are unbiased), and (ii) these forces and pressures are consistent with those computed by the finite-difference method (FDM) developed in Ref. Nakano2024. The validation was performed for a Cl₂ molecule and crystalline c-BN. For Cl₂, we evaluated forces; for c-BN, we evaluated both forces and pressure. For Cl₂, we used the ccECP pseudopotentials [2018BEN] together with their accompanied cc-pVDZ basis sets. For c-BN we used the ccECP [2017BEN] pseudopotentials combined with DZVP-MOLOPT-SR-GTH basis sets [Goedecker1996_GTH, VandeVondele2007_MOLOPT] to avoid spuriously large force error bars by GTOs with large exponents [Nakano2021PRB_finite_variance_of_forces_for_PBC]. The cBN calculations employed a conventional 1 $\times$ 1 $\times$ 1 supercell with $\Gamma$ -point. To estimate the statistical error of the linear-response (LR) correction, we used the jackknife approach [2017BEC]: we repeatedly applied the LR procedure to samples of $\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}$ drawn from TurboRVB and computed the jackknife mean and variance. Figures 1 and 2 summarize the results for the Cl₂ forces and for the c-BN forces and pressure, addressing points (i) and (ii). As shown, the forces and pressures from the Lagrangian technique coincide with the numerical derivatives of the potential energy surface (i.e., they are unbiased) and match the values obtained with the FDM [Nakano2024]. Taken together, these results confirm that the Lagrangian technique behaves as expected. Notice that we do not compare the obtained PESs and EOS with the experimental bond lengths and volume, since the used basis sets, the supercell size, and the $k$ -point ( $\Gamma$ only) are suitable only for numerically checking (i) and (ii).

Refer to caption — Figure 1: Validation of the Lagrangian technique for Cl₂. (a) Potential energy surface (PES) as a function of bond length (green circles) and its fit (green dotted line). The reference force on the right-hand Cl atom (Cl₂ oriented along the $x$ axis), obtained from the derivative of the fitted PES, is shown as a blue dashed line. Red diamonds, purple squares, and cyan triangles denote forces evaluated with the HF + Pulay terms, HF + Pulay terms + NV correction (via FDM), and HF + Pulay terms + NV correction (via LR), respectively. (b) Comparison between the PES-derived (reference) forces and those obtained from the estimators. Inset: direct comparison between HF + Pulay terms + NV correction (via FDM) and HF + Pulay terms + NV correction (via LR).

We next discuss computational cost. The primary advantage of the Lagrangian technique is efficiency: unlike the FDM approach [Nakano2024], which requires $3N$ separate DFT calculations to obtain the forces, the Lagrangian technique needs only a single solution of the CPHF/CPKS equations via the LR module. The complexity of DFT is typically $O(N^{3})$ , and together with the $3N$ required calculations for the FDM approach, results in a $O(N^{4})$ asymptotic scaling. On the other hand, solving the CPHF/CPKS equations has the same asymptotic scaling as a single HF/KS-SCF calculation. Therefore, the asymptotic complexity of the Lagrangian approach remains $O(N^{3})$ . As the system size increases, the Lagrangian approach is the more affordable and likely the only practical solution for larger systems.

From a technical point of view, the exchange of wavefunctions between CP2K [Kuehne2020_CP2K_JCP] and TurboRVB [2020NAK2] required careful attention to differences in conventions, which is a common challenge across electronic structure packages. Examples include the use of spherical-harmonic versus Cartesian Gaussian Type Orbitals (GTOs), normalization constants and their application stage (pre- vs post-contraction, shell-wise vs AO-level), and related notational choices. In this work, we adopted the TREX-IO [2023POS] format, a flexible library that standardizes conventions for exchanging wavefunctions, structures, and pseudopotentials among electronic structure packages. TREX-IO effectively absorbed inter-code convention mismatches and played a key role in robust data exchange between CP2k and TurboRVB in this work.

IV Application to organic molecules in rMD17 dataset

Now that we have established an unbiased VMC force evaluation based on the Lagrangian technique, a systematic assessment of VMC force accuracy is feasible across a wide range of compounds. VMC and DMC results are increasingly being used as training dataset for MLPs [2022TIR_MLP_hydrogen, 2023NIU_qmc_hydrogen, 2024CEP_qmc_hydrogen, 2024GIA_qmc_hugonioit, slootman2024accurate, 2025Giacomo, Goswami2025_hydrogen_MLP, Kevin2025_hydrogen_RQMC]. In this study, we selected ethanol, benzene, and malonaldehyde from the rMD17 dataset [Christensen2020_MLST, Christensen2020_rMD17_Figshare], and computed (i) unbiased VMC forces established in this work, (ii) DFT forces, and (iii) CCSD(T) forces as references. Specifically, we took 100 structures from molecular dynamics trajectories distributed as rMD17 dataset [Christensen2020_MLST, Christensen2020_rMD17_Figshare]. In a previous study that constructed MLPs with CCSD(T) references [2018Stefan], the Dunning cc-pVTZ [1989DUN] was used for ethanol, and cc-pVDZ [1989DUN] for benzene and malonaldehyde to generate a training dataset for MLPs. Here, we used cc-pVQZ for the three molecules for the CCSD(T) calculations (1 $s$ shells were kept frozen) to obtain results closer to the complete basis-set limit. For comparison, we also carried out WFT calculations — HF, second-order Møller–Plesset perturbation theory (MP2) [1934MOL_MP2], and CCSD [Purvis1982_CCSD] with the cc-pVQZ basis — and DFT calculations with the def2-QZVPPD [Weigend2005_def2] basis set using several exchange–correlation functionals with the D3BJ dispersion correction [Grimme2010_D3, Grimme2011_D3BJ]. Specifically, we chose SVWN [Vosko1980_VWN], PBE [1996PER], PBE+D3BJ, PBE0 [Adamo1999_PBE0]+D3BJ, $\omega$ B97X+D3BJ [Mardirossian2014_wB97X_V, Najibi2018_wB97X_wB97M_D3BJ], and $\omega$ B97M+D3BJ [Mardirossian2016_wB97M_V, Najibi2018_wB97X_wB97M_D3BJ]. Also in the case of MP2 and CCSD calculations, the 1 $s$ shells were kept frozen. All of these calculations were performed with Psi4 software suite [Smith2020_PSI4_1_4]. SVWN and PBE were selected as computationally inexpensive exchange–correlation functionals. PBE0 was included to allow comparison with a previous VMC/DMC benchmark study [slootman2024accurate], which combined Tkatchenko-Scheffler (TS) [2009ALEX-TS] and the many-body dispersion (MBD) [2012ALEX-MBD] corrections with PBE and PBE0, respectively. Since the Psi4 software suite does not implement TS or MBD, we used D3BJ dispersion correction in this work. The $\omega$ B97 family of functionals was chosen due to its widespread use in popular MLP datasets (e.g., SPICE [Eastman2023_SPICE_SciData]) and its reported good performance. For the VMC calculations, we employed ccECP pseudopotentials [2017BEN] together with the accompanied cc-pVTZ basis set. Following the previous VMC/DMC benchmark study by Slootman et al. [slootman2024accurate], we also benchmarked ccECP-based CCSD against all-electron CCSD using PySCF [2018SUN_pyscf, 2020SUN_pyscf] and confirmed that the force errors for the three molecules are $<$ 1 kcal/mol/Å in RMSEs. In the VMC calculations, the Slater determinant part was fixed to orbitals obtained from DFT (B3LYP), and only the Jastrow part was variationally optimized using stochastic reconfiguration [1998SOR]. The Jastrow factor comprised a two-body term enforcing the electron–electron cusp [2020NAK2] and a three-body term parametrized using pairings of atomic orbitals [2020NAK2]. For the three-body Jastrow basis, we used uncontracted primitive GTOs with [3s] for H and [3s1p] for C, N, and O; the corresponding Gaussian exponents were optimized for each configuration during the optimization procedure. Although matrix elements coupling different atomic pairs (often referred to as four-body Jastrow terms [2020NAK2]) are frequently set to zero in practice [2020NAK2], we retained them here because they were observed to have a significant impact on the forces. The total numbers of optimized variational parameters were 715, 958, and 1548 for ethanol, malonaldehyde, and benzene, respectively. Trial wavefunctions were generated with CP2K [Kuehne2020_CP2K_JCP]; VMC simulations were performed with TurboRVB [2020NAK2]; and the force-bias correction based on the Lagrangian technique was applied via CP2K [Kuehne2020_CP2K_JCP].

Table 1: The RMSE of atomic forces of all the configurations for the selected methods relative to CCSD(T). The unit is kcal/mol/Å.

Method	ethanol	benzene	malonaldehyde
VMC(biased)	5.29	4.78	14.16
VMC(unbiased)	3.64	2.69	3.81
HF	9.45	5.57	23.44
MP2	1.39	1.64	2.13
CCSD	1.62	2.00	5.01
SVWN	7.37	8.32	8.86
PBE	4.63	3.09	4.72
PBE-D3BJ	4.60	3.06	4.68
PBE0-D3BJ	2.21	3.75	7.80
$\omega$ B97X-D3BJ	0.98	3.13	5.40
$\omega$ B97M-D3BJ	1.24	3.08	6.31

Table 1 reports, for each molecule, the root-mean-square error (RMSE, in kcal mol^-1 Å^-1) of all force components — i.e., $N_{\mathrm{atoms}}\times 3$ directions $\times$ 100 structures — relative to CCSD(T). Figure 3 reports, for each molecule, the violin plot of the distribution of RMSE in 100 configurations for selected methods. We first emphasize that the benefit of the Lagrangian technique is evident from Table 1. Comparing the CCSD(T) reference with biased VMC forces (i.e., without the Lagrangian-technique correction), the RMSEs are 5.29, 4.78, and 14.16 kcal mol^-1 Å^-1 for ethanol, benzene, malonaldehyde, respectively — better than HF and SVWN, yet substantially worse than PBE for all systems. In contrast, the unbiased VMC forces developed in this study yield RMSEs of 3.64, 2.69, and 3.81 kcal mol^-1 Å^-1 for ethanol, benzene, and malonaldehyde, respectively, against CCSD(T). Thus, when accurate forces are sought with only partially optimized wavefunctions, our correction scheme is essential.

For ethanol, Slootman et al. [slootman2024accurate] performed an extensive benchmark for a set of 200 configurations and reported an RMSE of 3.055 kcal mol^-1 Å^-1 for VMC forces based on a fully optimized Jastrow Slater wavefunction. Although our ethanol configurations are not identical to theirs and a strict one-to-one comparison is not possible, our ethanol RMSE, 3.64 kcal mol^-1 Å^-1, is close to their value. This demonstrates that, even without optimizing the determinant part, correcting the non-variational term can achieve force accuracies comparable to those obtained with fully optimized wavefunctions, thereby reducing computational cost. Benzene and malonaldehyde similarly show RMSEs in the order of 3 kcal mol^-1 Å^-1. Slootman et al. [slootman2024accurate] found that achieving force accuracy of $\sim$ 1 kcal mol^-1 Å^-1 relative to CCSD(T) generally requires DMC. While the present Lagrangian technique is, in principle, applicable to DMC, a full DMC implementation lies beyond the scope of this paper and is left for future work.

Table 1 also lists the HF, MP2, CCSD, and DFT results. For ethanol, $\omega$ B97X-D3BJ and $\omega$ B97M-D3BJ perform exceptionally well, yielding $<$ 1 kcal mol^-1 Å^-1 errors relative to CCSD(T). Interestingly, this trend is not universal across molecules. For benzene, the VMC RMSE is 2.69 kcal mol^-1 Å^-1, whereas $\omega$ B97X-D3BJ and $\omega$ B97M-D3BJ exceed 3 kcal mol^-1 Å^-1. For malonaldehyde, VMC yields 3.81 kcal mol^-1 Å^-1, while the $\omega$ B97 functional family gives $\sim$ 6 kcal mol^-1 Å^-1. For ethanol, Slootman et al. [slootman2024accurate] compared VMC with dispersion-corrected PBE-TS and PBE0-MB, finding RMSE(PBE0-MBD) $<$ RMSE(VMC) $<$ RMSE(PBE-TS); that is, VMC performed between a GGA and a hybrid GGA. Our results reproduce this trend. For all the three compounds, HF shows the worst RMSE among the tested method, while MP2 shows a very good performance.

These results suggest that our unbiased VMC appears advantageous over DFT when one must treat chemically diverse sets of compounds (e.g., for generating general purpose machine-learning datasets) because VMC accuracy shows less system-dependence. However, in terms of closeness to the CCSD(T) reference values, VMC is outperformed by MP2 for the three compounds. Although VMC offers a more favorable scaling with the number of electrons (typically, $O(N^{3})$ [esler2008quantum] and $O(N^{4-5})$ for VMC and MP2, respectively) and is naturally suitable for parallelization, the large prefactor in practical VMC implementations complicates any conclusion about superior wall time efficiency. Consequently, the question of which method can deliver forces close to CCSD(T) accuracy at a cost lower than the CCSD(T) reference remains unresolved in the present study due to the limited number of test cases. A broader benchmark — covering more chemically varied systems and also including DMC benchmark — is a promising future work.

V Concluding remarks and future perspectives

In this paper, we introduce an application of the Lagrangian technique in Variational Monte Carlo (VMC) calculations to get atomic forces and pressures consistent with potential energy surfaces (also known as unbiased forces and pressures). The previous study to get unbiased forces and pressures with the Jastrow Slater determinant ansatz requires 3 $N$ times additional Density Functional Theory (DFT) calculations ( $N$ refers to the number of nuclei in a system) [Nakano2024], which hinders the applications of unbiased VMC force/pressure calculations in large molecules/crystal or in generations of training data for machine-learning potentials (MLPs). The formulation we report in this paper replaces the additional 3 $N$ -times DFT calculations with a single coupled-perturbed Hartree-Fock or coupled-perturbed Kohn-Sham calculation. This improves the scalability of the VMC force calculations with the conventional Jastrow Slater determinant ansatz. As an application of our developed method, we demonstrate that the unbiased force evaluation improves not only the consistency but also the accuracy of the VMC forces. We selected three molecules from the rMD17 benchmark set [Christensen2020_MLST, Christensen2020_rMD17_Figshare], ethanol, malonaldehyde and benzene, and compared biased VMC forces, unbiased VMC forces, and forces obtained by Coupled-Cluster Singles and Doubles with perturbative Triples [CCSD(T)] [Raghavachari1989_CCSDT] calculations. We found that the biased VMC forces are significantly biased from the CCSD(T) ones, while the unbiased ones give values close to those of the CCSD(T) ones. The unbiased force evaluation described here, for instance, plays a crucial role in generating high-quality training data for MLPs.

Code and Data availability

The ab initio DFT and QMC packages used in this work, CP2K and TurboRVB, are available from their GitHub repositories [https://github.com/cp2k/cp2k] and [https://github.com/sissaschool/turborvb], respectively.

acknowledgments

K.N. is grateful for computational resources from the Numerical Materials Simulator at National Institute for Materials Science (NIMS), from Research Institute for Information Technology at Kyushu University under the category of General Projects on the supercomputer GENKAI, and from supercomputer Fugaku provided by RIKEN through the HPCI System Research Projects (Project ID: hp250031). K.N. acknowledges financial support from MEXT Leading Initiative for Excellent Young Researchers (Grant No. JPMXS0320220025), from Murata Science and Education Foundation (Grant No. M24AN006), and from Japan Science and Technology Agency (JST), PRESTO (Grant No. JPMJPR24J9). This research was also supported by the NCCR MARVEL, a National Center of Competence in Research, funded by the Swiss National Science Foundation (Grant No. 205602). K.N. acknowledges Michele Casula (CNRS) for fruitful discussion about VMC force calculations. K.N. acknowledges the valuable comments on the method the author developed before from Dr. Emmanuel Giner (CNRS) and Dr. David Ceperley (UIUC) in a conference. We thank Guglielmo Mazzola (UZH) for organizing a seminar delivered by one of the authors K.N. at UZH. The seminar and subsequent discussions led to this collaboration.

Appendix A Details of the Variational Monte Carlo calculations

A.1 Variational Monte Carlo Methods

The expectation value of the energy for a given wavefunction, $\Psi$ , is evaluated as:

\left\langle E\right\rangle=\frac{{\int{d{\mathbf{x}}{\Psi^{2}}\left({\mathbf{x}}\right)\cdot\hat{\mathcal{H}}\Psi\left({\mathbf{x}}\right)/\Psi\left({\mathbf{x}}\right)}}}{{\int{d{\mathbf{x}}{\Psi^{2}}\left({\mathbf{x}}\right)}}}=\int{d{\mathbf{x}}{e_{L}}\left({\mathbf{x}}\right)\pi\left({\mathbf{x}}\right)},

(20)

where ${\mathbf{x}}=\left({{{\mathbf{r}}_{1}}{\sigma_{1}},{{\mathbf{r}}_{2}}{\sigma_{2}},\ldots{{\mathbf{r}}_{N}}{\sigma_{N}}}\right)$ is a shorthand notation for the $N$ electron coordinates and their spins, and

{e_{L}}\left({\mathbf{x}}\right)\equiv{\hat{\mathcal{H}}\Psi\left({\mathbf{x}}\right)\over\Psi\left({\mathbf{x}}\right)}\quad\text{and}\quad\pi\left({\mathbf{x}}\right)\equiv{{\Psi^{2}}\left({\mathbf{x}}\right)\over\int{d{\mathbf{x^{\prime}}}{\Psi^{2}}\left({\mathbf{x^{\prime}}}\right)}},

are the local energy and the probability distribution, respectively. The 3 $N$ -dimensional integration can be stochastically evaluated by generating a set $\left\{{{{\mathbf{x}}_{i}}}\right\}$ according to the probability distribution $\pi\left({\mathbf{x}}\right)$ generated using the Markov-Chain Monte Carlo (MCMC) technique and by averaging the obtained local energies ${e_{L}}\left({{{\mathbf{x}}_{i}}}\right)$ :

{E_{{\text{MCMC}}}}={\left\langle{{e_{L}}\left({\mathbf{x}}\right)}\right\rangle_{\pi\left({\mathbf{x}}\right)}}\approx\frac{1}{M}\sum\limits_{i=1}^{M}{{e_{L}}\left({{{\mathbf{x}}_{i}}}\right)}.

(21)

${E_{{\text{MCMC}}}}$ is always associated with an error since it is evaluated by a stochastic sampling. We note that the error bar is inversely proportional to the square root of the number of samplings.

Suppose $E_{0}$ is the ground state energy, the variational theorem guarantees that the expectation value of $E$ never becomes smaller than $E_{0}$ . In other words, to approach the ground state, one can optimize a given WF by introducing a set of parameters $\mathbf{p}\equiv\left(\cdots,p_{i},\cdots\right)$ to the WF ${\Psi\left({{\mathbf{x}},\mathbf{p}}\right)}$ :

{E_{{\text{VMC}}}}\left(\mathbf{p}\right)=\int{d{\mathbf{x}}{e_{L}}\left({{\mathbf{x}},\mathbf{p}}\right)\pi\left({{\mathbf{x}},\mathbf{p}}\right)}\geqslant E_{0}.

(22)

This framework is called Variational Monte Carlo (VMC). VMC is conceptually very simple, but the optimization of a many-body WF is a complex numerical task due to several reasons such as the presence of several local minima in the energy landscape and statistical errors associated with MCMC evaluation. Great improvements in this field have been achieved in the QMC community. For instance, in TurboRVB, the automatic differentiation [Sorella2010_AAD] allows the stable and efficient calculation of the energy derivatives with variational parameters and offer an efficient optimization method, the so-called stochastic reconfiguration method (also known as natural gradient optimization) [1998SOR, 2007SOR].

A.2 Wavefunction Ansatz

TurboRVB employs the Jastrow antisymmetrized geminal power (JAGP) [2003CAS] ansatz. The ansatz is composed of a Jastrow and an antisymmetric part ( $\Psi=J\cdot{\Psi_{{\text{AGP}}}}$ ). The antisymmetric part reads:

\displaystyle{\Psi_{{\text{AGP}}}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)={\det}\left[{g\left({{\mathbf{r}}_{1}^{\uparrow},{\mathbf{r}}_{1}^{\downarrow}}\right)g\left({{\mathbf{r}}_{2}^{\uparrow},{\mathbf{r}}_{2}^{\downarrow}}\right)\cdots g\left({{\mathbf{r}}_{N/2}^{\uparrow},{\mathbf{r}}_{N/2}^{\downarrow}}\right)}\right],

(23)

where $g\left({{\mathbf{r}}^{\uparrow},{\mathbf{r}}^{\downarrow}}\right)$ is called the pairing function. Although, for the sake of clarity, an unpolarized system with an even number $N$ of electrons is considered here, it can be generalized to describe a polarized system with unpaired electrons [2020NAK2]. The pairing function, $g\left({{{\mathbf{r}}_{i}^{\uparrow}},{{\mathbf{r}}_{j}^{\downarrow}}}\right)$ , is expanded over Gaussian-type atomic orbitals (GTOs):

g\left({{{\mathbf{r}}_{i}^{\uparrow}},{{\mathbf{r}}_{j}^{\downarrow}}}\right)=\sum\limits_{\mu\nu}{P_{\mu\nu}^{\uparrow\downarrow}{\psi_{\mu}}\left({{{\mathbf{r}}_{i}^{\uparrow}}}\right){\psi_{\nu}}\left({{{\mathbf{r}}_{j}^{\downarrow}}}\right)},

(24)

where $\psi(\mathbf{r})$ are primitive or contracted GTOs. Notice that in Eq. 24, the expansion coefficients of the pairing functions are labeled with $P_{\mu\nu}^{\sigma\tau}$ , to highlight their similarity with the density matrix of Eq. 10. Depending on the symmetry of ${P_{\mu\nu}^{\uparrow\downarrow}}$ , the pairing function is dubbed as AGP with spin-singlet pairs (AGPs), AGP with spin-singlet pairs for triplet correlations (AGPu), and the combination of AGPs and AGPu [2020NAK2], which correspond to a symmetric, skew-symmetric, and general ${P_{\mu\nu}^{\uparrow\downarrow}}$ , respectively ¹¹1Any square matrix can be uniquely decomposed into a sum of symmetric and skew-symmetric matrices. The basis sets employed in this study are described in the main text. When the JAGP is expanded over $p$ = $N_{\rm el}/2$ molecular orbitals (i.e., only the occupied molecular orbitals are considered), the JAGP coincides with the Jastrow-Slater determinant (JSD) ansatz [2020NAK2]. In this way, we composed the JSD anstaz, where the SD part was taken from a DFT calculation. Indeed, the coefficients in the pairing functions (i.e., the variational parameters in the SD part) are obtained by CP2K and fixed during the VMC optimization step. Only the coefficients in the Jastrow factor are optimized at the VMC level.

The Jastrow term in this study is composed of two-body and three-body factors ( $J={J_{2}}{J_{3}}$ ) [2020NAK2]. We employed a spin-independent Jastrow parametrization in this study. The two-body factor is used to satisfy the electron–electron cusp conditions, and the three-/four-body factor is adopted to take the further electron–electron correlation into consideration. The one-body Jastrow factor, which is added to satisfy the electron–nucleus cusp condition, is not used in this study because effective core potentials remove the Coulomb singularity. The two-body Jastrow factor is defined by a long-range function as:

{J_{2}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)=\exp\left({\sum\limits_{i<j}{v\left({\left|{{{\mathbf{r}}_{i}}-{{\mathbf{r}}_{j}}}\right|}\right)}}\right),

(25)

where $v\left(r\right)$ is a pade-type function:

v\left(r\right)=\frac{1}{2}r\cdot{\left({1+F\cdot r}\right)^{-1}}

(26)

and $F$ is a variational parameter. The three-body Jastrow factors are:

{J_{3}}\left({{{\mathbf{r}}_{1}},\ldots,{{\mathbf{r}}_{N}}}\right)=\exp\left({\sum\limits_{i<j}{{\Xi_{{\text{Jas}}}}\left({{{\mathbf{r}}_{i}},{{\mathbf{r}}_{j}}}\right)}}\right),

(27)

and

{\Xi_{{\text{Jas}}}}\left({{{\mathbf{r}}_{i}},{{\mathbf{r}}_{j}}}\right)=\sum\limits_{\mu,\nu}{\xi_{\mu,\nu}\psi_{\mu}^{{\text{Jas}}}\left({{{\mathbf{r}}_{i}}}\right)\psi_{\nu}^{{\text{Jas}}}\left({{{\mathbf{r}}_{j}}}\right)},

(28)

where $\xi_{\mu,\nu}$ are the variational parameters, and the indices $\mu$ and $\nu$ refer to different orbitals. The basis sets employed in this study are described in the main text. The Jastrow factors are optimized using the stochastic reconfiguration method.

A.3 Computing derivatives

The elements of the right-hand side tensor $\mathbf{B}$ of Eq. 13 can be efficiently computed from a VMC sampling:

\displaystyle B_{\mu i}^{\sigma}=-2\left\langle{E_{\rm L}(\mathbf{x})\left[(O_{\mu i}^{\sigma}(\mathbf{x})-\bar{O}_{\mu i}^{\sigma})\right]}\right\rangle\;,

(29)

where we made apparent the dependence of the local operators on the total electronic coordinate $\mathbf{x}$ , sampled by VMC, to distinguish them from constant values. In Eq. 29, $O_{\mu i}^{\sigma}(\mathbf{x})=\partial\ln\Psi(\mathbf{x})/\partial C_{\mu i}^{\sigma}$ , and $\bar{O}_{\mu i}^{\sigma}\sim\braket{O_{\mu i}^{\sigma}(\mathbf{x})}$ . We note that ${O}_{\mu i}^{\sigma}(\mathbf{x})$ can be efficiently computed in a VMC calculation using the adjoint algorithmic differentiation [Sorella2010_AAD], and the divergences of the generalized forces can be cured by reweighting methods [Attaccalite2008_AS, Pathak2008_PW_reweight]. It is extremely important that the variational force is evaluated in a covariance form of random variables to reduce its fluctuations [2005SOR, 2005UMR]. In practice, TurboRVB always computes the derivatives of $E$ with respect to the AGP matrix elements $P_{\mu\nu}^{\uparrow\downarrow}$ in Eq. 24, $\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}$ . In other words, this is the AO representation of the pairing function. Thus, the chain rule should be applied to obtain $\partial E/\partial C_{\mu i}^{\sigma}$ from $\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}$ . Molecular spin orbitals are defined as

\phi_{k}^{\sigma}(\mathbf{r})=\sum_{\mu}C_{\mu k}^{\sigma}\,\psi_{\mu}(\mathbf{r})\;.

(30)

Enforcing JSD $\leftrightarrow$ JAGP equivalence (idempotent geminal) via the occupied-orbital expansion,

g(\mathbf{r}_{i}^{\uparrow},\mathbf{r}_{j}^{\downarrow})=\sum_{k\in\mathrm{occ}}\phi_{k}^{\uparrow}(\mathbf{r}_{i}^{\uparrow})\,\phi_{k}^{\downarrow}(\mathbf{r}_{j}^{\downarrow})\;,

(31)

implies the geminal coefficients

P_{\mu\nu}^{\uparrow\downarrow}=\sum_{k\in\mathrm{occ}}C_{\mu k}^{\uparrow}\,C_{\nu k}^{\downarrow}\;.

(32)

Define $\tilde{B}_{\mu\nu}:=\partial E/\partial P_{\mu\nu}^{\uparrow\downarrow}$ (note: $\tilde{B}_{\mu\nu}=\tilde{B}_{\nu\mu}$ for AGPs). By the chain rule,

B_{\mu i}^{\sigma}:=\frac{\partial E}{\partial C_{\mu i}^{\sigma}}=\sum_{\lambda\nu}\frac{\partial E}{\partial P_{\lambda\nu}^{{\uparrow\downarrow}}}\frac{\partial P_{\lambda\nu}^{{\uparrow\downarrow}}}{\partial C_{\mu i}^{\sigma}}=2\sum_{\nu}\tilde{B}_{\mu\nu}C_{\nu i}^{\sigma}.

(33)

The derivatives of the energy with respect to the atomic positions, i.e., the nuclear forces, with the so-called space warp coordinate transformation (SWCT) [Umrigar1989_SWCT] are computed as [Sorella2010_AAD] :

	$\displaystyle{\bf F}_{\alpha}^{\rm VMC}=-\frac{\text{d}E}{\text{d}{\bf R}_{\alpha}}=$	$\displaystyle-$	$\displaystyle\Braket{\frac{\partial E_{\rm L}}{\partial{\bf R}_{\alpha}}+\sum\limits_{i}{{\omega}\left({\mathbf{R}}_{\alpha},{{{\mathbf{r}}_{i}}}\right)\frac{\partial E_{\rm L}}{{\partial{{\mathbf{r}}_{i}}}}}}$		(34)
		$\displaystyle-$	$\displaystyle 2\Braket{(E_{\rm L}-E)\left(\frac{\partial\log\Psi}{\partial{\bf R}_{\alpha}}+\sum\limits_{i}{\left[{{\omega}\left({\mathbf{R}}_{\alpha},{{{\mathbf{r}}_{i}}}\right)\frac{\partial\log\Psi}{{\partial{{\mathbf{r}}_{i}}}}+\frac{1}{2}\frac{\partial{\omega}\left({{{\mathbf{R}}_{\alpha},{\mathbf{r}}_{i}}}\right)}{{\partial{{\mathbf{r}}_{i}}}}}\right]}\right)},$		(35)

where

{\omega_{\alpha}}\left({\mathbf{r}}\right)=\frac{{\kappa\left({\left|{{\mathbf{r}}-{{\mathbf{R}}_{\alpha}}}\right|}\right)}}{{\sum\nolimits_{{\beta}=1}^{M}{\kappa\left({\left|{{\mathbf{r}}-{{\mathbf{R}}_{\beta}}}\right|}\right)}}},

(36)

with $\kappa\left(r\right)=1/{r^{4}}$ [Umrigar1989_SWCT]. In order to evaluate these differential expressions, 6 $N_{e}$ + 6 $N_{\rm a}$ components have to be evaluated, namely, $\left\langle{\cfrac{\partial{E_{L}}}{{\partial{{\mathbf{r}}_{i}}}}}\right\rangle$ , $\left\langle{\cfrac{\partial\log\Psi}{{\partial{{\mathbf{r}}_{i}}}}}\right\rangle$ , $\left\langle{\cfrac{\partial{E_{L}}}{{\partial{{\mathbf{R}}_{\alpha}}}}}\right\rangle$ , $\left\langle{\cfrac{\partial\log\Psi}{{\partial{{\mathbf{R}}_{\alpha}}}}}\right\rangle$ [Sorella2010_AAD]. These values are efficiently computed in TurboRVB,by using the automatic differentiation technique.

When computing the above derivatives, the so-called reweighting method is essential to mitigate the heavy-tailed distribution problem mentioned in the introduction of the main text. In the MCMC, the probability distribution can also differ from $\pi({\mathbf{x}})$ . Indeed, one can use an arbitrary probability distribution function $\pi^{\prime}({\mathbf{x}})=\Psi_{\text{G}}^{2}({\mathbf{x}})/\int{d{\mathbf{x}}\Psi_{\text{G}}^{2}({\mathbf{x}})}$ , and estimate a generic local observable $O({\mathbf{x}})$ either by using ${{\bar{O}}_{{\text{MCMC}}}}={\left\langle{O\left({\mathbf{x}}\right)}\right\rangle_{\pi\left({\mathbf{x}}\right)}}$ or:

{{\bar{O}}_{{\text{MCMC}}}}=\frac{{{{\left\langle{O\left({{\mathbf{x^{\prime}}}}\right)\mathcal{W}\left({{\mathbf{x^{\prime}}}}\right)}\right\rangle}_{\pi^{\prime}\left({{\mathbf{x^{\prime}}}}\right)}}}}{{{{\left\langle{\mathcal{W}\left({{\mathbf{x^{\prime}}}}\right)}\right\rangle}_{\pi^{\prime}\left({\mathbf{x}}\right)}}}}\approx\frac{{\sum\nolimits_{i=1}^{M^{\prime}}{O\left({{{{\mathbf{x^{\prime}}}}_{i}}}\right)\mathcal{W}\left({{{{\mathbf{x}}}^{\prime}_{i}}}\right)}}}{{\sum\nolimits_{i=1}^{M^{\prime}}{\mathcal{W}\left({{{{\mathbf{x}}}^{\prime}_{i}}}\right)}}},

(37)

where $\mathcal{W}\left({{{{\mathbf{x}}}}}\right)=|{\Psi}\left({\mathbf{x}}\right)/\Psi_{\text{G}}\left({\mathbf{x}}\right)|^{2}$ , and the points $\mathbf{x}^{\prime}_{i}$ are distributed according to $\pi^{\prime}$ . This reweighting scheme is very important when evaluating energy gradients with finite variances. TurboRVB implements the so-called Attaccalite and Sorella (AS) reweighting scheme [Attaccalite2008_AS] to suppress the divergences of derivatives in the vicinity of nodal surface.

A.4 Error propagation

Important for error propagation is the fact that the elements $B_{\mu i}^{\sigma}$ come with an error bar $\delta B_{\mu i}^{\sigma}$ , whereby these elements are also correlated with each other. In the following, it is most convenient to consider the tensor $\mathbf{B}$ as a vector with a single compound index $B_{\mu i}^{\sigma}=B_{(\mu i\sigma)}=B_{I}$ , and the elements of $\mathbf{A}$ in Eq. 13, $A_{\mu i,\nu j}^{\sigma\tau}=A_{(\mu i\sigma),(\nu j\tau)}=A_{I,J}$ , as forming a matrix of dimension $(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})$ . Let’s denote the covariance matrix (tensor) of $X$ as $\operatorname{Cov}(\mathbf{X})$ . With the element-wise notation, Eq. 13 is written as:

$\displaystyle\sum_{\nu j\tau}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau}$	$\displaystyle=-B_{\mu i}^{\sigma}$	(38)
$\displaystyle\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}\sum_{\nu j\tau}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau}$	$\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma}$	(39)
$\displaystyle\sum_{\mu i\sigma}\sum_{\nu j\tau}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}A_{\mu i,\nu j}^{\sigma\tau}Z_{\nu j}^{\tau}$	$\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma}$	(40)
$\displaystyle\sum_{\nu j\tau}\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau}Z_{\nu j}^{\tau}$	$\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma}$	(41)
$\displaystyle Z_{\lambda k}^{\eta}$	$\displaystyle=-\sum_{\mu i\sigma}(A^{-1})_{\lambda k,\mu i}^{\eta\sigma}B_{\mu i}^{\sigma}\;.$	(42)

In other words, we have

\displaystyle Z_{\mu i}^{\sigma}

\displaystyle=-\sum_{\nu j\tau}G_{\mu i,\nu j}^{\sigma\tau}B_{\nu j}^{\tau}

(43)

where $G$ is an analogue of the matrix inverse:

\displaystyle\sum_{\mu i\sigma}G_{\lambda k,\mu i}^{\eta\sigma}A_{\mu i,\nu j}^{\sigma\tau}=\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau}\;,\quad\sum_{\mu i\sigma}A_{\lambda k,\mu i}^{\eta\sigma}G_{\mu i,\nu j}^{\sigma\tau}=\delta_{\nu\lambda}\delta_{jk}\delta_{\eta\tau}

(44)

Therefore, $\operatorname{Cov}(\mathbf{Z})$ reads:

\mathrm{Cov}(\mathbf{Z})_{\mu i,\nu j}^{\sigma\tau}=\sum_{\lambda k\eta,\delta l\zeta}G_{\mu i,\lambda k}^{\sigma\eta}\mathrm{Cov}(\mathbf{B})_{\lambda k,\delta l}^{\eta\zeta}G_{\delta l,\nu j}^{\zeta\tau}.

(45)

These Lagrange multipliers enter the formula for the force only linearly and they get contracted with deterministic quantities only; see Eq. 18. So, once we obtain $\operatorname{Cov}(\mathbf{Z})$ , it is trivial to propagate the error to the forces. Indeed, in Eq. 18, the force correction term, $F^{\rm{c}}$ , can be written as

F^{\rm{c}}=const.+\sum_{\mu i\sigma}R_{\mu i}^{\sigma}Z_{\mu i}^{\sigma}\;,

(46)

where $const.$ aggregates all terms that are not multiplied by $Z_{\mu i}^{\sigma}$ , and

R_{\mu i}^{\sigma}=\varepsilon_{i\sigma}\sum_{\nu}\frac{\partial S_{\mu\nu}}{\partial{R}}C_{\nu i\sigma}+\sum_{\nu}\frac{\partial F_{\mu\nu\sigma}}{\partial{R}}C_{\nu i\sigma}-\sum_{jk}\sum_{\lambda\nu\tau}I_{\mu i,jk}^{\sigma\tau}C_{\lambda j}^{\tau}\frac{\partial S_{\lambda\nu}}{\partial{R}}C_{\nu k}^{\tau}\;,

(47)

with $I_{\mu i,jk}^{\sigma\tau}$ containing the two-electron integrals entering in the definition of $W_{ij}^{\sigma}$ through the Fock matrix derivative with respect to the MO coefficients, i.e. Eq. 15. Thus, the variance of $F^{\rm{c}}$ is

\mathrm{Var}(F^{\rm{c}})=\sum_{\mu i\sigma}\sum_{\nu j\tau}R_{\mu i}^{\sigma}\mathrm{Cov}(\mathbf{Z})_{\mu i,\nu j}^{\sigma\tau}R_{\nu j}^{\tau}\;.

(48)

By plugging $\operatorname{Cov}(\mathbf{B})$ into the above equation, we obtain:

\mathrm{Var}(F^{\rm{c}})=\sum_{\mu i\sigma}\sum_{\lambda k\eta}\sum_{\delta l\zeta}\sum_{\nu j\tau}R_{\mu i}^{\sigma}G_{\mu i,\lambda k}^{\sigma\eta}\mathrm{Cov}(\mathbf{B})_{\lambda k,\delta l}^{\eta\zeta}G_{\delta l,\nu j}^{\zeta\tau}R_{\nu j}^{\tau}\;.

(49)

Dropping the element-wise notation, we get

\mathrm{Var}(F^{\rm{c}})=\vec{R}^{\text{T}}\mathbf{G}\mathrm{Cov}(\mathbf{B})\mathbf{G}^{\text{T}}\vec{R}\;.

(50)

The computation of Eq. 50 involves the contraction of matrices of dimension $(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})$ , which results in an overall sixth power with respect to system size. This will limit the maximum system size for which we will be able to compute the error estimation on the forces. The computation of the $\mathbf{G}$ tensor itself also scales with the sixth power in system size. A possible way to reduce the computational cost is to perform a low-rank decomposition of the $\mathbf{B}$ tensor:

\mathbf{B}=\mathbf{b}\mathbf{b}^{\text{T}}

(51)

where $\mathbf{b}\in\mathcal{R}^{(N_{\text{AO}}\times N_{\text{MO}}\times N_{\text{spin}})\times N_{\text{b}}}$ , where $N_{\text{b}}$ is very small. Then, for each of the $N_{\text{b}}$ vectors, $\mathbf{b}_{k}$ , we have that $\mathbf{b}_{k}^{\text{T}}\mathbf{G}$ is the same as solving $\mathbf{A}\mathbf{Z}=\mathbf{B}$ , and we can then use a slightly modified version of the force evaluation subroutine to compute $\mathbf{Z}\vec{R}$ .

A.5 Computational details

The DFT calculations (including the LR calculations) and the VMC calculations in this work were performed on cluster machines operated by the National Institute for Materials Science (NIMS), on the Genkai supercomputer at Kyushu University, and on the Fugaku supercomputer at RIKEN. For the VMC optimization for Cl₂ dimer and cBN test cases reported in Section III, we adjusted the number of independent MCMC chains and the per-chain MCMC sample count so that the error bar of the per-iteration VMC energy estimate was 5.0 mHa. The wavefunction optimization was then run for 300 iterations. As described in the main text, only the Jastrow factor was optimized variationally. Denoting by $\vec{f}$ the vector of the parameter gradients $\equiv\{\cdots,-\cfrac{\partial E}{\partial p_{i}}\cdots,\}$ , where $p_{i}$ are the variational parameters, we monitored devmax $\equiv\max_{i}\,\bigl(|f_{i}|/\lvert\mathrm{std}\,f_{i}\rvert\bigr)$ and verified for all parameters converged in a statistical sense (devmax $\leq 4$ ). The learning-rate and regularization parameters were set to tpar=0.35 and parr=0.001, respectively. See Ref. 2020NAK2 for the implementation detail in TurboRVB. After optimization, we performed fresh MCMC simulations to compute the total energy, atomic forces, and the parameter gradients. The energies/forces and parameter gradients were evaluated in separate MCMC runs. For the energy and forces runs, the number of MCMC steps was adjusted so that the error bar of the energy estimate was 0.1 mHa; the resulting error bars on the force components were of comparable magnitude. For the parameter-gradients runs, after equilibration, the total pre-blocking MCMC sample count was set to $N_{\mathrm{sample}}=n_{\mathrm{chain}}\times n_{\mathrm{samples\_per\_chain}}=120\times 30{,}000=3{,}600{,}000.$ For the error bar estimations of the force corrections, we set the number of blocks to 50 per chain. For ethanol, malonaldehyde, and benzene calculations reported in Section IV, we adjusted the MCMC sample count such that the error bar of the per-iteration VMC energy estimate was 5.0 mHa and performed 100 iterations of VMC optimization. As in Section III, only the Jastrow factor was optimized and devmax was confirmed for all parameters. We used tpar=0.35 and parr=0.001. After optimization, energies, forces, and the parameter gradients were recomputed by MCMC in separate runs. For the energy and force runs, the MCMC steps was tuned so that the error bar of the energy estimate was 0.1 mHa, leading to force uncertainties of a similar magnitude. For the parameter-gradient runs, the total pre-blocking MCMC steps was set to $N_{\mathrm{sample}}=n_{\mathrm{chain}}\times n_{\mathrm{samples\_per\_chain}}=128\times 50{,}000=6{,}400{,}000.$ For the calculations in Section IV, we did not evaluate the error bar of the force-correction term, because the reported RMSE values were computed from the mean values only.

Appendix B Derivation of Lagrange multipliers equations

To arrive at the algebraic equations that determine the Lagrange multipliers, we need to explicitly evaluate the derivatives of the VMC Lagrangian with respect to the MO coefficients and project them onto the occupied and virtual orbital manifolds. We start by computing the derivatives appearing in Eq. 11, which we repeat here for convenience:

\displaystyle\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}=\frac{\partial E_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}+\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})-\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}\,.

(52)

The first term on the right-hand side is the derivative of the VMC energy with respect to the MO coefficients, which we simply lasbel as $B_{\mu i}^{\sigma}$ as in the main text. How these derivatives are obtained from the VMC calculation is discussed in Appendix A.3.

The second term in Eq. 52 corresponds to the derivative of the Roothaan-Hall equations, reading

\displaystyle\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial}{\partial C_{\mu i}^{\sigma}}(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})\,.

(53)

It is more convenient to work out the action of the derivative operator on the Fock and overlap components separately. By the chain rule, the first part is given by

\displaystyle\frac{\partial}{\partial C_{\mu i}^{\sigma}}F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}

\displaystyle=\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}C_{\nu k}^{\tau}+F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\,,

(54)

where the derivative of the Fock matrix with respect to the MO coefficients reads

$\displaystyle\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\frac{\partial}{\partial C_{\mu i}^{\sigma}}\left(h_{\lambda\nu}+\sum_{\gamma\delta\eta}P_{\gamma\delta}^{\eta}[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]+V_{\lambda\nu}^{\text{xc},\tau}\right)$	(55)
	$\displaystyle=\sum_{\gamma\delta\eta}\frac{\partial P_{\gamma\delta}^{\eta}}{\partial C_{\mu i}^{\sigma}}[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]+\frac{\partial V_{\lambda\nu}^{\text{xc},\tau}}{\partial C_{\mu i}^{\sigma}}$
	$\displaystyle=\sum_{\gamma\delta\eta}\left(\delta_{\sigma\eta}[\delta_{\mu\gamma}C_{\delta i}^{\eta}+\delta_{\mu\delta}C_{\gamma i}^{\eta}]\right)\big[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]\big]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\nu\|\gamma\delta)+\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\nu\|\gamma\delta)$
	$\displaystyle-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\gamma\|\nu\delta)-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\gamma\|\nu\delta)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=\sum_{\delta}C_{\delta i}^{\sigma}(\lambda\nu\|\mu\delta)+\sum_{\gamma}C_{\gamma i}^{\sigma}(\lambda\nu\|\gamma\mu)$
	$\displaystyle-\sum_{\delta}c_{x}\delta_{\tau\sigma}C_{\delta i}^{\sigma}(\lambda\mu\|\nu\delta)-\sum_{\gamma}c_{x}\delta_{\tau\sigma}C_{\gamma i}^{\sigma}(\lambda\gamma\|\nu\mu)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\;.$

Here, we have introduced the exchange-correlation kernel $f^{\text{xc}}_{\lambda\nu\tau,\mu i\sigma}$ , defined as

\displaystyle f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}=\int\int\frac{\delta^{2}E^{\text{xc}}}{\delta\rho_{\tau}(\mathbf{r})\delta\rho_{\sigma}(\mathbf{r}^{\prime})}\Omega_{\lambda\nu}(\mathbf{r})\Omega_{\mu i_{\sigma}}(\mathbf{r}^{\prime})\text{d}\mathbf{r}\text{d}\mathbf{r}^{\prime}\;,

(56)

where $\Omega$ denotes an orbital-pair function, $\Omega_{\lambda\nu}(\mathbf{r})=\psi_{\lambda}(\mathbf{r})\psi_{\nu}(\mathbf{r})$ (for AOs) and $\Omega_{\mu i_{\sigma}}(\mathbf{r})=\phi_{\mu\sigma}(\mathbf{r})\phi_{i\sigma}(\mathbf{r})$ (for MOs). The second term of Eq. 54 is trivial:

F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}=F_{\lambda\nu}^{\tau}\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}\;.

(57)

Inserting Eqs. 55 and 57 into Eq. 54 results in

	$\displaystyle\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}C_{\nu k}^{\tau}+F_{\lambda\nu}^{\tau}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\Bigg[2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\Bigg]C_{\nu k}^{\tau}$		(58)
		$\displaystyle+F_{\lambda\nu}^{\tau}\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}\;.$		(58)

The second term of Eq. 53 is given by:

	$\displaystyle\frac{\partial}{\partial C_{\mu i}^{\sigma}}S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau}$	$\displaystyle=S_{\lambda\nu}\frac{\partial C_{\nu k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\varepsilon_{k}^{\tau}+S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}$		(59)
		$\displaystyle=\delta_{\tau\sigma}\delta_{\mu\nu}\delta_{ik}S_{\lambda\nu}\varepsilon_{k}^{\tau}+S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\;.$		(59)

Plugging Eqs. 58 and 59 into Eq. 53 finally results in:

$\displaystyle\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\frac{\partial(F_{\lambda\nu}^{\tau}C_{\nu k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\varepsilon_{k}^{\tau})}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\sum_{\lambda k\tau}Z_{\lambda k}^{\tau}\sum_{\nu}\Big[C_{\nu k}^{\tau}\left(2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right)$	(60)
	$\displaystyle+\delta_{\tau\sigma}\delta_{\nu\mu}\delta_{ik}F_{\lambda\nu}^{\tau}-\delta_{\tau\sigma}\delta_{\mu\nu}\delta_{ik}S_{\lambda\nu}\varepsilon_{k}^{\tau}-S_{\lambda\nu}C_{\nu k}^{\tau}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}\Big]$
	$\displaystyle=\sum_{\lambda\nu k\tau}Z_{\lambda k}^{\tau}C_{\nu k}^{\tau}\left(2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right)$
	$\displaystyle+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})-\sum_{k\tau}\underbrace{\sum_{\lambda\nu}Z_{\lambda k}^{\tau}S_{\lambda\nu}C_{\nu k}^{\tau}}_{=0}\frac{\partial\varepsilon_{k}^{\tau}}{\partial C_{\mu i}^{\sigma}}$
	$\displaystyle=\sum_{\lambda\nu\tau}\tilde{P}_{\lambda\nu}^{\tau}\left(2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\right)$
	$\displaystyle+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})$
	$\displaystyle=H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\;,$

where we have identified the intermediate tensor $\mathbf{H}\big[\tilde{\mathbf{P}}\big]$ as the derivative of the Fock matrix with respect to the MO coefficients (see Eq. 15 in the main text), traced with the asymmetric response spin-density tensor $\tilde{\mathbf{P}}$ . Notice that contrary to the main text, here we derive all the equations with the asymmetric response spin-density tensor, though, in practice, we use a symmetrized version as it is numerically more stable.

At last, the third term in Eq. 52 is

$\displaystyle\sum_{kl\tau}W_{kl}^{\tau}\frac{\partial S_{kl}^{\tau}}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\sum_{kl\tau}W_{kl}^{\tau}\left(\frac{\partial}{\partial C_{\mu i}^{\sigma}}\sum_{\lambda\nu}C_{\lambda k}^{\tau}S_{\lambda\nu}C_{\nu l}^{\tau}\right)$	(61)
	$\displaystyle=\sum_{kl\tau}W_{kl}^{\tau}\left(\sum_{\lambda\nu}\delta_{\lambda\mu}\delta_{ik}\delta_{\tau\sigma}S_{\lambda\nu}C_{\nu l}^{\tau}+\sum_{\lambda\nu}\delta_{\mu\nu}\delta_{il}\delta_{\tau\sigma}S_{\lambda\nu}C_{\lambda k}^{\tau}\right)$
	$\displaystyle=\sum_{kl\tau}\delta_{ik}\delta_{\tau\sigma}W_{kl}^{\tau}\sum_{\nu}S_{\mu\nu}C_{\nu l}^{\tau}+\sum_{kl\tau}\delta_{il}\delta_{\tau\sigma}W_{kl}^{\tau}\sum_{\lambda}S_{\lambda\mu}C_{\lambda k}^{\tau}$
	$\displaystyle=\sum_{l\nu}W_{il}^{\sigma}S_{\mu\nu}C_{\nu l}^{\sigma}+\sum_{k\lambda}W_{ki}^{\sigma}S_{\lambda\nu}C_{\lambda k}^{\sigma}$
	$\displaystyle=2\sum_{j\nu}W_{ij}^{\sigma}S_{\mu\nu}C_{\nu j}^{\sigma}\;,$

where in the last step we simply relabeled the indices and used the fact that both $\mathbf{W}$ and $\mathbf{S}$ are symmetric.

To determine the Lagrange multipliers, we equate Eq. 52 to zero and project it onto the occupied and virtual MO manifolds (see Eqs. 12b and 12a of the main text), solving for $\mathbf{W}$ and $\mathbf{Z}$ , respectively. Projection onto the virtual manifold results in a linear system of equations, known as the Z-vector equations, while $\mathbf{W}$ drops out. Let us start by explicitly writing out Eq. 12a:

\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma}=\underbrace{\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}}_{\text{term I}}+\underbrace{\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]Q_{\mu\nu}^{\sigma}}_{\text{term II}}-\underbrace{\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}Q_{\mu\nu}^{\sigma}}_{\text{term III}}=0\;,

(62)

where we recall here the definition of $Q_{\mu\nu}^{\sigma}$ :

Q_{\mu\nu}^{\sigma}=\delta_{\mu\nu}-\sum_{\lambda k}C_{\mu k}^{\sigma}C_{\lambda k}^{\sigma}S_{\lambda\nu}\;.

(63)

Term I in Eq. 52 remains untouched. Term III vanishes:

$\displaystyle\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}Q_{\mu\nu}^{\sigma}$	$\displaystyle=2\sum_{\mu j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}\delta_{\mu\nu}-2\sum_{\mu j\lambda}\sum_{\delta k}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}$	(64)
	$\displaystyle=2\sum_{j\lambda}W_{ij}^{\sigma}S_{\nu\lambda}C_{\lambda j}^{\sigma}-2\sum_{jk\delta}W_{ij}^{\sigma}\underbrace{S_{kj}^{\sigma}}_{=\delta_{kj}}C_{\delta k}^{\sigma}S_{\delta\nu}$
	$\displaystyle=2\sum_{j\lambda}W_{ij}^{\sigma}S_{\nu\lambda}C_{\lambda j}^{\sigma}-2\sum_{j\delta}W_{ij}^{\sigma}S_{\delta\nu}C_{\delta j}^{\sigma}$
	$\displaystyle=0\;,$

where we have used the fact that the overlap matrix is symmetric. Finally, we look at term II, which is the most complicated one:

\displaystyle\sum_{\mu}\left[H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]Q_{\mu\nu}^{\sigma}

\displaystyle=\underbrace{\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}}_{\text{term II.A}}+\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma}}_{\text{term II.B}}\;.

(65)

Term II.A does not simplify, while term II.B simplifies owing to the Roothaan-Hall equation, which vanishes when the Lagrangian is stationary with respect to the MO coefficients:

$\displaystyle\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma}$	$\displaystyle=\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\delta_{\mu\nu}-\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\sum_{\delta k}C_{\mu k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}$	(66)
	$\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\lambda\delta k}Z_{\lambda i}^{\sigma}\sum_{\mu}(F_{\lambda\mu}^{\sigma}C_{\mu k}^{\sigma}-S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{i}^{\sigma})C_{\delta k}^{\sigma}S_{\delta\nu}$
	$\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\delta k}\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}(S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{k}^{\sigma}-S_{\lambda\mu}C_{\mu k}^{\sigma}\varepsilon_{i}^{\sigma})C_{\delta k}^{\sigma}S_{\delta\nu}$
	$\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})-\sum_{\delta k}\bigg[\underbrace{\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu k}^{\sigma}}_{=0}\bigg]\varepsilon_{k}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}$
	$\displaystyle-\sum_{\delta k}\bigg[\underbrace{\sum_{\lambda\mu}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu k}^{\sigma}}_{=0}\bigg]\varepsilon_{i}^{\sigma}C_{\delta k}^{\sigma}S_{\delta\nu}$
	$\displaystyle=\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})\;.$

Putting together Eq. 65 and Eq. 66, results in

\displaystyle\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})Q_{\mu\nu}^{\sigma}=\sum_{\mu}H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})\;.

(67)

Finally, inserting Eqs. 64 and 67 into Eq. 52 yields:

$\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}Q_{\mu\nu}^{\sigma}=\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}+\sum_{\mu}$	$\displaystyle H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})=0$	(68)
	$\displaystyle\Longleftrightarrow$
$\displaystyle\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\nu}^{\sigma}-S_{\lambda\nu}\varepsilon_{i}^{\sigma})+\sum_{\mu}$	$\displaystyle H_{\mu i}^{\sigma}\big[\tilde{\mathbf{P}}\big]Q_{\mu\nu}^{\sigma}=-\sum_{\mu}B_{\mu i}^{\sigma}Q_{\mu\nu}^{\sigma}\;,$

which corresponds to Eq. 13 of the main text.

To determine the $\mathbf{W}$ Lagrange multipliers, we project the derivative of the Lagrangian onto the occupied orbitals and equate to zero, cf. Eq. 12b. This equation is identical to Eq. 62, the only difference is the projector:

\displaystyle\sum_{\mu}\frac{\partial L_{\rm VMC}}{\partial C_{\mu i}^{\sigma}}C_{\mu j}^{\sigma}=\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}+\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]C_{\mu j}^{\sigma}-\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu j}^{\sigma}=0\;.

(69)

No simplifications are possible for the first term, while the third term is simply

\sum_{\mu}2\sum_{j\lambda}W_{ij}^{\sigma}S_{\mu\lambda}C_{\lambda j}^{\sigma}C_{\mu j}^{\sigma}=2\sum_{k}W_{ik}^{\sigma}S_{kj}^{\sigma}=2W_{ij}^{\sigma}\;.

(70)

The middle term simplifies to

$\displaystyle\sum_{\mu}\left[H_{\mu i}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\lambda}Z_{\lambda i}^{\sigma}(F_{\lambda\mu}^{\sigma}-S_{\lambda\mu}\varepsilon_{i}^{\sigma})\right]C_{\mu j}^{\sigma}$	$\displaystyle=H_{ji}^{\sigma}[\tilde{\mathbf{P}}]+\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}F_{\lambda\mu}^{\sigma}C_{\mu j}^{\sigma}-\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu j}^{\sigma}}_{=0}\varepsilon_{i}^{\sigma}$	(71)
	$\displaystyle=H_{ij}^{\sigma}[\tilde{\mathbf{P}}]+\underbrace{\sum_{\mu\lambda}Z_{\lambda i}^{\sigma}S_{\lambda\mu}C_{\mu j}^{\sigma}}_{=0}\varepsilon_{j}^{\sigma}$	(72)
	$\displaystyle=H_{ij}^{\sigma}[\tilde{\mathbf{P}}]\;,$	(73)

using the symmetry of $\mathbf{H}$ and Eq. 7. With these results, we can rearrange Eq. 69 to solve for $\mathbf{Z}$ , yielding:

W_{ij}^{\sigma}=\frac{1}{2}H_{ij}^{\sigma}[\tilde{\mathbf{P}}]+\frac{1}{2}\sum_{\mu}B_{\mu i}^{\sigma}C_{\mu j}^{\sigma}\;,

(74)

that is reported as Eq. 17 in the main text.

$\displaystyle\frac{\partial F_{\lambda\nu}^{\tau}}{\partial C_{\mu i}^{\sigma}}$	$\displaystyle=\frac{\partial}{\partial C_{\mu i}^{\sigma}}\left(h_{\lambda\nu}+\sum_{\gamma\delta\eta}P_{\gamma\delta}^{\eta}[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]+V_{\lambda\nu}^{\text{xc},\tau}\right)$	(55)
	$\displaystyle=\sum_{\gamma\delta\eta}\frac{\partial P_{\gamma\delta}^{\eta}}{\partial C_{\mu i}^{\sigma}}[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]+\frac{\partial V_{\lambda\nu}^{\text{xc},\tau}}{\partial C_{\mu i}^{\sigma}}$
	$\displaystyle=\sum_{\gamma\delta\eta}\left(\delta_{\sigma\eta}[\delta_{\mu\gamma}C_{\delta i}^{\eta}+\delta_{\mu\delta}C_{\gamma i}^{\eta}]\right)\big[(\lambda\nu\|\gamma\delta)-c_{x}\delta_{\tau\eta}(\lambda\gamma\|\nu\delta)]\big]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\nu\|\gamma\delta)+\sum_{\gamma\delta\eta}\delta_{\sigma\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\nu\|\gamma\delta)$
	$\displaystyle-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\gamma}C_{\delta i}^{\eta}(\lambda\gamma\|\nu\delta)-\sum_{\gamma\delta\eta}c_{x}\delta_{\sigma\eta}\delta_{\tau\eta}\delta_{\mu\delta}C_{\gamma i}^{\eta}(\lambda\gamma\|\nu\delta)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=\sum_{\delta}C_{\delta i}^{\sigma}(\lambda\nu\|\mu\delta)+\sum_{\gamma}C_{\gamma i}^{\sigma}(\lambda\nu\|\gamma\mu)$
	$\displaystyle-\sum_{\delta}c_{x}\delta_{\tau\sigma}C_{\delta i}^{\sigma}(\lambda\mu\|\nu\delta)-\sum_{\gamma}c_{x}\delta_{\tau\sigma}C_{\gamma i}^{\sigma}(\lambda\gamma\|\nu\mu)+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}$
	$\displaystyle=2(\lambda\nu\|\mu i_{\sigma})-c_{x}\delta_{\tau\sigma}[(\lambda\mu\|\nu i_{\sigma})+(\lambda i_{\sigma}\|\nu\mu)]+2f_{\lambda\nu\tau,\mu i\sigma}^{\rm xc}\;.$