Synthesizing Grid Data with Cyber Resilience and Privacy Guarantees

Shengyang Wu and Vladimir Dvorkin Shengyang Wu ([email protected]) and Vladimir Dvorkin ([email protected]) are with the Department of Electrical Engineering and Computer Science, University of Michigan, MI 48109, USA.

Abstract

Differential privacy (DP) provides a principled approach to synthesizing data (e.g., loads) from real-world power systems while limiting the exposure of sensitive information. However, adversaries may exploit synthetic data to calibrate cyberattacks on the source grids. To control these risks, we propose new DP algorithms for synthesizing data that provide the source grids with both cyber resilience and privacy guarantees. The algorithms incorporate both normal operation and attack optimization models to balance the fidelity of synthesized data and cyber resilience. The resulting post-processing optimization is reformulated as a robust optimization problem, which is compatible with the exponential mechanism of DP to moderate its computational burden.

I Introduction

Optimal power flow (OPF) analysis in power systems requires realistic grid models with accurate network, generation, and load parameters—data that is difficult to source from real-world grids due to privacy and (cyber-)security concerns. While the lack of such models has inspired the development of artificial grids [1, 2], a more principled approach leverages the theory of differential privacy (DP) [3] to release grid models directly from real-world systems.

The DP theory asserts that it is impossible—up to prescribed privacy parameters—to infer the original parameters from their DP release. Such strong privacy guarantees originate from Laplacian perturbations [4] of real grid parameters, followed by post-processing optimization of the perturbed parameters to restore their modeling fidelity to the source grid, e.g., in terms of similarity of the OPF outcomes [5, 6, 7]. The DP theory also lies at the core of modern privacy-preserving OPF solvers [8, 9, 10], the release of aggregated grid statistics [11], and related grid information [12].

However, the privacy guarantees alone may not suffice to release grid parameters, as cybersecurity risks associated with such releases remain largely unexplored. Possible cyber attacks include false data injection, which subtly alters state estimation results [13], line outage masking, which disconnects a transmission line and misguides a control center to seek outage elsewhere [14], and load redistribution, which manipulates demand measurements to increase OPF cost and constraint violation [15]. The latter is of main interest to this work. Executing such attacks requires some grid knowledge [16], which is traditionally difficult to obtain. However, the availability of synthetic grid data may unintentionally inform adversaries and help them calibrate the attack.

Contribution: Recognizing the risks that synthetic grid parameters may inform cyber adversaries, we develop new DP algorithms that simultaneously guarantee cyber resilience and privacy for the source power grids. Our algorithms build on [5, 6, 7] and leverage the Laplace mechanism and post-processing optimization to tune synthetic data while anticipating cyber risks through embedded attack optimization.

The contributions of this paper are summarized as follows:

1.

We formulate a Cyber Resilient Obfuscation (CRO) algorithm, an optimization-based algorithm to release electric load data with a guarantee to preserve the privacy of the original data and ensure the resilience of the source grid to load redistribution attacks. The algorithm post-processes synthetic loads by balancing their fidelity with the potential damage to the grid.
2.

The underlying post-processing optimization is an intractable trilevel problem, which is reduced to a tractable yet more conservative bilevel problem. We achieve this by exploring the connections between robust and bilevel optimization, in the spirit of [17].
3.

To further improve computational tractability of the algorithm, we provide the extension of CRO, termed CRO-Exp, which uses the exponential mechanism of DP to identify only the most important constraints for post-processing optimization of synthetic loads.

We next provide preliminaries on OPF and DP theory. Sec. III explains the risks of cyberattacks, and Sec. IV introduces the algorithms to mitigate them. Sec. V presents simulations, and Sec. VI concludes. Proofs are relegated to the appendix.

Notation: lower- (upper-) case boldface letters denote column vectors (matrices). Scalar $a_{i}$ is the $i^{\text{th}}$ element of vector $\mathbf{a}$ . Vectors $\mathbf{0}$ and $\mathbf{1}$ are the all-zero and all-one vectors; $\top$ stands for transposition, and $\mathbf{x}^{\star}$ is the optimal value of $\mathbf{x}$ .

II Preliminaries

II-A DC Optimal Power Flow (OPF) Problem

For a given load vector $\mathbf{d}$ , the DC OPF problem seeks the least-cost generation dispatch in high-voltage grids that satisfies the loads and grid limits. Consider OPF as a parametric linear program:


$\displaystyle C_{\text{opf}}(\mathbf{d})=\underset{{\mathbf{p},\mathbf{v}% \geqslant\mathbf{0}}}{\text{minimize}}\quad$	$\displaystyle\mathbf{q}^{\top}\mathbf{p}+\bm{\psi}^{\top}\mathbf{v}$	(1a)
subject to	$\displaystyle{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0% }\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\underline{\mathbf{p}}}% \leqslant\mathbf{p}\leqslant\overline{\mathbf{p}}$	(1b)
	$\displaystyle\bm{1}^{\top}(\mathbf{p}-\mathbf{d})=0$	(1c)
	$\displaystyle\|\mathbf{F}(\mathbf{p}-\mathbf{d})\|\leqslant\overline{\mathbf{f}}% +\mathbf{v}$	(1d)

where decision variables include generator dispatch $\mathbf{p}$ , bounded by dispatch range $[\underline{\mathbf{p}},\overline{\mathbf{p}}]$ , and power flow constraint violations $\mathbf{v}$ , penalized by $\bm{\psi}$ . The matrix $\mathbf{F}$ of power transfer distribution factors is used to map net power injections $(\mathbf{p}-\mathbf{d})$ to power flows as $\mathbf{F}(\mathbf{p}-\mathbf{d})$ . Constraint (1c) defines system-wide power balance between dispatched generation and loads. The power flows in transmission lines are capped by line capacity $\overline{\mathbf{f}}$ using constraint (1d). In highly loaded condition, these constraints can be temporally violated by $\mathbf{v}\geqslant\mathbf{0}$ . As transmission constraint violations are not desired, they are penalized with a large parameter $|\bm{\psi}|\gg|{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}\mathbf{q}}|$ .

We write the linear OPF problem (1) in a compact form


$\displaystyle C_{\text{opf}}(\mathbf{d})=\underset{{\mathbf{x}}}{\text{% minimize}}\quad$	$\displaystyle\mathbf{c}^{\top}\mathbf{x}$	(2a)
subject to	$\displaystyle\mathbf{a}_{k}^{\top}\mathbf{x}+\mathbf{b}_{k}^{\top}\mathbf{d}+e% _{k}\leqslant\mathbf{0},$
	$\displaystyle\quad\quad\quad\!\;\forall k=1,\dots,K,$	(2b)

where $\mathbf{x}=[\mathbf{p}^{\top}\;\mathbf{v}^{\top}]^{\top}$ , $\mathbf{c}=[\mathbf{q}^{\top}\;\bm{\psi}^{\top}]^{\top}$ . The $K$ inequalities in (2b) encode the dispatch constraints (1b), power flow constraints (1c) and (1d) using properly specified parameters $\mathbf{a}_{1},\mathbf{b}_{1},e_{1},\dots,\mathbf{a}_{K},\mathbf{b}_{K},e_{K}$ .

II-B Differential Privacy for Synthetic OPF Datasets

Optimization parameters in problem (2) are either classified or owned by private system actors, and thus can not be directly disclosed to public. Our goal is thus to synthesize some realistic version of these parameters. In this work, we focus on the obfuscation of demand vector $\mathbf{d}$ . This is without much loss of generality, because other parameters, such as transmission data in $\mathbf{a}_{k}$ , $\mathbf{b}_{k}$ and $e_{k}$ , can be synthesized similarly; see the state-of-the-art obfuscation algorithms [5, 6, 7]. Towards this goal, we leverage DP to render the original vector $\mathbf{d}$ statistically indistinguishable from its synthetic counterpart $\tilde{\mathbf{d}}$ , up to some prescribed parameters: $\alpha$ , termed the adjacency parameter, and $\varepsilon$ , termed the privacy loss [3].

Definition 1 (Adjacency).

Two vectors $\mathbf{d},\mathbf{d}^{\prime}\in\mathcal{D}\subset\mathbb{R}^{n}$ are $\alpha-$ adjacent, for some $\alpha>0$ , if $\exists i\in\{1,\dots,n$ }, such that $d_{j}=d_{j}^{\prime},\forall j\in\{1,\dots,n\}\backslash i$ , and $|d_{i}-d_{i}^{\prime}|\leqslant\alpha$ . That is, they are different in one item by at most $\alpha$ . $\triangleleft$

To synthesize a DP version $\tilde{\mathbf{d}}$ of $\mathbf{d}$ , the standard Laplace mechanism applies a random noise to the original data, i.e., $\tilde{\mathbf{d}}=\mathbf{d}+\text{Lap}(\frac{\alpha}{\varepsilon})^{n}$ , where $\text{Lap}(s)^{n}$ is a random draw from the $n-$ dimensional Laplace distribution with zero mean and diagonal covariance matrix with each diagonal element equal $2s^{2}$ [6]. The mechanism guarantees that if the attacker’s prior for any load is within the $\pm\alpha$ MW range of the true value, it will not be improved by the DP release. If the prior is outside this range, the prior knowledge will be improved (thus enhancing grid transparency), but the exact loads will not be disclosed. In other words, the mechanism satisfies the following definition of $\varepsilon-$ DP.

Definition 2 ( $\varepsilon-$ DP).

The Laplace mechanism above, with domain $\mathcal{D}$ and output range $\mathcal{O}$ , is called $\varepsilon-$ DP if, for any outcome within $\hat{\mathcal{O}}\subseteq\mathcal{O}$ and any two $\alpha-$ adjacent load vectors $\mathbf{d}$ and $\mathbf{d}^{\prime}$ , the ratio of probabilities is bounded as

\displaystyle\frac{\text{Pr}[\mathbf{d}{\color[rgb]{1,1,1}\definecolor[named]{% pgfstrokecolor}{rgb}{1,1,1}\pgfsys@color@gray@stroke{1}\pgfsys@color@gray@fill% {1}^{\prime}}+\text{Lap}(\frac{\alpha}{\varepsilon})^{n}\in\hat{\mathcal{O}}]}% {\text{Pr}[\mathbf{d}^{\prime}+\text{Lap}(\frac{\alpha}{\varepsilon})^{n}\in% \hat{\mathcal{O}}]}\leqslant\text{exp}(\varepsilon).

(3)

where $\varepsilon$ is a prescribed non-negative parameter. $\triangleleft$

Intuitively, a smaller privacy loss $\varepsilon$ results in more noise applied to data and higher requirement for distribution similarity, which would make it more likely to observe the same random outcome. However, the Laplace mechanism alone is likely to produce such load vector $\tilde{\mathbf{d}}$ that does not admit a feasible OPF solution, i.e., $C_{\text{opf}}(\tilde{\mathbf{d}})=\varnothing.$ The prior work introduced the following two-stage solution:

1.

Laplace mechanism $\tilde{\mathbf{d}}^{0}=\mathbf{d}+\text{Lap}\left(\!\frac{2\alpha}{\varepsilon% }\!\right)^{n}$ , followed by
2.

Post-processing of $\tilde{\mathbf{d}}^{0}$ using a bilevel optimization:

\displaystyle\underset{{\tilde{\mathbf{d}}}}{\text{minimize}}\;\;

\displaystyle\lVert C_{\text{opf}}(\tilde{\mathbf{d}})-\tilde{C}_{\text{opf}}% \rVert_{1}+\gamma\lVert\tilde{\mathbf{d}}-\tilde{\mathbf{d}}^{0}\rVert_{1}

(4)

where the OPF costs $C_{\text{opf}}(\tilde{\mathbf{d}})$ comes from the embedded optimization problem (2) formulated on synthetic load vector $\tilde{\mathbf{d}}$ , and $\tilde{C}_{\text{opf}}=C_{\text{opf}}(\mathbf{d})+\text{Lap}(\frac{2\overline{% c}}{\varepsilon})$ computes a DP estimate of OPF costs on true data with $\overline{c}$ being the cost of the most expensive generator. The synthetic vector $\tilde{\mathbf{d}}$ is optimized using feedback from the embedded OPF problem, which constraints $\tilde{\mathbf{d}}$ to take only those values that admit a feasible OPF solution. The main objective of (4) is to match the OPF cost on synthetic load vector with that on the original load vector, thereby ensuring high modeling fidelity of the synthetic data. The second term in (4) is a regularization term with some small hyper-parameter $\gamma>0$ to choose the optimal solution that is closest to the original load after DP obfuscation $\tilde{\mathbf{d}^{0}}$ . Solution to (4) is the feasible and cost-consistent synthetic counterpart $\tilde{\mathbf{d}}$ , which ensures $\varepsilon-$ DP guarantee for the original load vector $\mathbf{d}$ [7].

One barrier to releasing synthetic OPF parameters is the risk posed by cyber adversaries who might exploit them to disrupt grid operations. Next, we substantiate these risks.

III Cyber Resilience Risks in Releasing Differentially Private OPF Datasets

Although synthetic OPF datasets contribute to overall grid transparency and enable independent power flow analysis, they can also be misused by cyber adversaries launching attacks on the grid. One class of attacks, which is of interest to this work, is load redistribution attacks. In terms of OPF problem (2), the adversary optimizes an attack vector $\bm{\delta}$ that alters loads in $\mathbf{d}$ to increase either the dispatch cost or the magnitude of power flow constraint violations.

According to [15], the optimal attack vector is found by solving the following bilevel optimization (BO) problem:


	$\displaystyle C_{\text{att}}^{\text{BO}}(\mathbf{d})=\underset{{\bm{\delta}\in% \Delta}}{\text{maximize}}\;C_{\text{opf}}(\mathbf{d}+\bm{\delta})$		(5a)
where the OPF costs $C_{\text{opf}}(\mathbf{d}+\bm{\delta})$ comes from the embedded optimization problem (2) formulated on load vector after attack $\mathbf{d}+\bm{\delta}$ . The attack vector is constrained by the set of admissible attacks

	$\displaystyle\Delta$	$\displaystyle\triangleq\left\{\bm{\delta}\;\bigg{\|}\begin{array}[]{l}% \underline{\bm{\delta}}\leqslant\bm{\delta}\leqslant\overline{\bm{\delta}}\\ \bm{1}^{\top}\bm{\delta}=0\end{array}\right\}$	(5d)

where $\overline{\bm{\delta}}$ and $\underline{\bm{\delta}}$ are limits on attack magnitude, and $\bm{1}^{\top}\bm{\delta}=0$ ensures that the total system loading remains unchanged after the attack, thus ensuring the stealthiness of the attack.

While the actual load vector $\mathbf{d}$ is not revealed to public, the adversary may leverage its DP release $\tilde{\mathbf{d}}$ to calibrate the attack. Our experiments in Sec. V reveal that the vector computed on $\tilde{\mathbf{d}}$ leads to a substantial increase of OPF costs across standard power systems benchmarks (see Tab. I).

IV Cyber Resilience and Privacy Guarantees for Synthetic OPF Datasets

Recognizing the risks of misusing synthetic datasets, we revisit the post-processing to enhance the cyber resilience of source grids. Instead of (4), we propose the following upper-level objective for the post-processing optimization:

	$\displaystyle\underset{{\tilde{\mathbf{d}}}}{\text{minimize}}\quad\lVert C_{% \text{opf}}(\tilde{\mathbf{d}})-\tilde{C}_{\text{opf}}\rVert_{1}$	$\displaystyle+\beta\lVert C_{\text{att}}^{\text{BO}}(\tilde{\mathbf{d}})-% \tilde{C}_{\text{opf}}\rVert_{1}$
		$\displaystyle\quad{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0}\pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}+\gamma\lVert% \tilde{\mathbf{d}}-\tilde{\mathbf{d}}^{0}\rVert_{1}}$		(6)

where the first term controls the fidelity of the synthetic data, the second term measures the damage under attack calibrated on the synthetic data, and the third term regularizes the demand vector. For a small penalty $\gamma$ , this objective represents a trade-off between the fidelity of synthetic grid parameters and resilience of the grid to redistribution attacks, which can be explored by varying parameter $\beta>0.$ The embedded optimization $C_{\text{att}}^{\text{BO}}(\tilde{\mathbf{d}})$ includes the real grid data except for the load vector, thus modeling the worst-case attack when only the loads are unknown to adversaries.

The challenge is that (IV) requires solving a trilevel optimization problem, where the synthetic data is optimized over embedded BO attack model $C_{\text{att}}^{\text{BO}}(\tilde{\mathbf{d}})$ . Inspired by [17], we seek computational tractability by exploring the connection between the bilevel model of attack and robust optimization.

IV-A Computational Tractability via Robust Optimization (RO)

The conservative RO approximation of (5) is


$\displaystyle C_{\text{att}}^{\text{RO}}(\mathbf{d})=$	$\displaystyle\;\underset{{\mathbf{x}}}{\text{minimize}}\;\;\mathbf{c}^{\top}% \mathbf{x}$	(7a)
subject to	$\displaystyle\underset{\bm{\delta_{k}\in\Delta}}{\max}\left[\mathbf{a}_{k}^{% \top}\mathbf{x}+\mathbf{b}_{k}^{\top}(\mathbf{d}+\bm{\delta}_{k})+e_{k}\right]% \leqslant\mathbf{0},\forall k,\!\!\!$	(7b)

where each constraint $k$ is formulated for the worst-case realization of the attack vector from the set of admissible attacks. In contrast to bilevel formulation (5), the RO attack generates a worst-case attack vector for each constraint. The following result shows that the RO attack provides an upper-bound on the BO attack.

Proposition 3 (Conservative attack approximation).

For any feasible load vector $\mathbf{d}$ , relation $C_{\text{att}}^{\text{RO}}(\mathbf{d})\geqslant C_{\text{att}}^{\text{BO}}(% \mathbf{d})$ holds. $\triangleleft$

Although conservative, formulation (7) is computationally advantageous over (5) as it admits a linear programming reformulation via duality [18, §2.2] (see the link to online repository below for details). Let $\underline{\bm{\mu}}$ and $\overline{\bm{\mu}}$ be the duals of the first constraints in (5d), and $\lambda$ be the dual of the last condition in (5d). The exact reformulation of (7) is


$\displaystyle C_{\text{att}}^{\text{RO}}(\mathbf{d})=\;\underset{{\mathbf{x},% \underline{\bm{\mu}},\overline{\bm{\mu}},\lambda}}{\text{minimize}}\;\;\;$	$\displaystyle\mathbf{c}^{\top}\mathbf{x}$	(8a)
subject to	$\displaystyle\mathbf{a}_{k}^{\top}\mathbf{x}+\mathbf{b}_{k}^{\top}\mathbf{d}$
	$\displaystyle\;+\overline{\bm{\mu}}_{k}^{\top}\overline{\bm{\delta}}-% \underline{\bm{\mu}}_{k}^{\top}\underline{\bm{\delta}}+e_{k}\leqslant\mathbf{0% },\!\!\!$	(8b)
	$\displaystyle\mathbf{b}_{k}-\overline{\bm{\mu}}_{k}+\underline{\bm{\mu}}_{k}-% \bm{1}\lambda_{k}=\bm{0},$	(8c)
	$\displaystyle\underline{\bm{\mu}}_{k},\overline{\bm{\mu}}_{k}\geqslant\mathbf{% 0},\forall k=1,\dots,K.$	(8d)

Therefore, replacing $C_{\text{att}}^{\text{BO}}$ with $C_{\text{att}}^{\text{RO}}$ in objective function (IV) gives rise to bilevel post-processing optimization, which can be handled by mixed-integer optimization solvers [7, 6].

Next, we introduce a tractable post-processing algorithm for synthesizing loads with privacy and cyber resilience guarantees. Then, in Sec. IV-C, we modify the algorithm to tune the computational burden of the RO approximation.

IV-B Differentially Private CRO

Input:

\mathbf{d}

(\alpha,\varepsilon_{1},\varepsilon_{2})

(\beta,\gamma,\Delta)

1 Initial load obfuscation:

\tilde{\mathbf{d}}^{0}=\mathbf{d}+\text{Lap}\left(\!\frac{\alpha}{\varepsilon_% {1}}\!\right)^{n}

2 DP estimation of OPF costs:

\tilde{C}_{\text{opf}}=C_{\text{opf}}(\mathbf{d})+\text{Lap}\left(\frac{\alpha% \overline{c}}{\varepsilon_{2}}\right)

3 Post-processing optimization of the synthetic load vector

	$\displaystyle\tilde{\mathbf{d}}\in\underset{\tilde{\mathbf{d}}}{\text{argmin}}\;$	$\displaystyle\lVert C_{\text{opf}}(\tilde{\mathbf{d}})-\tilde{C}_{\text{opf}}% \rVert_{1}$
		$\displaystyle\quad+\beta\lVert C_{\text{att}}^{\text{RO}}(\tilde{\mathbf{d}})-% \tilde{C}_{\text{opf}}\rVert_{1}+\gamma\lVert\tilde{\mathbf{d}}-\tilde{\mathbf% {d}}^{0}\rVert_{1}$		(9)

Output: Synthetic load vector

\tilde{\mathbf{d}}

Algorithm 1 Privacy-preserving CRO

The CRO algorithm for privacy-preserving and cyber-resilient synthesis of load parameters is summarized in Alg. 1. It takes as inputs load adjacency and $\varepsilon$ -DP parameters, as well as optimization trade-off, regularization and attack parameters, $\beta$ , $\gamma$ and $\Delta$ , respectively. Step 1 initializes the synthetic load vector using the Laplace mechanism with a privacy loss of $\varepsilon_{1}$ . Step 2 performs a DP estimation of the OPF costs on real loads using the Laplace mechanism with a privacy loss of $\varepsilon_{2}$ . Following prior work in [7], this step requires the cost $\overline{c}$ of the most expensive generator. Finally, Step 3 post-processes the initial synthetic load by solving the bilevel optimization problem (1) using the conservative RO approximation of the attack. Since Step 3 does not optimize over real loads, it does not introduce any privacy loss. The complete formulation of (1) can be seen in Appendix B.

The resilience of the source grid to load redistribution attacks is controlled by the parameter $\beta$ and admissible set $\Delta$ . Naturally, a larger $\beta$ and a larger set $\Delta$ lead to greater resilience, but at the expense of the fidelity of the synthesized data. Our experiments in Sec. V will justify for the choices of these parameters. The privacy guarantee for $\alpha$ -adjacent load vectors is established by the following result.

Theorem 4 (DP of CRO).

Setting $\varepsilon_{1}=\varepsilon_{2}=\varepsilon/2$ renders Alg. 1 $\varepsilon-$ DP for $\alpha-$ adjacent load vectors. $\triangleleft$

IV-C Exponential Mechanism to Ease Computational Burden

While the RO approximation (7) leads to a more tractable bilevel optimization, it is still computationally expensive in large systems due to the massive amount of variables and complementarity constraints, as later substantiated by Fig. 2. We propose to alleviate the computational burden by selecting only a subset $\mathcal{K}=\{k_{i}\}_{i=1}^{\tau}$ of $\tau$ constraints for RO reformulation that affect the OPF cost the most. The remaining constraints $\mathcal{K}^{\prime}:=\{1,\dots,K\}\backslash\mathcal{K}$ are enforced deterministically. Setting $\tau=K$ leads to the full RO formulation, while $\tau<K$ leads to a reduced problem:


$\displaystyle C_{\text{att},\tau}^{\text{RO}}(\mathbf{d})=$	$\displaystyle\;\underset{{\mathbf{x}}}{\text{minimize}}\;\;\mathbf{c}^{\top}% \mathbf{x}$	(10a)
subject to	$\displaystyle\underset{\bm{\delta\in\Delta}}{\max}\left[\mathbf{a}_{k}^{\top}% \mathbf{x}+\mathbf{b}_{k}^{\top}(\mathbf{d}+\bm{\delta})+e_{k}\right]\leqslant% \mathbf{0},$	(10b)
	$\displaystyle\mathbf{a}_{k^{\prime}}^{\top}\mathbf{x}+\mathbf{b}_{k^{\prime}}^% {\top}\mathbf{d}+e_{k^{\prime}}\leqslant\mathbf{0},$	(10c)
	$\displaystyle\quad\quad\quad\quad\quad\quad\quad\!\;\forall k\in\mathcal{K},\;% \forall k^{\prime}\in\mathcal{K^{\prime}}.$

While directly replacing $C_{\text{att}}^{\text{RO}}(\tilde{\mathbf{d}})$ with $C_{\text{att},\tau}^{\text{RO}}(\tilde{\mathbf{d}})$ in Alg. 1 alleviates the computational burden, this also degrades the privacy guarantee of Theorem 4: since the worst-case constraint set $\mathcal{K}$ is specific to a particular load vector $\mathbf{d}$ , the post-processing on $\mathcal{K}$ would leak information we intend to obfuscate. As a remedy, we leverage the report-noisy-max algorithm, a discrete version of the exponential mechanism of DP [3], to privately identify the worst-case constraints without leaking information about the actual load. The resulting algorithm, termed CRO-Exp, is given in Alg. 2.

The first two steps of Alg. 2 follow those in Alg. 1. At Step 3, the algorithm applies the exponential mechanism $\tau$ times to construct set $\mathcal{K}$ . In each iteration $t$ , the mechanism identifies the constraint $k_{t}$ that—when reformulated in a robust fashion—leads to the greatest increase of OPF cost. After $\tau$ iterations, set $\mathcal{K}$ contains $\tau$ worst-case constraints. Finally, Step 4 solves the post-processing optimization with only $\tau$ constraints reformulated in RO way.

Theorem 5 (DP of CRO-Exp).

Setting $\varepsilon_{1}=\varepsilon_{2}=\varepsilon/3$ and $\varepsilon_{3}=\varepsilon/(3\tau)$ renders Alg. 2 $\varepsilon$ -DP for $\alpha$ -adjacent loads. $\triangleleft$

Input:

\mathbf{d}

(\alpha,\varepsilon_{1},\varepsilon_{2},\varepsilon_{3})

(\beta,\gamma,\Delta,\tau)

\mathcal{K}=\{\varnothing\}

1 Initial load obfuscation:

\tilde{\mathbf{d}}^{0}=\mathbf{d}+\text{Lap}\left(\!\frac{\alpha}{\varepsilon_% {1}}\!\right)^{n}

2 DP estimation of OPF costs:

\tilde{C}_{\text{opf}}=C_{\text{opf}}(\mathbf{d})+\text{Lap}\left(\frac{\alpha% \overline{c}}{\varepsilon_{2}}\right)

3 DP estimation of the set

\mathcal{K}

of the worst-case constraints

for $t=1,\dots,\tau$ do

for $k=1,\dots,K$ do

C_{k}=C_{\text{att},t}^{\text{RO}}(\mathbf{d})+\text{Lap}\left(\frac{\alpha% \overline{c}}{\varepsilon_{3}}\right)

end for

k_{t}\leftarrow\text{argmax}_{k}\;C_{k}

[2pt]

\mathcal{K}\leftarrow\mathcal{K}\cup\{k_{t}\}

end for

4 Post-processing optimization of the synthetic load vector

	$\displaystyle\tilde{\mathbf{d}}\in\underset{\tilde{\mathbf{d}}}{\text{argmin}}\;$	$\displaystyle\lVert C_{\text{opf}}(\tilde{\mathbf{d}})-\tilde{C}_{\text{opf}}% \rVert_{1}$
		$\displaystyle\hskip 4.5pt+\beta\lVert C_{\text{att},\tau}^{\text{RO}}(\tilde{% \mathbf{d}})-\tilde{C}_{\text{opf}}\rVert_{1}+\gamma\lVert\tilde{\mathbf{d}}-% \tilde{\mathbf{d}}^{0}\rVert_{1}$		(11)

Output: Synthetic load vector

\tilde{\mathbf{d}}

Algorithm 2 Privacy-preserving CRO-Exp

V Experiment Results

We run experiments using standard power grid testbeds. The set of admissible attacks includes the limits on attack magnitude as percentage $\eta$ of nominal loads. The privacy loss $\varepsilon=1$ , and we vary adjacency $\alpha$ throughout the experiments. The code and data to replicate our results are available at

https://github.com/Wu-ShengY/CRO_SynDataset.

V-A Substantiating Attacks Calibrated on DP Data

Table I collects the damage of load redistribution attacks. The synthetic loads are generated using the standard post-processing (4) with no cyber resilience guarantee. The results reveal that the load redistribution attacks are as effective on synthetic loads as on the original loads, motivating the cyber resilient obfuscation by means of Alg. 1 and 2.

TABLE I: Average outcomes of load redistribution attacks [$ 1,000]

Testbed	Load	$C_{\text{opf}}$	$C_{\text{att}}^{\text{BO}}$ (for varying $\eta$ )
Testbed	Load	$C_{\text{opf}}$	$\pm 5\%$	$\pm 10\%$	$\pm 15\%$
5_pjm	actual, $\mathbf{d}$	88.2	92.5	100.0	108.1
5_pjm	synth., $\tilde{\mathbf{d}}$	87.4	92.4	100.0	108.1
14_ieee	actual, $\mathbf{d}$	4.80	4.93	5.06	5.19
14_ieee	synth., $\tilde{\mathbf{d}}$	4.78	4.93	5.03	5.17
24_ieee	actual, $\mathbf{d}$	227.2	255.0	283.0	311.1
24_ieee	synth., $\tilde{\mathbf{d}}$	212.5	242.3	259.1	274.9
118_ieee	actual, $\mathbf{d}$	237.0	252.4	256.4	259.8
118_ieee	synth., $\tilde{\mathbf{d}}$	225.1	229.1	238.8	241.0

V-B Insights from the Small PJM 5-Bus Testbed

We test the CRO Alg. 1 in mitigating the attack damage. We generate $1,000$ synthetic loads using the standard post-processing (PP) in (4) and $1,000$ synthetic loads from the CRO assuming $\eta=5\%$ . The histograms of the normal and post-attack OPF costs are shown in Fig. 1. Their range becomes wider as load adjacency (and hence the noise) increases. For the standard post-processing (PP) (top row), we observe a notable shift of the post-attack histogram to the right relative to the cost of normal operations, confirming the results from Tab. I. The attacks calibrated on the outcomes of the CRO algorithm result in no extra OPF cost, as the histograms of the normal and post-attack cost overlap (bottom row). Thus, when attacks are calibrated on CRO results, the adversary sees no gain from launching an attack.

Table II shows the impact of the trade-off parameter $\beta$ on the CRO algorithm. The load redistribution attack demonstrate notable damage when disregarding attacks in the CRO algorithm $(\beta=0)$ . On the other hand, as long as $\beta$ exceeds the regularization weight $\gamma$ , the source grid remains immune to attacks. This trade-off is “flat” as we model the linear OPF costs; we expect it to be smoother for quadratic OPF costs, which is a subject of future investigation.

Refer to caption — Figure 1: Histograms of normal and post-attack OPF costs in the PJM 5-bus systems. Blue and red dotted lines represent the average OPF costs on synthetic load parameters in normal and post-attack scenarios, respectively. Top row: histograms resulting from the standard post-processing based on (4). Bottom row: histograms resulting from the CRO algorithm.

TABLE II: OPF costs induced on synthetic load vectors

\tilde{\mathbf{d}}_{\text{cro}}

for varying trade-off parameters

\beta

and load adjacency

\alpha

. Attack magnitude

\eta=5\%

Trade-off Parameters	$\alpha=20$ MW		$\alpha=100$ MW		$\alpha=200$ MW
Trade-off Parameters	$C_{\text{opf}}$	$C_{\text{att}}^{\text{BO}}$	$C_{\text{opf}}$	$C_{\text{att}}^{\text{BO}}$	$C_{\text{opf}}$	$C_{\text{att}}^{\text{BO}}$
$\beta\in[0,\gamma)$	88.2	92.9	87.3	91.5	84.5	88.2
$\beta\in[\gamma,\infty)$	88.2	88.2	87.3	87.3	84.5	84.5

V-C Large-Scale Applications with CRO-Exp

The post-processing optimization (1) in CRO is difficult to scale to large systems. As shown in Fig. 2, the number of variables and complementarity constraints grow with the size of the testbed. The CRO-Exp Alg. 2 reduces the problem by at least one order of magnitude to a similar level as the standard post-processing, since it only considers a subset of $\tau$ worst-case constraints in the attack. Fig. 3 shows the damage of attacks calibrated on synthetic loads released by CRO-Exp for three large testbeds. The increase of $\tau$ reduces the attack damage. Notably, $\tau=5$ suffices to minimize the attack damage, showing no improvement of cyber resilience beyond this threshold. This is due to the fact that only the attacks on a limited number of constraints can greatly increase the OPF cost. Moreover, the selection of the worst-case constraints in Step 3 of Alg. 2 becomes less informative with more noise, which only increases in $\tau$ , as per Theorem 5.

VI Conclusion

We developed algorithms for synthesizing credible grid parameters from real-world systems for OPF analysis. Similar to existing DP algorithms, they obfuscate loads by injecting Laplacian noise and using post-processing; however, they differ in a post-processing stage which optimizes for the trade-off between modeling fidelity (OPF cost consistency) and the resilience of source grids to cyber attacks. Our results reveal that these trade-offs are “flat”, meaning resilience can be achieved with little to no impact on the fidelity of the synthetic data. We also found that the post-processing formulation can be reduced with no loss of resilience using the exponential mechanism to select only important constraints for the attack. Inspired by these observations, future work aims to further investigate these trade-offs in the OPF setting with nonlinear (quadratic) costs and a broader class of attack models amenable to optimization-based representation.

-A Proof of Proposition 3

Consider two perturbed OPF problems formulated on the same vector $\mathbf{d}$ , one resulting from the BO attack (5)


$\displaystyle C_{\text{opf}}^{\text{BO}}(\mathbf{d})$	$\displaystyle=\underset{{\mathbf{x}}}{\text{minimize}}\quad\mathbf{c}^{\top}% \mathbf{x}$	(12a)
subject to	$\displaystyle\mathbf{a}_{k}^{\top}\mathbf{x}\leqslant-\mathbf{b}_{k}^{\top}(% \mathbf{d}+\bm{\gamma}^{\star})-e_{k},\;\forall k$	(12b)

and one from the RO approximation of the attack in (7)


$\displaystyle C_{\text{opf}}^{\text{RO}}(\mathbf{d})$	$\displaystyle=\underset{{\mathbf{x}}}{\text{minimize}}\quad\mathbf{c}^{\top}% \mathbf{x}$	(13a)
subject to	$\displaystyle\mathbf{a}_{k}^{\top}\mathbf{x}\leqslant-\mathbf{b}_{k}^{\top}(% \mathbf{d}+\bm{\delta}_{k}^{\star})-e_{k},\;\forall k$	(13b)

with perturbations $\bm{\gamma}^{\star},\bm{\delta}_{1}^{\star},\dots,\bm{\delta}_{K}^{\star}\in\Delta.$

To show that the optimal value of (12a) is upper-bounded by the optimal value of (13a), we need to establish that the feasible set (13b) is a subset of (12b). This is per the global inequality in perturbation analysis of convex programs [19, §5.6, Eq. (5.57)]. Inspecting (12b) and (13b), observe that this is the case when

\displaystyle\mathbf{b}_{k}^{\top}\bm{\delta}_{k}^{\star}\geqslant\mathbf{b}_{% k}^{\top}\bm{\gamma}^{\star},\quad\forall k=1,\dots,K.

(14)

The attack vectors $\bm{\delta}_{1}^{\star},\dots,\bm{\delta}_{K}^{\star}$ come from the RO, so the left-hand side of (14) is given by the following optimization:

\displaystyle\mathbf{b}_{k}^{\top}\bm{\delta}_{k}^{\star}=\max_{\bm{\delta}_{k% }\in\Delta}\;\mathbf{b}_{k}^{\top}\bm{\delta}_{k},\quad\forall k=1,\dots,K.

(15)

At the same time, the right-hand side of (14) can be represented by the following optimization problem:


$\displaystyle\mathbf{b}_{k}^{\top}\bm{\gamma}^{\star}=\max_{\bm{\gamma}_{k}\in% \Delta}\;$	$\displaystyle\mathbf{b}_{k}^{\top}\bm{\gamma}_{k}$	(16a)
	$\displaystyle\quad\quad\quad\quad\quad\forall k=1,\dots,K$
s.t.	$\displaystyle\bm{\gamma}_{k}=\bm{\gamma}^{\star}$	(16b)

Although trivial, this optimization problem allows us to clearly relate both sides of inequality (14) by relating problems (15) and (16). They are similar except for the additional consensus constraint (16b). Since $\bm{\gamma}^{\star}\in\Delta$ by design, the feasible set of $\bm{\gamma}_{k}$ is the subset of that for $\bm{\delta}_{k}$ . Hence, we can conclude that the optimal value of (15) is greater or equal than that of (16a). Therefore, inequality (14) holds and (13b) is indeed a subset of (12b), completing the proof.

-B Complete Formulation of the CRO Post Processing

The complete formulation of (1) with the Karush-Kuhn-Tucker conditions (KKTs) of embedded problems is


	$\displaystyle\underset{{}}{\text{minimize}}\;$	$\displaystyle\lVert\tilde{C}_{\text{opf}}-\mathbf{c}^{\top}\mathbf{x}_{1}% \rVert_{1}+\beta\lVert\tilde{C}_{\text{opf}}-\mathbf{c}^{\top}\mathbf{x}_{2}% \rVert_{1}+\gamma\lVert\tilde{\mathbf{d}}-\tilde{\mathbf{d}}^{0}\rVert_{1}$
	subject to
		$\displaystyle\left\{\begin{array}[]{ll}\overline{\bm{\mu}}_{k}^{\top}\overline% {\bm{\delta}}-\underline{\bm{\mu}}_{k}^{\top}\underline{\bm{\delta}}\leqslant-% \mathbf{a}_{k}^{\top}\mathbf{x}_{1}-\mathbf{b}_{k}^{\top}\tilde{\mathbf{d}}-e_% {k}\\ \mathbf{b}_{k}-\overline{\bm{\mu}}_{k}+\underline{\bm{\mu}}_{k}-\bm{1}\lambda_% {k}=\bm{0}\\ \underline{\bm{\mu}}_{k},\overline{\bm{\mu}}_{k}\geqslant\bm{0}\\ \mathbf{c}+\sum_{k}\theta_{k}\mathbf{a}_{k}=\mathbf{0}\\ \theta_{k}\overline{\bm{\delta}}-\bm{\zeta}_{k}-\overline{\bm{\pi}}_{k}=\bm{0}% \\ \theta_{k}\underline{\bm{\delta}}-\bm{\zeta}_{k}+\underline{\bm{\pi}}_{k}=\bm{% 0}\\ \bm{1}^{\top}\bm{\zeta}_{k}=0\\ 0\leqslant\theta_{k}\;\perp(-\overline{\bm{\mu}}_{k}^{\top}\overline{\bm{% \delta}}+\underline{\bm{\mu}}_{k}^{\top}\underline{\bm{\delta}}-\mathbf{a}_{k}% ^{\top}\mathbf{x}_{1}\\ \quad\quad\quad\quad\quad\!-\mathbf{b}_{k}^{\top}\tilde{\mathbf{d}}-e_{k})% \geqslant 0\\ \bm{0}\leqslant\underline{\pi}_{k}\;\perp\underline{\bm{\mu}}_{k}\geqslant\bm{% 0}\\ \bm{0}\leqslant\overline{\pi}_{k}\;\perp\overline{\bm{\mu}}_{k}\geqslant\bm{0}% \\ \theta_{k},\overline{\bm{\pi}}_{k},\underline{\bm{\pi}}_{k},\bm{\zeta}% \geqslant\bm{0}\end{array}\right\}$
		$\displaystyle\left\{\begin{array}[]{ll}\mathbf{a}_{k}^{\top}\mathbf{x}_{2}+% \mathbf{b}_{k}^{\top}\tilde{\mathbf{d}}+e_{k}\leqslant 0\\ \mathbf{c}+\sum_{k}\nu_{k}\mathbf{a}_{k}=0\\ 0\leqslant\nu_{k}\;\perp(-\mathbf{a}_{k}^{\top}\mathbf{x}_{2}-\mathbf{b}_{k}^{% \top}\tilde{\mathbf{d}}-e_{k})\geqslant 0\\ \nu_{k}\geqslant 0\end{array}\right\}$

$\forall k=1,\dots,K$ , where $\mathbf{c}^{\top}\mathbf{x}_{1}$ and $\mathbf{c}^{\top}\mathbf{x}_{2}$ represents the OPF cost in post-attack and normal conditions, respectively. Here, the $\perp$ denotes complementarity conditions.

-C Proof of Theorem 4

CRO uses the real data in the following computations:

1.

Step 1 adds Laplacian noise with magnitude $\alpha/\varepsilon_{2}$ to an identity query, whose sensitivity is $\alpha$ . By the sequential composition rule [3], this computation is $\varepsilon_{1}-$ DP.
2.

Step 2 adds Laplacian noise with parameter $(\alpha\overline{c})/\varepsilon_{2}$ . Since the sensitivity of OPF cost is $\alpha\overline{c}$ as shown in Section II.B, this computation is $\varepsilon_{2}-$ DP.

Since the post-processing optimization in Step 3 only uses obfuscated data, it will not induce any privacy loss due to post-processing immunity [3]. Per the sequential composition rule, the total privacy loss of the algorithm is $\varepsilon_{1}+\varepsilon_{2}$ , which adds up to $\varepsilon$ when we take $\varepsilon_{1}=\varepsilon/2,\varepsilon_{2}=\varepsilon/2$ .

-D Proof of Theorem 5

The algorithm queries data in the following computations:

1.

Following the similar arguments from Appendix C, Step 1 is $\varepsilon_{1}-$ DP and Step 2 is $\varepsilon_{2}-$ DP
2.

The worst-case constraints are estimated using $\tau$ iterations of the report-noisy-max algorithm in Step 3; each iteration injects the Laplacian noise with magnitude $\alpha\overline{c}$ providing $\varepsilon_{3}-$ DP, and the whole report-noisy-max algorithm is $\tau\varepsilon_{3}-$ DP.

As the post-processing optimization in Step 4 only uses obfuscated numerical data $\tilde{\mathbf{d}}^{0},\tilde{C}_{\text{opf}}$ and non-numerical data $\mathcal{K}$ , it is immune to privacy loss. The accumulated privacy loss of Alg. 2 is $\varepsilon_{1}+\varepsilon_{2}+\tau\varepsilon_{3}$ , which amounts to $\varepsilon$ when setting $\varepsilon_{1}=\varepsilon/3,\varepsilon_{2}=\varepsilon/3$ and $\varepsilon_{3}=\varepsilon/(3\tau)$ .

References

[1] A. B. Birchfield, T. Xu, and T. J. Overbye, “Power flow convergence and reactive power planning in the creation of large synthetic grids,” IEEE Trans. Power Syst., vol. 33, no. 6, pp. 6667–6674, 2018.
[2] S. Taylor et al., “California test system (CATS): A geographically accurate test system based on the California grid,” IEEE Trans. on Enrgy Mrkts, Pol and Reg., vol. 2, no. 1, pp. 107–118, 2024.
[3] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
[4] C. Dwork et al., “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. Springer, 2006, pp. 265–284.
[5] F. Fioretto, T. W. K. Mak, and P. Van Hentenryck, “Differential privacy for power grid obfuscation,” IEEE T. Smart Grid, vol. 11, no. 2, pp. 1356–1366, 2020.
[6] T. W. K. Mak et al., “Privacy-preserving power system obfuscation: A bilevel optimization approach,” IEEE Trans. Power Syst., vol. 35, no. 2, pp. 1627–1637, 2020.
[7] V. Dvorkin and A. Botterud, “Differentially private algorithms for synthetic power system datasets,” IEEE Control Systems Letters, vol. 7, pp. 2053–2058, 2023.
[8] V. Dvorkin et al., “Differentially private distributed optimal power flow,” in 2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 2092–2097.
[9] ——, “Differentially private optimal power flow for distribution grids,” IEEE Trans. Power Syst., vol. 36, no. 3, pp. 2186–2196, 2021.
[10] M. Ryu and K. Kim, “A privacy-preserving distributed control of optimal power flow,” IEEE Trans. Power Syst., vol. 37, no. 3, pp. 2042–2051, 2022.
[11] F. Zhou, J. Anderson, and S. H. Low, “Differential privacy of aggregated dc optimal power flow data,” in 2019 American Control Conference (ACC), 2019, pp. 1307–1314.
[12] N. Ravi et al., “Differentially private k-means clustering applied to meter data analysis and synthesis,” IEEE T. Smart Grid, vol. 13, no. 6, pp. 4801–4814, 2022.
[13] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Transactions on Information and System Security, vol. 14, no. 1, pp. 1–33, 2011.
[14] H.-M. Chung, W.-T. Li, C. Yuen, W.-H. Chung, Y. Zhang, and C.-K. Wen, “Local cyber-physical attack for masking line outage and topology attack in smart grid,” IEEE T. Smart Grid, vol. 10, no. 4, pp. 4577–4588, 2019.
[15] X. Liu and Z. Li, “Local load redistribution attacks in power systems with incomplete network information,” IEEE T. Smart Grid, vol. 5, no. 4, pp. 1665–1676, Jul. 2014.
[16] G. Liang et al., “A review of false data injection attacks against modern power systems,” IEEE T. Smart Grid, vol. 8, pp. 1630–1638, 2016.
[17] M. Goerigk et al., “Connections between robust and bilevel optimization,” Open j. math. optim, vol. 6, no. 2, pp. 1–17, 2025.
[18] D. Bertsimas et al., “Theory and applications of robust optimization,” SIAM review, vol. 53, no. 3, pp. 464–501, 2011.
[19] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.