Synthesizing Grid Data with Cyber Resilience and Privacy Guarantees
Abstract
Differential privacy (DP) provides a principled approach to synthesizing data (e.g., loads) from real-world power systems while limiting the exposure of sensitive information. However, adversaries may exploit synthetic data to calibrate cyberattacks on the source grids. To control these risks, we propose new DP algorithms for synthesizing data that provide the source grids with both cyber resilience and privacy guarantees. The algorithms incorporate both normal operation and attack optimization models to balance the fidelity of synthesized data and cyber resilience. The resulting post-processing optimization is reformulated as a robust optimization problem, which is compatible with the exponential mechanism of DP to moderate its computational burden.
I Introduction
Optimal power flow (OPF) analysis in power systems requires realistic grid models with accurate network, generation, and load parameters—data that is difficult to source from real-world grids due to privacy and (cyber-)security concerns. While the lack of such models has inspired the development of artificial grids [1, 2], a more principled approach leverages the theory of differential privacy (DP) [3] to release grid models directly from real-world systems.
The DP theory asserts that it is impossible—up to prescribed privacy parameters—to infer the original parameters from their DP release. Such strong privacy guarantees originate from Laplacian perturbations [4] of real grid parameters, followed by post-processing optimization of the perturbed parameters to restore their modeling fidelity to the source grid, e.g., in terms of similarity of the OPF outcomes [5, 6, 7]. The DP theory also lies at the core of modern privacy-preserving OPF solvers [8, 9, 10], the release of aggregated grid statistics [11], and related grid information [12].
However, privacy guarantees alone may not suffice to release grid parameters, as the cybersecurity risks associated with such releases remain largely unexplored. Possible cyberattacks include false data injection, which subtly alters state estimation results [13], line outage masking, which disconnects a transmission line and misleads the control center into seeking the outage elsewhere [14], and load redistribution, which manipulates demand measurements to increase OPF cost and constraint violations [15]. The latter is of main interest to this work. Executing such attacks requires some grid knowledge [16], which is traditionally difficult to obtain. However, the availability of synthetic grid data may unintentionally inform adversaries and help them calibrate the attack.
Contribution: Recognizing the risks that synthetic grid parameters may inform cyber adversaries, we develop new DP algorithms that simultaneously guarantee cyber resilience and privacy for the source power grids. Our algorithms build on [5, 6, 7] and leverage the Laplace mechanism and post-processing optimization to tune synthetic data while anticipating cyber risks through embedded attack optimization.
The contributions of this paper are summarized as follows:
1. We formulate a Cyber Resilient Obfuscation (CRO) algorithm, an optimization-based algorithm to release electric load data with a guarantee to preserve the privacy of the original data and ensure the resilience of the source grid to load redistribution attacks. The algorithm post-processes synthetic loads by balancing their fidelity with the potential damage to the grid.
2. The underlying post-processing optimization is an intractable trilevel problem, which is reduced to a tractable yet more conservative bilevel problem. We achieve this by exploring the connections between robust and bilevel optimization, in the spirit of [17].
3. To further improve computational tractability of the algorithm, we provide the extension of CRO, termed CRO-Exp, which uses the exponential mechanism of DP to identify only the most important constraints for post-processing optimization of synthetic loads.
We next provide preliminaries on OPF and DP theory. Sec. III explains the risks of cyberattacks, and Sec. IV introduces the algorithms to mitigate them. Sec. V presents simulations, and Sec. VI concludes. Proofs are relegated to the appendix.
Notation: lower- (upper-) case boldface letters denote column vectors (matrices). Scalar $x_i$ is the $i$-th element of vector $\mathbf{x}$. Vectors $\mathbf{0}$ and $\mathbf{1}$ are the all-zero and all-one vectors; $^\top$ stands for transposition, and $x^\star$ is the optimal value of $x$.
II Preliminaries
II-A DC Optimal Power Flow (OPF) Problem
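For a given load vector $\mathbf{d}$, the DC OPF problem seeks the least-cost generation dispatch in high-voltage grids that satisfies the loads and grid limits. Consider OPF as a parametric linear program:
\begin{align}
C(\mathbf{d}) = \underset{\mathbf{p},\,\mathbf{v}}{\text{minimize}} \quad & \mathbf{c}^\top\mathbf{p} + \psi\,\mathbf{1}^\top\mathbf{v} \tag{1a}\\
\text{subject to} \quad & \mathbf{p}^{\min} \le \mathbf{p} \le \mathbf{p}^{\max},\ \mathbf{v} \ge \mathbf{0}, \tag{1b}\\
& \mathbf{1}^\top(\mathbf{p} - \mathbf{d}) = 0, \tag{1c}\\
& |\mathbf{F}(\mathbf{p} - \mathbf{d})| \le \mathbf{f}^{\max} + \mathbf{v}, \tag{1d}
\end{align}
where the decision variables include the generator dispatch $\mathbf{p}$, priced at generation costs $\mathbf{c}$ and bounded by the dispatch range $[\mathbf{p}^{\min}, \mathbf{p}^{\max}]$, and the power flow constraint violations $\mathbf{v}$, penalized by $\psi$. The matrix $\mathbf{F}$ of power transfer distribution factors is used to map net power injections to power flows as $\mathbf{F}(\mathbf{p} - \mathbf{d})$. Constraint (1c) defines the system-wide power balance between dispatched generation and loads. The power flows in transmission lines are capped by the line capacity $\mathbf{f}^{\max}$ using constraint (1d). Under highly loaded conditions, these constraints can be temporarily violated by $\mathbf{v}$. As transmission constraint violations are not desired, they are penalized with a large parameter $\psi$.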
II-B Differential Privacy for Synthetic OPF Datasets
Optimization parameters in problem (1) are either classified or owned by private system actors, and thus cannot be directly disclosed to the public. Our goal is thus to synthesize a realistic version of these parameters. In this work, we focus on the obfuscation of the demand vector $\mathbf{d}$. This is without much loss of generality, because other parameters, such as the transmission data in $\mathbf{F}$ and $\mathbf{f}^{\max}$, can be synthesized similarly; see the state-of-the-art obfuscation algorithms [5, 6, 7]. Towards this goal, we leverage DP to render the original vector $\mathbf{d}$ statistically indistinguishable from its synthetic counterpart $\tilde{\mathbf{d}}$, up to some prescribed parameters: $\alpha$, termed the adjacency parameter, and $\varepsilon$, termed the privacy loss [3].
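Definition 1 (Adjacency).
Two vectors $\mathbf{d}, \mathbf{d}' \in \mathbb{R}^n$ are $\alpha$-adjacent, for some $\alpha > 0$, if $\exists\, i \in \{1, \dots, n\}$ such that $d_j = d'_j,\ \forall j \neq i$, and $|d_i - d'_i| \le \alpha$. That is, they differ in one item by at most $\alpha$.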
To synthesize a DP version of $\mathbf{d}$, the standard Laplace mechanism applies random noise to the original data, i.e., $\tilde{\mathbf{d}}^0 = \mathbf{d} + \boldsymbol{\xi}$, where $\boldsymbol{\xi}$ is a random draw from the $n$-dimensional Laplace distribution with zero mean and independent components of scale $\alpha/\varepsilon$, i.e., with a diagonal covariance matrix whose diagonal elements equal $2\alpha^2/\varepsilon^2$ [6]. The mechanism guarantees that if the attacker’s prior for any load is within the $\alpha$ MW range of the true value, it will not be improved by the DP release. If the prior is outside this range, the prior knowledge will be improved (thus enhancing grid transparency), but the exact loads will not be disclosed. In other words, the mechanism satisfies the following definition of DP.
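The Laplace mechanism described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the noise scale is taken as the adjacency divided by the privacy loss, the load values are made up, and the function names `laplace_noise` and `laplace_mechanism` are hypothetical rather than taken from the authors' repository.

```python
import random


def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample with the given scale.

    The difference of two i.i.d. unit exponential variables is a
    standard Laplace variable, which is then rescaled.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(loads, alpha, epsilon, rng):
    """Perturb every load independently with Laplace noise of scale alpha / epsilon."""
    scale = alpha / epsilon  # assumed calibration: sensitivity alpha, privacy loss epsilon
    return [d + laplace_noise(scale, rng) for d in loads]


rng = random.Random(0)            # fixed seed only to make the sketch reproducible
loads = [120.0, 85.0, 40.0]       # hypothetical loads in MW
synthetic = laplace_mechanism(loads, alpha=5.0, epsilon=1.0, rng=rng)
print(synthetic)
```

Note that the perturbed vector may contain values that no longer admit a feasible OPF solution, which is what the post-processing stage is meant to repair.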
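Definition 2 ($\varepsilon$-DP).
The Laplace mechanism $\mathcal{M}$ above, with domain $\mathcal{D}$ and output range $\mathcal{R}$, is called $\varepsilon$-DP if, for any outcome $\mathcal{S}$ within $\mathcal{R}$ and any two $\alpha$-adjacent load vectors $\mathbf{d}$ and $\mathbf{d}'$, the ratio of probabilities is bounded as
$$\frac{\Pr[\mathcal{M}(\mathbf{d}) \in \mathcal{S}]}{\Pr[\mathcal{M}(\mathbf{d}') \in \mathcal{S}]} \le \exp(\varepsilon), \tag{3}$$
where $\varepsilon$ is a prescribed non-negative parameter.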
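A quick numerical check can make the bound in Definition 2 concrete. The sketch below compares Laplace densities centered at two adjacent scalar loads and confirms that their ratio stays below the exponential of the privacy loss. It assumes a noise scale equal to the adjacency divided by the privacy loss and uses made-up load values; since adjacent vectors differ in a single entry, the scalar case captures the whole ratio.

```python
import math


def laplace_pdf(x, mean, scale):
    """Density of the Laplace distribution with the given mean and scale."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


def max_density_ratio(d, d_adj, alpha, epsilon, outcomes):
    """Largest density ratio between the two adjacent loads over the given outcomes."""
    scale = alpha / epsilon  # assumed calibration of the Laplace mechanism
    return max(
        laplace_pdf(r, d, scale) / laplace_pdf(r, d_adj, scale) for r in outcomes
    )


alpha, epsilon = 5.0, 1.0
d, d_adj = 100.0, 104.0                           # hypothetical adjacent loads: |d - d_adj| <= alpha
outcomes = [60.0 + 0.5 * i for i in range(161)]   # grid of outcomes from 60 MW to 140 MW
ratio = max_density_ratio(d, d_adj, alpha, epsilon, outcomes)
print(ratio <= math.exp(epsilon))
```

The ratio is governed by the distance between the two loads relative to the noise scale, so loads that differ by the full adjacency reach the bound exactly.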
Intuitively, a smaller privacy loss results in more noise applied to the data and a higher requirement for distribution similarity, which makes it more likely to observe the same random outcome. However, the Laplace mechanism alone is likely to produce a load vector that does not admit a feasible OPF solution. Prior work introduced the following two-stage solution:
1. Laplace mechanism $\tilde{\mathbf{d}}^0 = \mathbf{d} + \boldsymbol{\xi}$, followed by
2. Post-processing of $\tilde{\mathbf{d}}^0$ using a bilevel optimization:
$$\underset{\tilde{\mathbf{d}}}{\text{minimize}} \quad \big\| C(\tilde{\mathbf{d}}) - \tilde{C}(\mathbf{d}) \big\| + \gamma\, \big\| \tilde{\mathbf{d}} - \tilde{\mathbf{d}}^0 \big\| \tag{4}$$
where the OPF cost $C(\tilde{\mathbf{d}})$ comes from the embedded optimization problem (1) formulated on the synthetic load vector $\tilde{\mathbf{d}}$, and $\tilde{C}(\mathbf{d})$ is a DP estimate of the OPF cost on the true data, obtained by perturbing $C(\mathbf{d})$ with Laplace noise calibrated to the cost sensitivity $\bar{c}\alpha$, with $\bar{c}$ being the cost of the most expensive generator. The synthetic vector is optimized using feedback from the embedded OPF problem, which constrains $\tilde{\mathbf{d}}$ to take only those values that admit a feasible OPF solution. The main objective of (4) is to match the OPF cost on the synthetic load vector with that on the original load vector, thereby ensuring high modeling fidelity of the synthetic data. The second term in (4) is a regularization term with a small hyper-parameter $\gamma$, which selects the optimal solution that is closest to the load after DP obfuscation, $\tilde{\mathbf{d}}^0$. The solution to (4) is the feasible and cost-consistent synthetic counterpart $\tilde{\mathbf{d}}$, which retains the DP guarantee for the original load vector $\mathbf{d}$ [7].
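The logic of the two-term post-processing objective can be illustrated on a deliberately simplified toy case. The sketch below assumes a single-bus system with no network constraints, so the OPF cost reduces to a merit-order dispatch and the load is a scalar; it then picks, by brute force over a grid of candidates, the feasible load that best matches a noisy cost estimate while staying close to the noisy load. Absolute values stand in for the norms, and all names and numbers are hypothetical. The actual problem is a bilevel program over a load vector that requires an optimization solver.

```python
def merit_order_cost(total_load, generators):
    """OPF cost on a single-bus system: dispatch the cheapest units first.

    generators is a list of (marginal_cost, capacity) pairs.
    Returns None when the load cannot be served (infeasible OPF).
    """
    if total_load < 0:
        return None
    remaining, cost = total_load, 0.0
    for marginal_cost, capacity in sorted(generators):
        used = min(capacity, remaining)
        cost += marginal_cost * used
        remaining -= used
    return cost if remaining <= 1e-9 else None


def post_process(noisy_load, cost_estimate, generators, gamma, candidates):
    """Grid search: cost-matching term plus gamma times distance to the noisy load."""
    best, best_objective = None, float("inf")
    for d in candidates:
        cost = merit_order_cost(d, generators)
        if cost is None:
            continue  # skip candidates without a feasible dispatch
        objective = abs(cost - cost_estimate) + gamma * abs(d - noisy_load)
        if objective < best_objective:
            best, best_objective = d, objective
    return best


generators = [(10.0, 100.0), (30.0, 50.0)]     # hypothetical (cost, capacity) pairs
candidates = [float(i) for i in range(0, 201)]
# The noisy load of 160 MW exceeds the 150 MW of installed capacity.
print(post_process(160.0, 1900.0, generators, gamma=0.01, candidates=candidates))
```

In this toy case the noisy load is infeasible on its own, and the search returns a feasible load whose dispatch cost agrees with the cost estimate.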
One barrier to releasing synthetic OPF parameters is the risk posed by cyber adversaries who might exploit them to disrupt grid operations. Next, we substantiate these risks.
III Cyber Resilience Risks in Releasing Differentially Private OPF Datasets
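Although synthetic OPF datasets contribute to overall grid transparency and enable independent power flow analysis, they can also be misused by cyber adversaries launching attacks on the grid. One class of attacks, which is of interest to this work, is load redistribution attacks. In terms of OPF problem (1), the adversary optimizes an attack vector $\boldsymbol{\delta}$ that alters the loads in $\mathbf{d}$ to increase either the dispatch cost or the magnitude of power flow constraint violations.
According to [15], the optimal attack vector is found by solving the following bilevel optimization (BO) problem:
$$\underset{\boldsymbol{\delta} \in \Delta}{\text{maximize}} \quad C(\mathbf{d} + \boldsymbol{\delta}), \tag{5a}$$
where the OPF cost $C(\mathbf{d} + \boldsymbol{\delta})$ comes from the embedded optimization problem (1) formulated on the load vector after the attack, $\mathbf{d} + \boldsymbol{\delta}$. The attack vector is constrained by the set of admissible attacks
$$\Delta = \big\{ \boldsymbol{\delta} \;\big|\; \underline{\boldsymbol{\delta}} \le \boldsymbol{\delta} \le \overline{\boldsymbol{\delta}},\ \mathbf{1}^\top \boldsymbol{\delta} = 0 \big\}, \tag{5d}$$
where $\underline{\boldsymbol{\delta}}$ and $\overline{\boldsymbol{\delta}}$ are limits on the attack magnitude, and $\mathbf{1}^\top \boldsymbol{\delta} = 0$ ensures that the total system loading remains unchanged after the attack, thus ensuring the stealthiness of the attack.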
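The admissible attack set is simple enough to check directly. The following sketch builds magnitude limits as a fraction of nominal loads and tests whether a candidate attack respects both the limits and the zero-sum (stealthiness) condition. The symmetric limits, the tolerance, and the function names are assumptions made for illustration only.

```python
def attack_bounds(nominal_loads, fraction):
    """Symmetric attack limits as a fraction of each nominal load (an assumption)."""
    upper = [fraction * d for d in nominal_loads]
    lower = [-u for u in upper]
    return lower, upper


def is_admissible(delta, lower, upper, tol=1e-9):
    """True if delta respects the magnitude limits and leaves total load unchanged."""
    within_limits = all(
        lo - tol <= x <= hi + tol for x, lo, hi in zip(delta, lower, upper)
    )
    zero_sum = abs(sum(delta)) <= tol
    return within_limits and zero_sum


nominal = [100.0, 50.0, 50.0]                # hypothetical nominal loads in MW
lower, upper = attack_bounds(nominal, 0.1)   # limits of 10% of nominal load

print(is_admissible([10.0, -5.0, -5.0], lower, upper))   # shifts load between buses
print(is_admissible([10.0, -5.0, 0.0], lower, upper))    # changes total load
print(is_admissible([12.0, -6.0, -6.0], lower, upper))   # exceeds magnitude limits
```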
IV Cyber Resilience and Privacy Guarantees for Synthetic OPF Datasets
Recognizing the risks of misusing synthetic datasets, we revisit the post-processing to enhance the cyber resilience of source grids. Instead of (4), we propose the following upper-level objective for the post-processing optimization:
(6) |
where the first term controls the fidelity of the synthetic data, the second term measures the damage under an attack calibrated on the synthetic data, and the third term regularizes the demand vector. For a small regularization penalty, this objective represents a trade-off between the fidelity of synthetic grid parameters and the resilience of the grid to redistribution attacks, which can be explored by varying the trade-off parameter. The embedded optimization includes the real grid data except for the load vector, thus modeling the worst-case attack when only the loads are unknown to adversaries.
The challenge is that (6) requires solving a trilevel optimization problem, where the synthetic data is optimized over the embedded BO attack model (5). Inspired by [17], we seek computational tractability by exploring the connection between the bilevel model of attack and robust optimization.
IV-A Computational Tractability via Robust Optimization (RO)
The conservative RO approximation of (5) is
(7a) | ||||
subject to | (7b) |
where each constraint is formulated for the worst-case realization of the attack vector from the set of admissible attacks. In contrast to the bilevel formulation (5), the RO attack generates a worst-case attack vector for each constraint. The following result shows that the RO attack provides an upper bound on the BO attack.
Proposition 3 (Conservative attack approximation).
For any feasible load vector, the OPF cost under the BO attack (5) does not exceed the OPF cost under its RO approximation (7).
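To illustrate why a constraint-wise worst case is conservative, the sketch below maximizes a single linear expression over an admissible attack set (box limits plus a zero-sum condition) with a greedy rule, which solves this small linear program exactly. It assumes that the attack enters each flow constraint linearly through one row of the PTDF matrix and that the attack set is non-empty. The matrix rows, the limits, and the function name are hypothetical.

```python
def worst_case_linear(a, lower, upper):
    """Maximize sum(a[i] * delta[i]) s.t. lower <= delta <= upper and sum(delta) == 0.

    Greedy solution: start every entry at its lower bound, then raise entries
    in decreasing order of a[i] until the entries sum to zero.
    Assumes sum(lower) <= 0 <= sum(upper), so the set is non-empty.
    """
    delta = list(lower)
    budget = -sum(lower)  # total amount that must be added to reach a zero sum
    for i in sorted(range(len(a)), key=lambda j: a[j], reverse=True):
        step = min(upper[i] - lower[i], budget)
        delta[i] += step
        budget -= step
    value = sum(ai * di for ai, di in zip(a, delta))
    return value, delta


F = [[0.5, -0.2, 0.1],      # hypothetical PTDF rows, one per transmission line
     [-0.3, 0.4, 0.2]]
lower = [-10.0, -5.0, -5.0]
upper = [10.0, 5.0, 5.0]

# One worst-case attack per constraint, as in the robust approximation.
worst = [worst_case_linear(row, lower, upper) for row in F]
for value, delta in worst:
    print(value, delta)
```

Because each row is maximized separately, the maximizers of different rows generally differ, whereas a BO attacker must commit to a single attack vector for all constraints. This gap is the source of the conservatism stated in Proposition 3.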
Although conservative, formulation (7) is computationally advantageous over (5) as it admits a linear programming reformulation via duality [18, §2.2] (see the link to online repository below for details). Let and be the duals of the first constraints in (5d), and be the dual of the last condition in (5d). The exact reformulation of (7) is
(8a) | ||||
subject to | ||||
(8b) | ||||
(8c) | ||||
(8d) |
Therefore, replacing the BO attack cost with its RO counterpart (8) in objective function (6) gives rise to a bilevel post-processing optimization, which can be handled by mixed-integer optimization solvers [7, 6].
Next, we introduce a tractable post-processing algorithm for synthesizing loads with privacy and cyber resilience guarantees. Then, in Sec. IV-C, we modify the algorithm to tune the computational burden of the RO approximation.
IV-B Differentially Private CRO
(9) |
The CRO algorithm for privacy-preserving and cyber-resilient synthesis of load parameters is summarized in Alg. 1. It takes as inputs the load adjacency $\alpha$ and the DP parameters, as well as the optimization trade-off, regularization, and attack parameters. Step 1 initializes the synthetic load vector using the Laplace mechanism with a privacy loss of $\varepsilon_1$. Step 2 performs a DP estimation of the OPF costs on real loads using the Laplace mechanism with a privacy loss of $\varepsilon_2$. Following prior work in [7], this step requires the cost of the most expensive generator. Finally, Step 3 post-processes the initial synthetic load by solving the bilevel optimization problem (9) using the conservative RO approximation of the attack. Since Step 3 does not optimize over real loads, it does not introduce any privacy loss. The complete formulation of (9) can be seen in Appendix B.
The resilience of the source grid to load redistribution attacks is controlled by the trade-off parameter and the admissible attack set. Naturally, a larger trade-off parameter and a larger set lead to greater resilience, but at the expense of the fidelity of the synthesized data. Our experiments in Sec. V justify the choices of these parameters. The privacy guarantee for $\alpha$-adjacent load vectors is established by the following result.
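Theorem 4 (DP of CRO).
Setting the privacy losses of Steps 1 and 2 such that $\varepsilon_1 + \varepsilon_2 = \varepsilon$ renders Alg. 1 $\varepsilon$-DP for $\alpha$-adjacent load vectors.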
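The two data-dependent steps of Alg. 1 can be sketched as follows. The sketch assumes that Step 1 uses a noise scale equal to the adjacency divided by the first privacy loss and that Step 2 scales the noise by the cost of the most expensive generator, following the sensitivities stated in the proof of Theorem 4. Step 3 (the bilevel post-processing) is omitted because it requires an optimization solver. Function names and numbers are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample, built as a scaled difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def cro_steps_1_and_2(loads, opf_cost, alpha, eps1, eps2, c_max, rng):
    """Steps 1 and 2 of the CRO algorithm (sketch).

    Step 1: perturb the loads (identity query, assumed sensitivity alpha).
    Step 2: perturb the OPF cost on real loads (assumed sensitivity c_max * alpha).
    Returns the noisy loads, the noisy cost, and the accumulated privacy loss.
    """
    noisy_loads = [d + laplace_noise(alpha / eps1, rng) for d in loads]
    noisy_cost = opf_cost + laplace_noise(c_max * alpha / eps2, rng)
    total_privacy_loss = eps1 + eps2  # sequential composition of the two steps
    return noisy_loads, noisy_cost, total_privacy_loss


loads = [120.0, 85.0, 40.0]   # hypothetical loads in MW
noisy_loads, noisy_cost, total_eps = cro_steps_1_and_2(
    loads, opf_cost=1900.0, alpha=5.0, eps1=0.5, eps2=0.5,
    c_max=30.0, rng=random.Random(0),
)
print(noisy_loads, noisy_cost, total_eps)
```

Only these two outputs touch the real data; any later computation that consumes them alone inherits the same privacy guarantee.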
IV-C Exponential Mechanism to Ease Computational Burden
While the RO approximation (7) leads to a more tractable bilevel optimization, it is still computationally expensive in large systems due to the large number of variables and complementarity constraints, as later substantiated by Fig. 2. We propose to alleviate the computational burden by selecting for RO reformulation only the subset of constraints that affect the OPF cost the most. The remaining constraints are enforced deterministically. Selecting all constraints leads to the full RO formulation, while selecting only a subset leads to a reduced problem:
(10a) | ||||
subject to | (10b) | |||
(10c) | ||||
While directly replacing the full RO formulation with the reduced problem (10) in Alg. 1 alleviates the computational burden, this also degrades the privacy guarantee of Theorem 4: since the worst-case constraint set is specific to a particular load vector, post-processing on this set would leak the information we intend to obfuscate. As a remedy, we leverage the report-noisy-max algorithm, a discrete version of the exponential mechanism of DP [3], to privately identify the worst-case constraints without leaking information about the actual load. The resulting algorithm, termed CRO-Exp, is given in Alg. 2.
The first two steps of Alg. 2 follow those in Alg. 1. At Step 3, the algorithm applies the exponential mechanism repeatedly, once per selected constraint, to construct the set of worst-case constraints. In each iteration, the mechanism identifies the constraint that, when reformulated in a robust fashion, leads to the greatest increase of the OPF cost. After the last iteration, the set contains the selected worst-case constraints. Finally, Step 4 solves the post-processing optimization with only the selected constraints reformulated in the RO fashion.
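The selection in Step 3 can be sketched with a generic report-noisy-max routine. The sketch below adds Laplace noise to a score per constraint and reports the index of the largest noisy score, repeating this without replacement. It is a simplified illustration: the scores are fixed and made up (the algorithm above would evaluate the OPF cost increase of each candidate constraint), and the noise scale is passed in as a parameter rather than derived from the calibration in Theorem 5.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample, built as a scaled difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def report_noisy_max(scores, scale, rng, excluded=()):
    """Return the index with the largest noisy score, skipping excluded indices."""
    best_index, best_value = None, float("-inf")
    for i, score in enumerate(scores):
        if i in excluded:
            continue
        noisy = score + laplace_noise(scale, rng)
        if noisy > best_value:
            best_index, best_value = i, noisy
    return best_index


def select_worst_case_constraints(scores, k, scale, rng):
    """Pick k distinct constraints by k rounds of report-noisy-max."""
    selected = []
    for _ in range(k):
        selected.append(report_noisy_max(scores, scale, rng, excluded=selected))
    return selected


scores = [1.0, 9.0, 3.0, 7.0]   # hypothetical cost increase per constraint
print(select_worst_case_constraints(scores, k=2, scale=1.0, rng=random.Random(0)))
```

With a small noise scale the routine returns the true top constraints; as the scale grows, the selection becomes increasingly random, which is the price of privacy.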
Theorem 5 (DP of CRO-Exp).
Setting and renders Alg. 2 -DP for -adjacent loads.
(11) |
V Experiment Results
We run experiments using standard power grid testbeds. The set of admissible attacks includes limits on the attack magnitude, expressed as a percentage of nominal loads. The privacy loss is held fixed, and we vary the adjacency throughout the experiments. The code and data to replicate our results are available at
https://github.com/Wu-ShengY/CRO_SynDataset.
V-A Substantiating Attacks Calibrated on DP Data
Table I collects the damage of load redistribution attacks. The synthetic loads are generated using the standard post-processing (4) with no cyber resilience guarantee. The results reveal that the load redistribution attacks are as effective on synthetic loads as on the original loads, motivating the cyber resilient obfuscation by means of Alg. 1 and 2.
Testbed | Load | Attack damage (one column per value of the varied parameter) | | | |
---|---|---|---|---|---|
5_pjm | actual | 88.2 | 92.5 | 100.0 | 108.1 |
5_pjm | synth. | 87.4 | 92.4 | 100.0 | 108.1 |
14_ieee | actual | 4.80 | 4.93 | 5.06 | 5.19 |
14_ieee | synth. | 4.78 | 4.93 | 5.03 | 5.17 |
24_ieee | actual | 227.2 | 255.0 | 283.0 | 311.1 |
24_ieee | synth. | 212.5 | 242.3 | 259.1 | 274.9 |
118_ieee | actual | 237.0 | 252.4 | 256.4 | 259.8 |
118_ieee | synth. | 225.1 | 229.1 | 238.8 | 241.0 |
V-B Insights from the Small PJM 5-Bus Testbed
We test how well CRO (Alg. 1) mitigates the attack damage. We generate synthetic loads using the standard post-processing (PP) in (4) and synthetic loads using CRO. The histograms of the normal and post-attack OPF costs are shown in Fig. 1. Their range becomes wider as the load adjacency (and hence the noise) increases. For the standard PP (top row), we observe a notable shift of the post-attack histogram to the right relative to the cost of normal operations, confirming the results from Tab. I. The attacks calibrated on the outcomes of the CRO algorithm result in no extra OPF cost, as the histograms of the normal and post-attack costs overlap (bottom row). Thus, when attacks are calibrated on CRO results, the adversary sees no gain from launching an attack.
Table II shows the impact of the trade-off parameter on the CRO algorithm. Load redistribution attacks cause notable damage when the CRO algorithm disregards attacks. On the other hand, as long as the trade-off parameter exceeds the regularization weight, the source grid remains immune to attacks. This trade-off is “flat” as we model the linear OPF costs; we expect it to be smoother for quadratic OPF costs, which is a subject of future investigation.

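Trade-off parameter | 1st adjacency (MW), normal | 1st adjacency (MW), post-attack | 2nd adjacency (MW), normal | 2nd adjacency (MW), post-attack | 3rd adjacency (MW), normal | 3rd adjacency (MW), post-attack |
---|---|---|---|---|---|---|
Attacks disregarded | 88.2 | 92.9 | 87.3 | 91.5 | 84.5 | 88.2 |
Exceeds regularization weight | 88.2 | 88.2 | 87.3 | 87.3 | 84.5 | 84.5 |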
V-C Large-Scale Applications with CRO-Exp
The post-processing optimization (9) in CRO is difficult to scale to large systems. As shown in Fig. 2, the number of variables and complementarity constraints grows with the size of the testbed. The CRO-Exp Alg. 2 reduces the problem by at least one order of magnitude, to a level similar to that of the standard post-processing, since it only considers a subset of worst-case constraints in the attack. Fig. 3 shows the damage of attacks calibrated on synthetic loads released by CRO-Exp for three large testbeds. Increasing the number of robustly reformulated constraints reduces the attack damage. Notably, a small number of such constraints suffices to minimize the attack damage, with no improvement of cyber resilience beyond this threshold. This is because only attacks on a limited number of constraints can greatly increase the OPF cost. Moreover, the selection of the worst-case constraints in Step 3 of Alg. 2 becomes less informative with more noise, which only increases with the number of selected constraints, as per Theorem 5.


VI Conclusion
We developed algorithms for synthesizing credible grid parameters from real-world systems for OPF analysis. Similar to existing DP algorithms, they obfuscate loads by injecting Laplacian noise and using post-processing; however, they differ in the post-processing stage, which optimizes the trade-off between modeling fidelity (OPF cost consistency) and the resilience of source grids to cyberattacks. Our results reveal that these trade-offs are “flat”, meaning resilience can be achieved with little to no impact on the fidelity of the synthetic data. We also found that the post-processing formulation can be reduced with no loss of resilience by using the exponential mechanism to select only the important constraints for the attack. Inspired by these observations, future work aims to further investigate these trade-offs in the OPF setting with nonlinear (quadratic) costs and a broader class of attack models amenable to optimization-based representation.
Appendix A Proof of Proposition 3
Consider two perturbed OPF problems formulated on the same vector , one resulting from the BO attack (5)
(12a) | ||||
subject to | (12b) |
and one from the RO approximation of the attack in (7)
(13a) | ||||
subject to | (13b) |
with perturbations
To show that the optimal value of (12a) is upper-bounded by the optimal value of (13a), we need to establish that the feasible set (13b) is a subset of (12b). This is per the global inequality in perturbation analysis of convex programs [19, §5.6, Eq. (5.57)]. Inspecting (12b) and (13b), observe that this is the case when
(14) |
The attack vectors come from the RO, so the left-hand side of (14) is given by the following optimization:
(15) |
At the same time, the right-hand side of (14) can be represented by the following optimization problem:
(16a) | ||||
s.t. | (16b) |
Although trivial, this optimization problem allows us to clearly relate both sides of inequality (14) by relating problems (15) and (16). They are similar except for the additional consensus constraint (16b). Because of this additional constraint, the feasible set of (16) is a subset of that of (15). Hence, we can conclude that the optimal value of (15) is greater than or equal to that of (16a). Therefore, inequality (14) holds and (13b) is indeed a subset of (12b), completing the proof.
Appendix B Complete Formulation of the CRO Post-Processing
The complete formulation of (9) with the Karush-Kuhn-Tucker (KKT) conditions of the embedded problems is
subject to | ||||
where the two embedded cost terms represent the OPF cost in post-attack and normal conditions, respectively. Here, $\perp$ denotes complementarity conditions.
Appendix C Proof of Theorem 4
CRO uses the real data in the following computations:
1. Step 1 adds Laplacian noise with magnitude $\alpha/\varepsilon_1$ to an identity query, whose sensitivity is $\alpha$. By the sequential composition rule [3], this computation is $\varepsilon_1$-DP.
2. Step 2 adds Laplacian noise with parameter $\bar{c}\alpha/\varepsilon_2$. Since the sensitivity of the OPF cost is $\bar{c}\alpha$, as shown in Sec. II-B, this computation is $\varepsilon_2$-DP.
Since the post-processing optimization in Step 3 only uses obfuscated data, it does not induce any privacy loss due to post-processing immunity [3]. Per the sequential composition rule, the total privacy loss of the algorithm is $\varepsilon_1 + \varepsilon_2$, which equals $\varepsilon$ for any split with $\varepsilon_1 + \varepsilon_2 = \varepsilon$, e.g., $\varepsilon_1 = \varepsilon_2 = \varepsilon/2$.
Appendix D Proof of Theorem 5
The algorithm queries data in the following computations:
1. Following arguments similar to those in Appendix C, Step 1 is $\varepsilon_1$-DP and Step 2 is $\varepsilon_2$-DP.
2. The worst-case constraints are estimated using repeated iterations of the report-noisy-max algorithm in Step 3; each iteration injects Laplacian noise and incurs its own privacy loss, and, by sequential composition, the privacy loss of the whole report-noisy-max routine is the sum over its iterations.
As the post-processing optimization in Step 4 only uses obfuscated numerical data and non-numerical data (the selected constraint set), it is immune to privacy loss. The accumulated privacy loss of Alg. 2 is the sum of the losses of Steps 1 to 3, which amounts to $\varepsilon$ under the parameter setting of Theorem 5.
References
- [1] A. B. Birchfield, T. Xu, and T. J. Overbye, “Power flow convergence and reactive power planning in the creation of large synthetic grids,” IEEE Trans. Power Syst., vol. 33, no. 6, pp. 6667–6674, 2018.
- [2] S. Taylor et al., “California test system (CATS): A geographically accurate test system based on the California grid,” IEEE Trans. Energy Markets, Policy and Regulation, vol. 2, no. 1, pp. 107–118, 2024.
- [3] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Foundations and Trends® in Theoretical Computer Science, vol. 9, no. 3–4, pp. 211–407, 2014.
- [4] C. Dwork et al., “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. Springer, 2006, pp. 265–284.
- [5] F. Fioretto, T. W. K. Mak, and P. Van Hentenryck, “Differential privacy for power grid obfuscation,” IEEE Trans. Smart Grid, vol. 11, no. 2, pp. 1356–1366, 2020.
- [6] T. W. K. Mak et al., “Privacy-preserving power system obfuscation: A bilevel optimization approach,” IEEE Trans. Power Syst., vol. 35, no. 2, pp. 1627–1637, 2020.
- [7] V. Dvorkin and A. Botterud, “Differentially private algorithms for synthetic power system datasets,” IEEE Control Systems Letters, vol. 7, pp. 2053–2058, 2023.
- [8] V. Dvorkin et al., “Differentially private distributed optimal power flow,” in 2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 2092–2097.
- [9] ——, “Differentially private optimal power flow for distribution grids,” IEEE Trans. Power Syst., vol. 36, no. 3, pp. 2186–2196, 2021.
- [10] M. Ryu and K. Kim, “A privacy-preserving distributed control of optimal power flow,” IEEE Trans. Power Syst., vol. 37, no. 3, pp. 2042–2051, 2022.
- [11] F. Zhou, J. Anderson, and S. H. Low, “Differential privacy of aggregated DC optimal power flow data,” in 2019 American Control Conference (ACC), 2019, pp. 1307–1314.
- [12] N. Ravi et al., “Differentially private k-means clustering applied to meter data analysis and synthesis,” IEEE Trans. Smart Grid, vol. 13, no. 6, pp. 4801–4814, 2022.
- [13] Y. Liu, P. Ning, and M. K. Reiter, “False data injection attacks against state estimation in electric power grids,” ACM Transactions on Information and System Security, vol. 14, no. 1, pp. 1–33, 2011.
- [14] H.-M. Chung, W.-T. Li, C. Yuen, W.-H. Chung, Y. Zhang, and C.-K. Wen, “Local cyber-physical attack for masking line outage and topology attack in smart grid,” IEEE Trans. Smart Grid, vol. 10, no. 4, pp. 4577–4588, 2019.
- [15] X. Liu and Z. Li, “Local load redistribution attacks in power systems with incomplete network information,” IEEE Trans. Smart Grid, vol. 5, no. 4, pp. 1665–1676, 2014.
- [16] G. Liang et al., “A review of false data injection attacks against modern power systems,” IEEE Trans. Smart Grid, vol. 8, pp. 1630–1638, 2016.
- [17] M. Goerigk et al., “Connections between robust and bilevel optimization,” Open J. Math. Optim., vol. 6, no. 2, pp. 1–17, 2025.
- [18] D. Bertsimas et al., “Theory and applications of robust optimization,” SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
- [19] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.