of these achieve tighter analysis of cumulative privacy loss by taking advantage of the fact that the privacy loss random variable is strictly centered around an expected privacy loss. The cumulative privacy budget obtained from these analyses bounds the worst-case privacy loss of the composition of mechanisms with all but δ failure probability. This reduces the noise required to satisfy a given privacy budget and hence improves utility over multiple compositions. However, it is important to consider the actual impact the noise reductions enabled by these variants have on the privacy leakage, which is a main focus of this paper.

Note that the variants are just different techniques for analyzing the composition of the mechanism—by themselves they do not impact the noise added, which is all the actual privacy depends on. What they do is enable a tighter analysis of the guaranteed privacy. This means that for a fixed privacy budget, the relaxed definitions can be satisfied by adding less noise than would be required for looser analyses, hence they result in less privacy for the same ε level. Throughout the paper, for simplicity, we refer to the differential privacy variants but implicitly mean the mechanism used to satisfy that definition of differential privacy. Thus, when we say RDP has a given privacy leakage, it means the corresponding gradient perturbation mechanism has that privacy leakage when it is analyzed using RDP to bound its cumulative privacy loss.

Dwork et al. [18] note that the privacy loss of a differentially private mechanism follows a sub-Gaussian distribution. In other words, the privacy loss is strictly distributed around the expected privacy loss and the spread is controlled by the variance of the sub-Gaussian distribution. Multiple compositions of differentially private mechanisms thus result in the aggregation of the corresponding mean and variance values of the individual sub-Gaussian distributions. This can be converted to a cumulative privacy budget similar to the advanced composition theorem, which in turn reduces the noise that must be added to the individual mechanisms. The authors call this concentrated differential privacy (CDP) [18]:

Definition 2.3 (Concentrated Differential Privacy). A randomized algorithm M is (µ, τ)-concentrated differentially private if, for all pairs of adjacent data sets D and D′,

    D_subG(M(D) || M(D′)) ≤ (µ, τ)

where the sub-Gaussian divergence, D_subG, is defined such that the expected privacy loss is bounded by µ and, after subtracting µ, the resulting centered sub-Gaussian distribution has standard deviation τ. Any ε-DP algorithm satisfies (ε · (e^ε − 1)/2, ε)-CDP; however, the converse is not true.

A variation on CDP, zero-concentrated differential privacy (zCDP) [9], uses Rényi divergence as a different method to show that the privacy loss random variable follows a sub-Gaussian distribution:

Definition 2.4 (Zero-Concentrated Differential Privacy). A randomized mechanism M is (ξ, ρ)-zero-concentrated differentially private if, for all neighbouring data sets D and D′ and all α ∈ (1, ∞),

    D_α(M(D) || M(D′)) ≤ ξ + ρα

where D_α(M(D) || M(D′)) is the α-Rényi divergence between the distribution of M(D) and the distribution of M(D′).

D_α also gives the α-th moment of the privacy loss random variable. For example, D_1 gives the first-order moment, which is the mean or the expected privacy loss, and D_2 gives the second-order moment, or the variance of the privacy loss. There is a direct relation between DP and zCDP: if M satisfies ε-DP, then it also satisfies (½ε²)-zCDP. Furthermore, if M provides ρ-zCDP, it is (ρ + 2√(ρ log(1/δ)), δ)-DP for any δ > 0.

The Rényi divergence allows zCDP to be mapped back to DP, which is not the case for CDP. However, Bun and Steinke [9] give a relationship between CDP and zCDP, which allows an indirect mapping from CDP to DP (Table 1).

The use of Rényi divergence as a metric to bound the privacy loss leads to the formulation of a more generic notion of Rényi differential privacy (RDP) that is applicable to any individual moment of the privacy loss random variable:
Definition 2.5 (Rényi Differential Privacy [50]). A randomized mechanism M is said to have ε-Rényi differential privacy of order α (which can be abbreviated as (α, ε)-RDP), if for any adjacent data sets D, D′ it holds that

    D_α(M(D) || M(D′)) ≤ ε.

The main difference is that CDP and zCDP linearly bound all positive moments of privacy loss, whereas RDP bounds one moment at a time, which allows for a more accurate numerical analysis of privacy loss. If M is an (α, ε)-RDP mechanism, it also satisfies (ε + log(1/δ)/(α − 1), δ)-DP for any 0 < δ < 1.

Table 1 compares the relaxed variations of differential privacy. For all the variations, the privacy budget grows sub-linearly with the number of compositions k.

Moments Accountant. Motivated by the variants of differential privacy, Abadi et al. [1] propose the moments accountant (MA) mechanism for bounding the cumulative privacy loss of differentially private algorithms. The moments accountant keeps track of a bound on the moments of the privacy loss random variable during composition. Though the authors do not formalize this as a differential privacy variant, their definition of the moments bound is analogous to the Rényi divergence [50]. Thus, the moments accountant can be considered as an instantiation of Rényi differential privacy. The moments accountant is widely used for differentially private deep learning due to its practical implementation in the TensorFlow Privacy library [2] (see Section 2.3 and Table 4).

2.2 Differential Privacy Methods for ML

This section summarizes methods for modifying machine learning algorithms to satisfy differential privacy. First, we review convex optimization problems, such as empirical risk minimization (ERM) algorithms, and show several methods for achieving differential privacy during the learning process. Next, we discuss methods that can be applied to non-convex optimization problems, including deep learning.

ERM. Given a training data set (X, y), where X is a feature matrix and y is the vector of class labels, an ERM algorithm aims to reduce a convex objective function of the form

    J(θ) = (1/n) Σ_{i=1}^{n} ℓ(θ, X_i, y_i) + λ R(θ),

where ℓ(·) is a convex loss function (such as mean square error (MSE) or cross-entropy loss) that measures the training loss for a given θ, and R(·) is a regularization function. Commonly used regularization functions include the ℓ1 penalty, which makes the vector θ sparse, and the ℓ2 penalty, which shrinks the values of the θ vector.

The goal of the algorithm is to find the optimal θ* that minimizes the objective function: θ* = arg min_θ J(θ). While many first-order [14, 37, 58, 77] and second-order [40, 44] methods exist to solve this minimization problem, the most basic procedure is gradient descent, where we iteratively calculate the gradient of J(θ) with respect to θ and update θ with the gradient information. This process is repeated until J(θ) ≈ 0 or some other termination condition is met.

    Data: Training data set (X, y)
    Result: Model parameters θ
    θ ← Init(0)
    # 1. Add noise here: objective perturbation
    J(θ) = (1/n) Σ_{i=1}^{n} ℓ(θ, X_i, y_i) + λ R(θ) + β
    for epoch in epochs do
        # 2. Add noise here: gradient perturbation
        θ = θ − η (∇J(θ) + β)
    end
    # 3. Add noise here: output perturbation
    return θ + β

    Algorithm 1: Privacy noise mechanisms.

There are three obvious candidates for where to add privacy-preserving noise during this training process, demarcated in Algorithm 1. First, we could add noise to the objective function J(θ), which gives us the objective perturbation mechanism (#1 in Algorithm 1). Second, we could add noise to the gradients at each iteration, which gives us the gradient perturbation mechanism (#2). Finally, we can add noise to θ* obtained after the training, which gives us the output perturbation mechanism (#3). While there are other methods of achieving differential privacy, such as input perturbation [15], the sample-aggregate framework [52], the exponential mechanism [49] and the teacher ensemble framework [53], we focus our experimental analysis on gradient perturbation since it is applicable to all machine learning algorithms in general and is widely used for deep learning with differential privacy.

The amount of noise that must be added depends on the sensitivity of the machine learning algorithm. For instance, consider logistic regression with an ℓ2 regularization penalty. The objective function is of the form:

    J(θ) = (1/n) Σ_{i=1}^{n} log(1 + e^{−X_i^T θ y_i}) + (λ/2) ||θ||_2^2

Assume that the training features are bounded, ||X_i||_2 ≤ 1 and y_i ∈ {−1, 1}. Chaudhuri et al. [12] prove that for this setting, objective perturbation requires sampling noise in the scale of 2/n, and output perturbation requires sampling noise in the scale of 2/(nλ). The gradient of the objective function is:

    ∇J(θ) = (1/n) Σ_{i=1}^{n} (−X_i y_i) / (1 + e^{X_i^T θ y_i}) + λθ

which has a sensitivity of 2/n. Thus, gradient perturbation requires sampling noise in the scale of 2/n at each iteration.
Deep learning. Deep learning follows the same learning procedure as in Algorithm 1, but the objective function is non-convex. As a result, the sensitivity analysis methods of Chaudhuri et al. [12] do not hold, as they require a strong convexity assumption. Hence, their output and objective perturbation methods are not applicable. An alternative approach is to replace the non-convex function with a convex polynomial function [56, 57], and then use the standard objective perturbation. This approach requires carefully designing convex polynomial functions that can approximate the non-convexity, which can still limit the model's learning capacity. Moreover, it would require a considerable change in the existing machine learning infrastructure.

A simpler and more popular approach is to add noise to the gradients. Application of gradient perturbation requires a bound on the gradient norm. Since the gradient norm can be unbounded in deep learning, gradient perturbation can be used after manually clipping the gradients at each iteration. As noted by Abadi et al. [1], norm clipping provides a sensitivity bound on the gradients which is required for generating noise in gradient perturbation.

2.3 Implementing Differential Privacy

This section surveys how differential privacy has been used in machine learning applications, with a particular focus on the compromises implementers have made to obtain satisfactory utility. While the effective privacy provided by differential privacy mechanisms depends crucially on the choice of privacy budget ε, setting the ε value is discretionary and often done as necessary to achieve acceptable utility without any consideration of privacy.

Some of the early data analytics works on frequent pattern mining [7, 41], decision trees [21], private record linkage [30] and recommender systems [48] were able to achieve both high utility and privacy with ε settings close to 1. These methods rely on finding frequency counts as a sub-routine, and hence provide ε-differential privacy by either perturbing the counts using Laplace noise or by releasing the top frequency counts using the exponential mechanism [49]. Machine learning, on the other hand, performs much more complex data analysis, and hence requires higher privacy budgets to maintain utility. Next, we cover simple binary classification works that use small privacy budgets (ε ≤ 1). Then we survey complex classification tasks, which seem to require large privacy budgets. Finally, we summarize recent works that aim to perform complex tasks with low privacy budgets by using variants of differential privacy that offer tighter bounds on composition.

Binary classification. The first practical implementation of a private machine learning algorithm was proposed by Chaudhuri and Monteleoni [11]. They provide a novel sensitivity analysis under strong convexity constraints, allowing them to use output and objective perturbation for binary logistic regression. Chaudhuri et al. [12] subsequently generalized this method for ERM algorithms. This sensitivity analysis method has since been used by many works for binary classification tasks under different learning settings (listed in Table 2). While these applications can be implemented with low privacy budgets (ε ≤ 1), they only perform learning in restricted settings such as learning with low-dimensional data, smooth objective functions and strong convexity assumptions, and are only applicable to simple binary classification tasks.

There has also been considerable progress in generalizing privacy-preserving machine learning to more complex scenarios such as learning in high-dimensional settings [33, 34, 65], learning without strong convexity assumptions [66], or relaxing the assumptions on data and objective functions [63, 69, 78]. However, these advances are mainly of theoretical interest and only a few works provide implementations [33, 34].

Complex learning tasks. All of the above works are limited to convex learning problems with binary classification tasks. Adopting their approaches to more complex learning tasks requires higher privacy budgets (see Table 3). For instance, the online version of ERM as considered by Jain et al. [32] requires ε as high as 10 to achieve acceptable utility. From the definition of differential privacy, we can see that Pr[M(D) ∈ S] ≤ e^10 × Pr[M(D′) ∈ S]. In other words, even if the model's output probability is 0.0001 on a data set D′ that doesn't contain the target record, the model's output probability can be as high as 0.9999 on a neighboring data set D that contains the record. This allows an adversary to infer the presence or absence of a target record from the training data with high confidence. Adopting these binary classification methods for multi-class classification tasks requires even higher ε values. As noted by Wu et al. [71], it would require training a separate binary classifier for each class. Finally, high privacy budgets are required for non-convex learning algorithms, such as deep learning [61, 80]. Since the output and objective perturbation methods of Chaudhuri et al. [12] are not applicable to non-convex settings, implementations of differentially private deep learning rely on gradient perturbation in their iterative learning procedure. These methods do not scale to large numbers of training iterations due to the composition theorem of differential privacy, which causes the privacy budget to accumulate across iterations. The only exceptions are the works of Phan et al. [56, 57] that replace the non-linear functions in deep learning with polynomial approximations and then apply objective perturbation. With this transformation, they achieve high model utility for ε = 1, as shown in Table 3. However, we note that this polynomial approximation is a non-standard approach to deep learning which can limit the model's learning capacity, and thereby diminish the model's accuracy for complex tasks.
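To make the ε = 10 example above concrete, the bound Pr[M(D) ∈ S] ≤ e^ε × Pr[M(D′) ∈ S] can be evaluated numerically. The tiny sketch below is our own illustration (not code from any of the cited works) of how quickly the multiplicative guarantee degrades as ε grows.

```python
import math

# epsilon bounds the multiplicative change that any single record can make
# to the probability of any model output: Pr[M(D) in S] <= exp(eps) * Pr[M(D') in S].
for eps in [0.1, 1, 10]:
    ratio = math.exp(eps)
    p_without = 0.0001                         # output probability without the record
    p_with_max = min(1.0, ratio * p_without)   # largest probability the bound still allows
    print(f"eps={eps:>4}: e^eps = {ratio:10.1f}, {p_without} may rise to {p_with_max:.4f}")
```

At ε = 10 the factor e^ε ≈ 22,026, so the bound places essentially no constraint on the output probabilities, which is why such budgets offer little meaningful protection against membership inference.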
| Method | Perturbation | Data Set | n | d | ε |
| --- | --- | --- | --- | --- | --- |
| Chaudhuri et al. [12] | Output and Objective | Adult | 45,220 | 105 | 0.2 |
| | | KDDCup99 | 70,000 | 119 | 0.2 |
| Pathak et al. [55] | Output | Adult | 45,220 | 105 | 0.2 |
| Hamm et al. [25] | Output | KDDCup99 | 493,000 | 123 | 1.0 |
| | | URL | 200,000 | 50 | 1.0 |
| Zhang et al. [79] | Objective | US | 370,000 | 14 | 0.8 |
| | | Brazil | 190,000 | 14 | 0.8 |
| Jain and Thakurta [33] | Objective | CoverType | 500,000 | 54 | 0.5 |
| | | KDDCup2010 | 20,000 | 2M | 0.5 |
| Jain and Thakurta [34] | Output and Objective | URL | 100,000 | 20M | 0.1 |
| | | COD-RNA | 60,000 | 8 | 0.1 |
| Song et al. [64] | Gradient | KDDCup99 | 50,000 | 9 | 1.0 |
| | | MNIST† | 60,000 | 15 | 1.0 |
| Wu et al. [71] | Output | Protein | 72,876 | 74 | 0.05 |
| | | CoverType | 498,010 | 54 | 0.05 |
| Jayaraman et al. [35] | Output | Adult | 45,220 | 104 | 0.5 |
| | | KDDCup99 | 70,000 | 122 | 0.5 |

Table 2: Simple ERM Methods which achieve High Utility with Low Privacy Budget.
† While MNIST is normally a 10-class task, Song et al. [64] use this for '1 vs rest' binary classification.
Machine learning with other DP definitions. To avoid the stringent composition property of differential privacy, several proposed privacy-preserving deep learning methods adopt the differential privacy variants introduced in Section 2.1. Table 4 lists works that use gradient perturbation with differential privacy variants to reduce the overall privacy budget during iterative learning. The utility benefit of using these variants is evident from the fact that the privacy budget for deep learning algorithms is significantly less than in the prior works of Shokri and Shmatikov [61] and Zhao et al. [80].

While these variants of differential privacy make complex iterative learning feasible for reasonable ε values, they might lead to more privacy leakage in practice. The main goal of our study is to evaluate the impact of implementation decisions regarding the privacy budget and variants of differential privacy on the concrete privacy leakage that can be exploited by an attacker in practice. We do this by experimenting with various inference attacks, described in the next section.

3 Inference Attacks on Machine Learning

This section surveys the two types of inference attacks, membership inference (Section 3.1) and attribute inference (Section 3.2), and explains why they are useful metrics for evaluating privacy leakage. Section 3.3 briefly summarizes other relevant privacy attacks on machine learning.

3.1 Membership Inference

The aim of a membership inference attack is to infer whether or not a given record is present in the training set. Membership inference attacks can uncover highly sensitive information from training data. An early membership inference attack showed that it is possible to identify individuals contributing DNA to studies that analyze a mixture of DNA from many individuals, using a statistical distance measure to determine if a known individual is in the mixture [27].

Membership inference attacks can either be completely black-box, where an attacker only has query access to the target model [62], or can assume that the attacker has full white-box access to the target model, along with some auxiliary information [75]. The first membership inference attack on machine learning was proposed by Shokri et al. [62]. They consider an attacker who can query the target model in a black-box way to obtain confidence scores for the queried input. The attacker tries to exploit the confidence score to determine whether the query input was present in the training data. Their attack method involves first training shadow models on a labelled data set, which can be generated either via black-box queries to the target model or through assumptions about the underlying distribution of the training set. The attacker then trains an attack model using the shadow models to distinguish whether or not an input record is in the shadow training set. Finally, the attacker makes API calls to the target model to obtain confidence scores for each given input record and infers whether or not the input was part of the target model's training set. The inference model distinguishes the target model's predictions for inputs that are in its training set from those it did not train on.
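The following is a minimal sketch of the shadow-model pipeline described above, using scikit-learn models as stand-ins for the shadow and attack models. The model choices, the use of only the top confidence score as the attack feature (the original attack uses the full confidence vector), and the helper names are our assumptions for illustration, not the implementation of Shokri et al. [62] or of our experiments.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def attack_features(model, X):
    # Simplification: feed the attack model only the top confidence score.
    return model.predict_proba(X).max(axis=1, keepdims=True)

def train_attack_model(X_shadow, y_shadow, n_shadow, rng):
    # Train several shadow models on data drawn from (an approximation of) the
    # target's training distribution; their in/out records provide labelled
    # training data for the membership attack model.
    feats, labels = [], []
    for _ in range(n_shadow):
        idx = rng.permutation(len(X_shadow))
        members, nonmembers = idx[: len(idx) // 2], idx[len(idx) // 2:]
        shadow = LogisticRegression(max_iter=200).fit(X_shadow[members], y_shadow[members])
        for rows, is_member in ((members, 1), (nonmembers, 0)):
            feats.append(attack_features(shadow, X_shadow[rows]))
            labels.append(np.full(len(rows), is_member))
    feats, labels = np.vstack(feats), np.concatenate(labels)
    # The attack model learns to separate member from non-member confidence patterns.
    return MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500).fit(feats, labels)

def infer_membership(attack_model, target_model, X_query):
    # Query the target model and classify each record as member / non-member.
    return attack_model.predict(attack_features(target_model, X_query))
```

The attack only works to the extent that member and non-member confidence distributions differ, which is why the generalization gap of the target model matters so much in the results that follow.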
| Method | Task | Perturbation | Data Set | n | d | C | ε |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Jain et al. [32] | Online ERM | Objective | Year | 500,000 | 90 | 2 | 10 |
| | Online ERM | | CoverType | 581,012 | 54 | 2 | 10 |
| Iyengar et al. [31] | Binary ERM | Objective | Adult | 45,220 | 104 | 2 | 10 |
| | Binary ERM | | KDDCup99 | 70,000 | 114 | 2 | 10 |
| | Multi-Class ERM | | CoverType | 581,012 | 54 | 7 | 10 |
| | Multi-Class ERM | | MNIST | 65,000 | 784 | 10 | 10 |
| | High Dimensional ERM | | Gisette | 6,000 | 5,000 | 2 | 10 |
| Phan et al. [56, 57] | Deep Learning | Objective | YesiWell | 254 | 30 | 2 | 1 |
| | | | MNIST | 60,000 | 784 | 10 | 1 |
| Shokri and Shmatikov [61] | Deep Learning | Gradient | MNIST | 60,000 | 1,024 | 10 | 369,200 |
| | | | SVHN | 100,000 | 3,072 | 10 | 369,200 |
| Zhao et al. [80] | Deep Learning | Gradient | US | 500,000 | 20 | 2 | 100 |
| | | | MNIST | 60,000 | 784 | 10 | 100 |

Table 4: Gradient Perturbation based Classification Methods using Different Notions of Differential Privacy
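As a concrete reference for how the variants used by these methods translate back to a standard (ε, δ) guarantee, the sketch below implements the two conversion formulas quoted in Section 2.1 (ρ-zCDP implies (ρ + 2√(ρ log(1/δ)), δ)-DP, and (α, ε_α)-RDP implies (ε_α + log(1/δ)/(α − 1), δ)-DP). Summing per-step budgets over k compositions at a fixed order, and the function names, are our simplifying assumptions for illustration.

```python
import math

def zcdp_to_dp(rho, delta):
    # rho-zCDP -> (rho + 2*sqrt(rho*log(1/delta)), delta)-DP  (Section 2.1)
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

def rdp_to_dp(eps_alpha, alpha, delta):
    # (alpha, eps_alpha)-RDP -> (eps_alpha + log(1/delta)/(alpha - 1), delta)-DP
    return eps_alpha + math.log(1.0 / delta) / (alpha - 1.0)

# Illustrative cumulative budgets after k compositions of the same mechanism,
# assuming the per-step budgets simply add up (rho, or eps_alpha at fixed alpha).
k, delta = 100, 1e-5
print(zcdp_to_dp(k * 0.001, delta))           # cumulative epsilon via zCDP
print(rdp_to_dp(k * 0.001, 32.0, delta))      # cumulative epsilon via RDP at order 32
```

Because the converted ε grows sub-linearly in k, these variants permit far less noise per iteration than naïve composition for the same stated budget, which is exactly the trade-off examined in the experiments.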
The key assumption is that the confidence score of the target model is higher for the training instances than it would be for arbitrary instances not present in the training set. This can be due to the generalization gap, which is prominent in models that overfit to training data.

A more targeted approach was proposed by Long et al. [45] where the shadow models are trained with and without a targeted input record t. At inference time, the attacker can check if the input record t was present in the training set of the target model. This approach tests the membership of a specific record more accurately than Shokri et al.'s approach [62]. Recently, Salem et al. [60] proposed more generic membership inference attacks by relaxing the requirements of Shokri et al. [62]. In particular, requirements on the number of shadow models, knowledge of the training data distribution and the target model architecture can be relaxed without substantially degrading the effectiveness of the attack.

Yeom et al. [75] recently proposed a more computationally efficient membership inference attack when the attacker has access to the target model and knows the average training loss of the model. To test the membership of an input record, the attacker evaluates the loss of the model on the input record and then classifies it as a member if the loss is smaller than the average training loss.

Connection to Differential Privacy. Differential privacy, by definition, aims to obfuscate the presence or absence of a record in the data set. On the other hand, membership inference attacks aim to identify the presence or absence of a record in the data set. Thus, intuitively these two notions counteract each other. Li et al. [42] point to this fact and provide a direct relationship between differential privacy and membership inference attacks. Backes et al. [4] studied membership inference attacks on microRNA studies and showed that differential privacy can reduce the success of membership inference attacks, but at the cost of utility.

Yeom et al. [75] formally define a membership inference attack as an adversarial game where a data element is selected from the distribution, which is randomly either included in the training set or not. Then, an adversary with access to the trained model attempts to determine if that element was used in training. The membership advantage is defined as the difference between the adversary's true and false positive rates for this game. The authors prove that if the learning algorithm satisfies ε-differential privacy, then the adversary's advantage is bounded by e^ε − 1. Hence, it is natural to use membership inference attacks as a metric to evaluate the privacy leakage of differentially private algorithms.

3.2 Attribute Inference

The aim of an attribute inference attack (also called model inversion) is to learn hidden sensitive attributes of a test input given at least API access to the model and information about the non-sensitive attributes. Fredrikson et al. [20] formalize this attack in terms of maximizing the posterior probability estimate of the sensitive attribute. More concretely, for a test record x where the attacker knows the values of its non-sensitive attributes x_1, x_2, ..., x_{d−1} and all the prior probabilities of the attributes, the attacker obtains the output of the model, f(x), and attempts to recover the value of the sensitive attribute x_d. The attacker essentially searches for the value of x_d that maximizes the posterior probability P(x_d | x_1, x_2, ..., x_{d−1}, f(x)). The success of this attack is based on the correlation between the sensitive attribute, x_d, and the model output, f(x).

Yeom et al. [75] also propose an attribute inference attack using the same principle they use for their membership inference attack. The attacker evaluates the model's empirical loss on the input instance for different values of the sensitive attribute, and reports the value which has the maximum posterior probability of achieving the empirical loss. The authors define the attribute advantage similarly to their definition of membership advantage for membership inference.

Fredrikson et al. [20] demonstrated attribute inference attacks that could identify genetic markers based on warfarin dosage output by a model with just black-box access to the model API.¹ With additional access to confidence scores of the model (noted as white-box information by Wu et al. [70]), more complex tasks have been performed, such as recovering faces from the training data [19].

¹ This application has stirred some controversy based on the warfarin dosage output by the model itself being sensitive information correlated to the sensitive genetic markers, hence the assumption on the attacker's prior knowledge of warfarin dosage is somewhat unrealistic [47].

Connection to Differential Privacy. Differential privacy is mainly tailored to obfuscate the presence or absence of a record in a data set, by limiting the effect of any single record on the output of a differentially private model trained on the data set. Logically this definition also extends to attributes or features of a record. In other words, by adding sufficient differential privacy noise, we should be able to limit the effect of a sensitive attribute on the model's output. This relationship between records and attributes is discussed by Yeom et al. [75]. Hence, we include these attacks in our experiments.

3.3 Other Attacks on Machine Learning

Apart from inference attacks, many other attacks have been proposed in the literature which try to infer specific information from the target model. The most relevant are memorization attacks, which try to exploit the ability of high-capacity models to memorize certain sensitive patterns in the training data [10]. These attacks have been found to be thwarted by differential privacy mechanisms with very little noise (ε = 10⁹) [10].
Other privacy attacks include model stealing, hyperparameter stealing, and property inference attacks. A model stealing attack aims to recover the model parameters via black-box access to the target model, either by adversarial learning [46] or by equation solving attacks [67]. Hyperparameter stealing attacks try to recover the underlying hyperparameters used during the model training, such as the regularization coefficient [68] or model architecture [73]. These hyperparameters are intellectual property of commercial organizations that deploy machine learning models as a service, and hence these attacks are regarded as a threat to valuable intellectual property. A property inference attack tries to infer whether the training data set has a specific property, given white-box access to the trained model. For instance, given access to a speech recognition model, an attacker can infer if the training data set contains speakers with a certain accent. Here the attacker can use the shadow training method of Shokri et al. [62] for distinguishing the presence and absence of a target property. These attacks have been performed on HMM and SVM models [3] and neural networks [22].

Though all these attacks may leak sensitive information about the target model or training data, the information leaked tends to be application-specific and is not clearly defined in a general way. For example, a property inference attack leaks some statistical property of the training data that is surprising to the model developer. Of course, the overall purpose of the model is to learn statistical properties from the training data. So, there is no general definition of a property inference attack without a prescriptive decision about which statistical properties of the training data should be captured by the model and which are sensitive to leak. In addition, the attacks mentioned in this section do not closely follow the threat model of differential privacy. Thus, we only consider inference attacks for our experimental evaluation.

In addition to these attacks, several attacks have been proposed that require an adversary that can actively interfere with the model training process [5, 51, 72, 74]. We consider these out of scope, and assume a clean training process not under the control of the adversary.

4 Empirical Evaluation

To quantify observable privacy leakage, we conduct experiments to measure how much an adversary can infer from a model. As motivated in Section 3, we measure privacy leakage using membership and attribute inference in our experiments. Note, however, that the conclusions we can draw from experiments like this are limited to showing a lower bound on the information leakage, since they are measuring the effectiveness of a particular attack. This contrasts with differential privacy guarantees, which provide an upper bound on possible leakage. Experimental results cannot be used to make strong claims about what the best possible attack would be able to infer, especially in cases where an adversary has auxiliary information to help guide the attack. Evidence from our experiments, however, can provide clear evidence that implemented privacy protections do not appear to provide sufficient privacy.

4.1 Experimental Setup

We evaluate the privacy leakage of two differentially private algorithms using gradient perturbation: logistic regression for empirical risk minimization (Section 4.2) and neural networks for non-convex learning (Section 4.3). For both, we consider the different notions of differential privacy and compare their privacy leakage. The variations that we implement are naïve composition (NC), advanced composition (AC), zero-concentrated differential privacy (zCDP) and Rényi differential privacy (RDP) (see Section 2.1 for details). We do not include CDP as it has the same composition property as zCDP (Table 1). For RDP, we use the RDP accountant (previously known as the moments accountant) of the TF Privacy framework [2].

We evaluate the models on two main metrics: accuracy loss, the model's accuracy on a test set relative to the non-private baseline model, and privacy leakage, the attacker's advantage as defined by Yeom et al. [75]. The accuracy loss is normalized with respect to the accuracy of the non-private model to clearly depict the model utility:

    Accuracy Loss = 1 − (Accuracy of Private Model / Accuracy of Non-Private Model).

To evaluate the inference attacks, we provide the attacker with a set of 20,000 records consisting of 10,000 records from the training set and 10,000 records from the test set. We call records in the training set members, and the other records non-members. These labels are not known to the attacker. The task of the attacker is to predict whether or not a given input record belongs to the training set (i.e., if it is a member). The privacy leakage metric is calculated by taking the difference between the true positive rate (TPR) and the false positive rate (FPR) of the inference attack. Thus the privacy leakage metric is always between 0 and 1, where a value of 0 indicates that there is no leakage. For example, given that there are 100 member and 100 non-member records, if an attacker performs membership inference on the model and correctly identifies all the 'true' member records while falsely labelling 30 non-member records as members, then the privacy leakage would be 0.7. To better understand the potential impact of leakage, we also conduct experiments to estimate the actual number of members who are at risk for disclosure in a membership inference attack.

Data sets. We evaluate our models over two data sets for multi-class classification tasks: CIFAR-100 [38] and Purchase-100 [36]. CIFAR-100 consists of 28 × 28 images of 100 real-world objects, with 500 instances of each object class. We use PCA to reduce the dimensionality of records to 50. The Purchase-100 data set consists of 200,000 customer purchase records of size 100 each (corresponding to the 100 frequently-purchased items), where the records are grouped into 100 classes based on the customers' purchase style.
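A minimal sketch of how these two metrics could be computed from attack outputs follows; the function names and the balanced member/non-member evaluation set are assumptions for illustration (PPV, which appears in the per-run analysis later, is included for completeness).

```python
import numpy as np

def accuracy_loss(private_acc, nonprivate_acc):
    # Accuracy loss relative to the non-private baseline (Section 4.1).
    return 1.0 - private_acc / nonprivate_acc

def privacy_leakage(pred_member, is_member):
    # Leakage = TPR - FPR of the membership inference attack.
    pred_member, is_member = np.asarray(pred_member, bool), np.asarray(is_member, bool)
    tpr = np.mean(pred_member[is_member])      # fraction of members caught
    fpr = np.mean(pred_member[~is_member])     # fraction of non-members flagged
    return tpr - fpr

def ppv(pred_member, is_member):
    # Positive predictive value: fraction of flagged records that are true members.
    pred_member, is_member = np.asarray(pred_member, bool), np.asarray(is_member, bool)
    return np.mean(is_member[pred_member])

# Worked example from the text: 100 members all flagged, 30 of 100 non-members
# falsely flagged -> leakage = 1.0 - 0.3 = 0.7.
is_member = np.array([True] * 100 + [False] * 100)
pred = np.array([True] * 100 + [True] * 30 + [False] * 70)
print(privacy_leakage(pred, is_member))   # 0.7
print(ppv(pred, is_member))               # ~0.77
```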
For both data sets, we use 10,000 randomly-selected instances for training and 10,000 randomly-selected non-training instances for the test set. The remaining records are used for training the shadow models and inference model used in the Shokri et al. attacks.

Attacks. For our experiments, we use the attack frameworks of Shokri et al. [62] and Yeom et al. [75] for membership inference and the method proposed by Yeom et al. [75] for attribute inference. In Shokri et al.'s framework [62], multiple shadow models are trained on data that is sampled from the same distribution as the private data set. These shadow models are used to train an inference model to identify whether an input record belongs to the private data set. The inference model is trained using a set of records used to train the shadow models, a set of records randomly selected from the distribution that are not part of the shadow model training, along with the confidence scores output by the shadow models for all of the input records. Using these inputs, the inference model learns to distinguish the training records from the non-training records. At the inference stage, the inference model takes an input record along with the confidence score of the target model on the input record, and outputs whether the input record belongs to the target model's private training data set. The intuition is that if the target model overfits on its training set, its confidence score for a training record will be higher than its confidence score for an otherwise similar input that was not used in training. The inference model tries to exploit this property. In our instantiation of the attack framework, we use five shadow models which all have the same model architecture as the target model. Our inference model is a neural network with two hidden layers of size 64. This setting is consistent with the original work [62].

The attack framework of Yeom et al. [75] is simpler than Shokri et al.'s design. It assumes the attacker has access to the target model's expected training loss on the private training data set, in addition to having access to the target model. For membership inference, the attacker simply observes the target model's loss on the input record. The attacker classifies the record as a member if the loss is smaller than the target model's expected training loss; otherwise the record is classified as a non-member. The same principle is used for attribute inference. Given an input record, the attacker brute-forces all possible values for the unknown attribute and observes the target model's loss, outputting the value for which the loss is closest to the target's expected training loss. Since there are no attributes in our data sets that are explicitly annotated as private, we randomly choose five attributes, perform the attribute inference attack on each attribute independently, and report the averaged results.

Hyperparameters. For both data sets, we train logistic regression and neural network models with ℓ2 regularization. First, we train a non-private model and perform a grid search over the regularization coefficient λ to find the value that minimizes the classification error on the test set. For CIFAR-100, we found optimal values to be λ = 10⁻⁵ for logistic regression and λ = 10⁻⁴ for the neural network. For Purchase-100, we found optimal values to be λ = 10⁻⁵ for logistic regression and λ = 10⁻⁸ for the neural network. Next, we fix this setting to train differentially private models using gradient perturbation. We vary ε between 0.01 and 1000 while keeping δ = 10⁻⁵, and report the accuracy loss and privacy leakage. The choice of δ = 10⁻⁵ satisfies the requirement that δ should be smaller than the inverse of the training set size of 10,000. We use the ADAM optimizer for training and fix the learning rate to 0.01 with a batch size of 200. Due to the random noise addition, all the experiments are repeated five times and the average results and standard errors are reported. We do not assume pre-trained model parameters, unlike the prior works of Abadi et al. [1] and Yu et al. [76].

Clipping. For gradient perturbation, clipping is required to bound the sensitivity of the gradients. We tried clipping at both the batch and per-instance level. Batch clipping is more computationally efficient and a standard practice in deep learning. On the other hand, per-instance clipping uses the privacy budget more efficiently, resulting in more accurate models for a given privacy budget. We use the TensorFlow Privacy framework [2], which implements both batch and per-instance clipping. We fix the clipping threshold at C = 1.

Figure 1 compares the accuracy loss of logistic regression models trained over the CIFAR-100 data set with both batch clipping and per-instance clipping. Per-instance clipping allows learning more accurate models for all values of ε and amplifies the differences between the different mechanisms. For example, while none of the models learn anything useful when using batch clipping, the model trained with RDP achieves accuracy close to the non-private model for ε = 100 when performing per-instance clipping. Hence, for the rest of the paper we only report the results for per-instance clipping.

4.2 Logistic Regression Results

We train ℓ2-regularized logistic regression models on both the CIFAR-100 and Purchase-100 data sets.

CIFAR-100. The baseline model for non-private logistic regression achieves accuracy of 0.225 on the training set and 0.155 on the test set, which is competitive with the state-of-the-art neural network model [62] that achieves test accuracy close to 0.20 on CIFAR-100 after training on a larger data set. Thus, there is a small generalization gap of 0.07 for the inference attacks to exploit.

Figure 1(b) compares the accuracy loss for logistic regression models trained with different notions of differential privacy as we vary the privacy budget ε. As depicted in the figure, naïve composition achieves accuracy close to 0.01 for ε ≤ 10, which is random guessing for 100-class classification. Naïve composition achieves accuracy loss close to 0 for ε = 1000.
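A minimal sketch of the Yeom et al. attacks as instantiated above: membership inference thresholds the per-record loss at the target model's expected training loss, and attribute inference brute-forces the unknown attribute and reports the value whose loss is closest to that threshold. The cross-entropy loss shown and the helper names are our assumptions for illustration.

```python
import numpy as np

def record_loss(predict_proba, x, y):
    # Per-record cross-entropy loss of the target model; predict_proba(x) returns
    # the vector of class probabilities for input x.
    return -np.log(predict_proba(x)[y] + 1e-12)

def yeom_membership(predict_proba, x, y, expected_train_loss):
    # Member iff the record's loss is below the model's average training loss.
    return record_loss(predict_proba, x, y) < expected_train_loss

def yeom_attribute(predict_proba, x, y, attr_idx, candidate_values, expected_train_loss):
    # Try every candidate value of the unknown attribute and report the value
    # whose loss is closest to the expected training loss.
    losses = []
    for v in candidate_values:
        x_guess = np.array(x, dtype=float)
        x_guess[attr_idx] = v
        losses.append(record_loss(predict_proba, x_guess, y))
    return candidate_values[int(np.argmin(np.abs(np.array(losses) - expected_train_loss)))]
```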
[Figure 1: Accuracy loss of logistic regression (CIFAR-100) vs. privacy budget (ε) for NC, AC, zCDP, and RDP — (a) Batch gradient clipping, (b) Per-instance gradient clipping]
Advanced composition adds more noise than naïve composition when the privacy budget ε is greater than the number of training epochs (ε ≥ 100), so it should never be used in such settings. The zCDP and RDP variants achieve accuracy loss close to 0 at ε = 500 and ε = 50 respectively, which is orders of magnitude smaller than the naïve composition. This is expected since these variations require less added noise for the same privacy budget.

Figures 2(a) and 2(b) show the privacy leakage due to membership inference attacks on logistic regression models. Figure 2(a) shows results for the Shokri et al. attack [62], which has access to the target model's confidence scores on the input record. Naïve composition achieves privacy leakage close to 0 for ε ≤ 10, and the leakage reaches 0.065 ± 0.004 for ε = 1000. The RDP and zCDP variants have average leakage close to 0.080 ± 0.004 for ε = 1000. As expected, the differential privacy variations have leakage in accordance with the amount of noise they add for a given ε. The plots also include a dashed line showing the theoretical upper bound on the privacy leakage for ε-differential privacy, where the bound is e^ε − 1 (see Section 3.1). As depicted, there is a huge gap between the theoretical upper bound and the empirical privacy leakage for all the variants of differential privacy. This implies that more powerful inference attacks could exist in practice.

Figure 2(b) shows results for the white-box attacker of Yeom et al. [75], which has access to the target model's loss on the input record. As expected, zCDP and RDP leak the most. Naïve composition does not have any significant leakage for ε ≤ 10, but the leakage reaches 0.077 ± 0.003 for ε = 1000. The observed leakage of all the variations is in accordance with the noise magnitude required for different differential privacy guarantees. From Figure 1(b) and Figure 2, we see that RDP at ε = 10 achieves similar utility and privacy leakage to NC at ε = 500.

Figure 2(c) depicts the privacy leakage due to the attribute inference attack. Naïve composition has low privacy leakage for ε ≤ 10 (attacker advantage of 0.005 ± 0.007 at ε = 10), but it quickly increases to 0.093 ± 0.002 for ε = 1000. As expected, across all variations, as privacy budgets increase both the attacker's advantage (privacy leakage) and the model utility (accuracy) increase.
We further examine the membership inference attack of Yeom et al. on the models trained with RDP at ε = 1000, since this is the setting where the privacy leakage is the highest. As shown in Figure 2(b), the average leakage across five runs for this setting is 0.10, where the average positive predictive value² (PPV) is 0.55. On average, the attack identifies 9,196 members, of which 5,084 are true members (out of 10,000 members) and 4,112 are false positives (out of 10,000 non-members). To see if the members are exposed randomly across different runs, we calculate the overlap of membership predictions. Figure 3 illustrates the overlap of membership predictions across two of the five runs. Out of 9,187 membership predictions, 5,070 members are correctly identified (true positives) and 4,117 non-members are incorrectly classified as members (false positives) in the first run, and 5,094 members are correctly identified in the second run. Across both the runs, there are 8,673 records predicted as members in both runs, of which 4,805 are true members (PPV = 0.55). Thus, the overlap is much higher than would be expected if exposure risk were due only to the random noise added. It depends on properties of the data, although the relatively small increase in PPV suggests they are not highly correlated with actual training set membership.

Figure 4 shows how many members and non-members are predicted as members multiple times across five runs. For comparison, the dotted line depicts the fraction of records that would be identified as members by a test that randomly and independently predicted membership with 0.5 probability (so, PPV = 0.5 at all points). This result corresponds to a scenario where models are trained independently five times on the same data set, and the adversary has access to all the model releases. This is not a realistic attack scenario, but might approximate a scenario where a model is repeatedly retrained using mostly similar training data. Using this extra information, the adversary can be more confident about predictions that are made consistently across multiple models. As the figure depicts, however, in this case the PPV increases slightly with intersection of more runs but remains close to 0.5. Thus, even after observing the outputs of all the five models, the adversary has little advantage in this setting. While this seems like the private models are performing well, the adversary identifies 5,179 members with a PPV of 0.56 even with the non-private model. Hence the privacy benefit here is not due to the privacy mechanisms — even without any privacy noise, the logistic regression model does not appear to leak enough information to enable effective membership inference attacks. This highlights the huge gap between the theoretical upper bound on privacy leakage and the empirical leakage of the implemented inference attacks (Figure 2), showing that there could be more powerful inference attacks in practice or ways to provide more meaningful privacy guarantees.

² PPV gives the fraction of correct membership predictions among all the membership predictions made by the attacker. A PPV value of 0.5 means that the adversary has no advantage, i.e. the adversary cannot distinguish between members and non-members. Thus only PPV values greater than 0.5 are of significance to the adversary.

[Figure 5: Accuracy loss of logistic regression (Purchase-100)]

Purchase-100. For the Purchase-100 dataset, the baseline model for non-private logistic regression achieves accuracy of
[Figure: privacy leakage vs. privacy budget (ε) — (a) Shokri et al. membership inference, (b) Yeom et al. membership inference, (c) Yeom et al. attribute inference]

[Figure 9: accuracy loss vs. privacy budget (ε) of neural networks — (a) CIFAR-100, (b) Purchase-100]

[Figure 11: Predictions across multiple runs of neural network with RDP at ε = 1000 (CIFAR-100) — (a) Overlap of membership predictions across two runs, (b) Predictions across multiple runs]
4.3 Neural Networks

We train neural network models consisting of two hidden layers and an output layer. The hidden layers have 256 neurons that use ReLU activation. The output layer is a softmax layer with 100 neurons, each corresponding to a class label. This architecture is similar to the one used by Shokri et al. [62].

CIFAR-100. The baseline non-private neural network model achieves accuracy of 1.000 on the training set and 0.168 on the test set, which is competitive to the neural network model of Shokri et al. [62]. Their model is trained on a training set of size 29,540 and achieves test accuracy of 0.20, whereas our model is trained on 10,000 training instances. There is a huge generalization gap of 0.832, which the inference attacks can exploit. Figure 9(a) compares the accuracy loss of neural network models trained with different notions of differential privacy with varying privacy budget ε. The model trained with naïve composition does not learn anything useful until ε = 100 (accuracy loss of 0.907 ± 0.004), at which point the advanced composition also has accuracy loss close to 0.935 and the other variants achieve accuracy loss close to 0.24. None of the variants approach zero accuracy loss, even for ε = 1000. The relative performance is similar to that of the logistic regression model discussed in Section 4.2.

Figure 10 shows the privacy leakage on neural network models for the inference attacks. The privacy leakage for each variant of differential privacy accords with the amount of noise it adds to the model. At higher ε values, the leakage is significant for zCDP and RDP due to model overfitting. For ε = 1000, with the Shokri et al. attack, naïve composition has leakage of 0.034 compared to 0.219 for zCDP and 0.277 for RDP (above the region shown in the plot). For the Yeom et al. attack, at ε = 1000, RDP exhibits privacy leakage of 0.399, a substantial advantage for the adversary. Naïve composition and advanced composition achieve strong privacy against membership inference attackers, but fail to learn anything useful even with the highest privacy budgets tested. RDP and zCDP are able to learn useful models at privacy budgets above ε = 100, but exhibit significant privacy leakage at the corresponding levels. Thus, none of the CIFAR-100 NN models seem to provide both acceptable model utility and strong privacy.

Like we did for logistic regression, we also analyze the vulnerability of individual records across multiple model training runs with RDP at ε = 1000. Figure 11 shows the membership predictions across multiple runs. In contrast to the logistic regression results, for the neural networks the fraction of repetitively exposed members decreases with increasing number of runs while the number of repeated false positives drops more drastically, leading to a significant increase in PPV. The adversary identifies 2,386 true members across all the five runs with a PPV of 0.82. In comparison, the adversary exposes 7,716 members with 0.94 PPV using the single non-private model. So, although the private models do leak significant information, the privacy noise substantially diminishes the inference attacks.

Purchase-100. The baseline non-private neural network model achieves accuracy of 0.982 on the training set and 0.605 on the test set. In comparison, the neural network model of Shokri et al. [62] trained on a similar data set (but with 600 attributes instead of 100 as in our data set) achieves 0.670 test accuracy. Figure 9(b) compares the accuracy loss of neural network models trained with different variants of differential privacy. The trends are similar to those for the logistic regression models (Figure 5). The zCDP and RDP variants achieve model utility close to the non-private baseline for ε = 1000, while naïve composition continues to suffer from high accuracy loss (0.372). Advanced composition has higher accuracy loss of 0.702 for ε = 1000 as it requires the addition of more noise than naïve composition when ε is greater than the number of training epochs.

Figure 12 shows the privacy leakage comparison of the variants against the inference attacks. The results are consistent with those observed for CIFAR-100. Similar to the models in other settings, we analyze the vulnerability of members across multiple runs of the neural network model trained with RDP at ε = 1000. Figure 13(a) shows the overlap of membership predictions across two runs, while Figure 13(b) shows the fraction of members and non-members classified as members by the adversary across multiple runs. The PPV increases with intersection of more runs, but less information is leaked than for the CIFAR-100 models. The adversary correctly identifies 3,807 members across all five runs with a PPV of 0.66. In contrast, the adversary predicts 9,461 members with 0.64 PPV using the non-private model.

4.4 Discussion

While the tighter cumulative privacy loss bounds provided by variants of differential privacy improve model utility for a given privacy budget, the reduction in noise increases vulnerability to inference attacks. While these definitions still satisfy the (ε, δ)-differential privacy guarantees, the meaningful value of these guarantees diminishes rapidly with high ε values. Although the theoretical guarantees provided by differential privacy are very appealing, once ε values exceed small values, the practical value of these guarantees is insignificant—in most of our inference attack figures, the theoretical bound given by ε-DP falls off the graph before any measurable privacy leakage occurs (and at levels well before models provide acceptable utility). The value of these privacy mechanisms comes not from the theoretical guarantees, but from the impact of the mechanism on what realistic adversaries can infer. Thus, for the same privacy budget, differential privacy techniques that produce tighter bounds and result in lower noise requirements come with increased concrete privacy risks.
[Figure 12: privacy leakage vs. privacy budget (ε) — (a) Shokri et al. membership inference, (b) Yeom et al. membership inference, (c) Yeom et al. attribute inference]

[Figure 13: Predictions across multiple runs of neural network with RDP at ε = 1000 (Purchase-100) — (a) Overlap of membership predictions across two runs, (b) Predictions across multiple runs]
We note that in our inference attack experiments, we use equal numbers of member and non-member records, which provides a 50-50 prior success probability to the attacker. Thus, even an ε-DP implementation might leak even for small ε values, though we did not observe any such leakage. Alternatively, a skewed prior probability may lead to smaller leakage even for large ε values. Our goal in this work is to evaluate scenarios where the risk of inference is high, so the use of a 50-50 prior probability is justified. We also emphasize that our results show the privacy leakage due to three particular inference attacks. Attacks only get better, so future attacks may be able to infer more than is shown in our experiments.

5 Conclusion

Differential privacy has earned a well-deserved reputation for providing principled and powerful mechanisms for establishing privacy guarantees. However, when it is implemented for challenging tasks such as machine learning, compromises must be made to preserve utility. It is essential that the privacy impact of those compromises is well understood when differential privacy is deployed to protect sensitive data. Our results reveal that the commonly-used combinations of ε values and the variations of differential privacy in practical implementations may provide unacceptable utility-privacy trade-offs. We hope our study will encourage more careful assessments of the practical privacy value of formal claims based on differential privacy, and lead to deeper understanding of the privacy impact of design decisions when deploying differential privacy. There remains a huge gap between what the state-of-the-art inference attacks can infer, and the guarantees provided by differential privacy. Research is needed to understand the limitations of inference attacks, and eventually to develop solutions that provide desirable, and well understood, utility-privacy trade-offs.

Availability

Open source code for reproducing all of our experiments is available at https://github.com/bargavj/EvaluatingDPML.
Acknowledgments

The authors are deeply grateful to Úlfar Erlingsson for pointing out some key misunderstandings in an early version of this work and for convincing us of the importance of per-instance gradient clipping, and to Úlfar, Ilya Mironov, and Shuang Song for help validating and improving the work. We thank Brendan McMahan for giving valuable feedback and important suggestions on improving the clarity of the paper. We thank Vincent Bindschaedler for shepherding our paper. We thank Youssef Errami and Jonah Weissman for contributions to the experiments, and Ben Livshits for feedback on the work. Atallah Hezbor, Faysal Shezan, Tanmoy Sen, Max Naylor, Joshua Holtzman and Nan Yang helped systematize the related works. Finally, we thank Congzheng Song and Samuel Yeom for providing their implementation of inference attacks. This work was partially funded by grants from the National Science Foundation SaTC program (#1717950, #1915813) and support from Intel and Amazon.

References

[1] Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In ACM Conference on Computer and Communications Security, 2016.

[2] Galen Andrew, Steve Chien, and Nicolas Papernot. TensorFlow Privacy. https://github.com/tensorflow/privacy.

[3] Giuseppe Ateniese, Luigi Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali, and Giovanni Felici. Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers. International Journal of Security and Networks, 2015.

[4] Michael Backes, Pascal Berrang, Mathias Humbert, and Praveen Manoharan. Membership privacy in MicroRNA-based studies. In ACM Conference on Computer and Communications Security, 2016.

[5] Eugene Bagdasaryan, Andreas Veit, Yiqing Hua, Deborah Estrin, and Vitaly Shmatikov. How to backdoor federated learning. arXiv:1807.00459, 2018.

[6] Brett K Beaulieu-Jones, William Yuan, Samuel G Finlayson, and Zhiwei Steven Wu. Privacy-preserving distributed deep learning for clinical data. arXiv:1812.01484, 2018.

[7] Raghav Bhaskar, Srivatsan Laxman, Adam Smith, and Abhradeep Thakurta. Discovering frequent patterns in sensitive data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

[8] Abhishek Bhowmick, John Duchi, Julien Freudiger, Gaurav Kapoor, and Ryan Rogers. Protection against reconstruction and its applications in private federated learning. arXiv:1812.00984, 2018.

[9] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, 2016.

[10] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. The Secret Sharer: Evaluating and testing unintended memorization in neural networks. In USENIX Security Symposium, 2019.

[11] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In Advances in Neural Information Processing Systems, 2009.

[12] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private Empirical Risk Minimization. Journal of Machine Learning Research, 2011.

[13] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In ACM Conference on Computer and Communications Security, 2018.

[14] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011.

[15] John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In Symposium on Foundations of Computer Science, 2013.

[16] Cynthia Dwork. Differential Privacy: A Survey of Results. In International Conference on Theory and Applications of Models of Computation, 2008.

[17] Cynthia Dwork and Aaron Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[18] Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. arXiv:1603.01887, 2016.

[19] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In ACM Conference on Computer and Communications Security, 2015.

[20] Matthew Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In USENIX Security Symposium.

[21] Arik Friedman and Assaf Schuster. Data mining with differential privacy. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010.

[22] Karan Ganju, Qi Wang, Wei Yang, Carl A Gunter, and Nikita Borisov. Property inference attacks on fully connected neural networks using permutation invariant representations. In ACM Conference on Computer and Communications Security, 2018.

[23] Joseph Geumlek, Shuang Song, and Kamalika Chaudhuri. Rényi differential privacy mechanisms for posterior sampling. In Advances in Neural Information Processing Systems, 2017.

[24] Robin C Geyer, Tassilo Klein, and Moin Nabi. Differentially private federated learning: A client level perspective. arXiv:1712.07557, 2017.

[25] Jihun Hamm, Paul Cao, and Mikhail Belkin. Learning privately from multiparty data. In International Conference on Machine Learning, 2016.

[26] Michael Hay, Ashwin Machanavajjhala, Gerome Miklau, Yan Chen, and Dan Zhang. Principled evaluation

[36] Kaggle, Inc. Acquire Valued Shoppers Challenge. https://kaggle.com/c/acquire-valued-shoppers-challenge/data, 2014.

[37] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.

[38] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.

[39] Jaewoo Lee. Differentially private variance reduced stochastic gradient descent. In International Conference
of differentially private algorithms using DPBench. In on New Trends in Computing Sciences, 2017.
ACM SIGMOD Conference on Management of Data, [40] Dong-Hui Li and Masao Fukushima. A modified BFGS
2016. method and its global convergence in nonconvex mini-
[27] Nils Homer et al. Resolving individuals contributing mization. Journal of Computational and Applied Math-
trace amounts of DNA to highly complex mixtures us- ematics, 2001.
ing high-density SNP genotyping microarrays. PLoS [41] Ninghui Li, Wahbeh Qardaji, Dong Su, and Jianneng
Genetics, 2008. Cao. PrivBasis: Frequent itemset mining with differen-
[28] Zonghao Huang, Rui Hu, Yanmin Gong, and Eric Chan- tial privacy. The VLDB Journal, 2012.
Tin. DP-ADMM: ADMM-based distributed learning
[42] Ninghui Li, Wahbeh Qardaji, Dong Su, Yi Wu, and Wein-
with differential privacy. arXiv:1808.10101, 2018.
ing Yang. Membership privacy: A unifying framework
[29] Nick Hynes, Raymond Cheng, and Dawn Song. Ef- for privacy definitions. In ACM Conference on Com-
ficient deep learning on multi-source private data. puter and Communications Security, 2013.
arXiv:1807.06689, 2018.
[43] Changchang Liu, Xi He, Thee Chanyaswad, Shiqiang
[30] Ali Inan, Murat Kantarcioglu, Gabriel Ghinita, and Elisa Wang, and Prateek Mittal. Investigating statistical pri-
Bertino. Private record matching using differential vacy frameworks from the perspective of hypothesis
privacy. In International Conference on Extending testing. Proceedings on Privacy Enhancing Technolo-
Database Technology, 2010. gies, 2019.
[31] Roger Iyengar, Joseph P Near, Dawn Song, Om Thakkar, [44] Dong C Liu and Jorge Nocedal. On the limited memory
Abhradeep Thakurta, and Lun Wang. Towards practi- BFGS method for large scale optimization. Mathemati-
cal differentially private convex optimization. In IEEE cal programming, 1989.
Symposium on Security and Privacy, 2019.
[45] Yunhui Long, Vincent Bindschaedler, and Carl A.
[32] Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta.
Gunter. Towards measuring membership privacy.
Differentially private online learning. In Annual Confer-
arXiv:1712.09136, 2017.
ence on Learning Theory, 2012.
[33] Prateek Jain and Abhradeep Thakurta. Differentially pri- [46] Daniel Lowd and Christopher Meek. Adversarial learn-
vate learning with kernels. In International Conference ing. In ACM SIGKDD Conference on Knowledge Dis-
on Machine Learning, 2013. covery and Data Mining, 2005.
[34] Prateek Jain and Abhradeep Guha Thakurta. (Near) [47] Frank McSherry. Statistical inference considered harm-
Dimension independent risk bounds for differentially ful. https://github.com/frankmcsherry/blog/blob/master/
private learning. In International Conference on Ma- posts/2016-06-14.md, 2016.
chine Learning, 2014.
[48] Frank McSherry and Ilya Mironov. Differentially private
[35] Bargav Jayaraman, Lingxiao Wang, David Evans, and recommender systems: Building privacy into the Netflix
Quanquan Gu. Distributed learning without distress: prize contenders. In ACM SIGKDD Conference on
Privacy-preserving Empirical Risk Minimization. In Knowledge Discovery and Data Mining, 2009.
[49] Frank McSherry and Kunal Talwar. Mechanism design [62] Reza Shokri, Marco Stronati, Congzheng Song, and Vi-
via differential privacy. In Symposium on Foundations taly Shmatikov. Membership inference attacks against
of Computer Science, 2007. machine learning models. In IEEE Symposium on Secu-
rity and Privacy, 2017.
[50] Ilya Mironov. Rényi differential privacy. In IEEE Com-
puter Security Foundations Symposium, 2017. [63] Adam Smith and Abhradeep Thakurta. Differentially
Private Feature Selection via Stability Arguments, and
[51] Luis Munoz-González, Battista Biggio, Ambra Demon- the Robustness of the Lasso. In Proceedings of Confer-
tis, Andrea Paudice, Vasin Wongrassamee, Emil C. ence on Learning Theory, 2013.
Lupu, and Fabio Roli. Towards poisoning of deep learn-
ing algorithms with back-gradient optimization. In ACM [64] Shuang Song, Kamalika Chaudhuri, and Anand D Sar-
Workshop on Artificial Intelligence and Security, 2017. wate. Stochastic gradient descent with differentially
private updates. In IEEE Global Conference on Signal
[52] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. and Information Processing, 2013.
Smooth sensitivity and sampling in private data analysis.
In ACM Symposium on Theory of Computing, 2007. [65] Kunal Talwar, Abhradeep Thakurta, and Li Zhang.
Private Empirical Risk Minimization beyond the
[53] Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian worst case: The effect of the constraint set geometry.
Goodfellow, and Kunal Talwar. Semi-supervised knowl- arXiv:1411.5417, 2014.
edge transfer for deep learning from private training
data. In International Conference on Learning Repre- [66] Kunal Talwar, Abhradeep Thakurta, and Li Zhang.
sentations, 2017. Nearly Optimal Private LASSO. In Advances in Neural
Information Processing Systems, 2015.
[54] Mijung Park, Jimmy Foulds, Kamalika Chaudhuri, and
[67] Florian Tramèr, Fan Zhang, Ari Juels, Michael Reiter,
Max Welling. DP-EM: Differentially private expectation
maximization. In Artificial Intelligence and Statistics, and Thomas Ristenpart. Stealing machine learning mod-
2017. els via prediction APIs. In USENIX Security Symposium,
2016.
[55] Manas Pathak, Shantanu Rane, and Bhiksha Raj. Mul-
[68] Binghui Wang and Neil Zhenqiang Gong. Stealing hy-
tiparty Differential Privacy via Aggregation of Locally
perparameters in machine learning. In IEEE Symposium
Trained Classifiers. In Advances in Neural Information
on Security and Privacy, 2018.
Processing Systems, 2010.
[69] Di Wang, Minwei Ye, and Jinhui Xu. Differentially
[56] NhatHai Phan, Yue Wang, Xintao Wu, and Dejing Dou. private Empirical Risk Minimization revisited: Faster
Differential privacy preservation for deep auto-encoders: and more general. In Advances in Neural Information
An application of human behavior prediction. In AAAI Processing Systems, 2017.
Conference on Artificial Intelligence, 2016.
[70] Xi Wu, Matthew Fredrikson, Somesh Jha, and Jeffrey F
[57] NhatHai Phan, Xintao Wu, and Dejing Dou. Preserv- Naughton. A methodology for formalizing model-
ing differential privacy in convolutional deep belief net- inversion attacks. In IEEE Computer Security Foun-
works. Machine Learning, 2017. dations Symposium, 2016.
[58] Boris T Polyak and Anatoli B Juditsky. Acceleration of [71] Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri,
stochastic approximation by averaging. SIAM Journal Somesh Jha, and Jeffrey Naughton. Bolt-on differential
on Control and Optimization, 1992. privacy for scalable stochastic gradient descent-based
[59] Md Atiqur Rahman, Tanzila Rahman, Robert Laganière, analytics. In ACM SIGMOD Conference on Manage-
Noman Mohammed, and Yang Wang. Membership in- ment of Data, 2017.
ference attack against differentially private deep learning [72] Huang Xiao, Battista Biggio, Blaine Nelson, Han Xiao,
model. Transactions on Data Privacy, 2018. Claudia Eckert, and Fabio Roli. Support vector ma-
[60] Ahmed Salem, Yang Zhang, Mathias Humbert, Pascal chines under adversarial label contamination. Neuro-
Berrang, Mario Fritz, and Michael Backes. ML-Leaks: computing, 2015.
Model and data independent membership inference at- [73] Mengjia Yan, Christopher Fletcher, and Josep Torrellas.
tacks and defenses on machine learning models. In Cache telepathy: Leveraging shared resource attacks to
Network and Distributed Systems Security Symposium. learn DNN architectures. arXiv:1808.04761, 2018.
[61] Reza Shokri and Vitaly Shmatikov. Privacy-preserving [74] Chaofei Yang, Qing Wu, Hai Li, and Yiran Chen. Gener-
deep learning. In ACM Conference on Computer and ative poisoning attack method against neural networks.
Communications Security, 2015. arXiv:1703.01340, 2017.
[75] Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Efficient private ERM for smooth objectives. In Interna-
Somesh Jha. Privacy risk in machine learning: Analyz- tional Joint Conference on Artificial Intelligence, 2017.
ing the connection to overfitting. In IEEE Computer
Security Foundations Symposium, 2018. [79] Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and
[76] Lei Yu, Ling Liu, Calton Pu, Mehmet Emre Gursoy, and Marianne Winslett. Functional mechanism: Regression
Stacey Truex. Differentially private model publishing analysis under differential privacy. The VLDB Journal,
for deep learning. In IEEE Symposium on Security and 2012.
Privacy, 2019.
[80] Lingchen Zhao, Yan Zhang, Qian Wang, Yanjiao Chen,
[77] Matthew D Zeiler. ADADELTA: An adaptive learning
Cong Wang, and Qin Zou. Privacy-preserving col-
rate method. arXiv:1212.5701, 2012.
laborative deep learning with irregular participants.
[78] Jiaqi Zhang, Kai Zheng, Wenlong Mou, and Liwei Wang. arXiv:1812.10113, 2018.