Data Anonymization - SAP
PUBLIC
(5) WHAT IS THE DIFFERENCE BETWEEN DYNAMIC DATA MASKING AND DIFFERENTIAL
PRIVACY? I CAN ALSO CALCULATE AVERAGES USING DYNAMIC DATA MASKING WITHOUT
EXPOSING ANY SENSITIVE DATA.
Yes, you can: masking preserves the utility of the data, but it usually does not produce an anonymized data
set. Please refer to the question “Why is anonymization difficult?” for more information.
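To illustrate why exact averages over masked data are not anonymous, here is a minimal Python sketch. It is an illustration only, not the service's implementation: the salary figures, the function names, and the parameter choices (bounds, epsilon) are all assumptions. It shows that two exact averages can be subtracted to recover one person's value, and how a differentially private average adds Laplace noise instead of returning the exact result.

```python
import random

def exact_average(values):
    """Exact average, as a masked view might still allow it."""
    return sum(values) / len(values)

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_average(values, lower, upper, epsilon, rng):
    """Noisy average; assumes the record count is public (hypothetical setup)."""
    # Clamp each value so one individual's influence on the sum is bounded.
    clamped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the average by at most this much.
    sensitivity = (upper - lower) / len(clamped)
    return sum(clamped) / len(clamped) + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical salaries; the last entry belongs to the person being targeted.
salaries = [52_000, 61_000, 58_000, 250_000]

# Differencing attack: one average with the target, one without.
avg_all = exact_average(salaries)
avg_rest = exact_average(salaries[:-1])
recovered = avg_all * len(salaries) - avg_rest * (len(salaries) - 1)
print(recovered)  # the target's exact salary

# The differentially private average returns a perturbed value instead.
rng = random.Random(0)
print(dp_average(salaries, lower=0, upper=300_000, epsilon=1.0, rng=rng))
```

With only four records the noise is large relative to the average; on larger data sets the same guarantee costs far less accuracy.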
(8) WHAT ARE THE PREREQUISITES FOR USING THE DIFFERENT ANONYMIZATION METHODS
(E.G. SIZE/COMPLEXITY OF DATA SET)?
Making a data set k-anonymous is a computationally complex task. Thus, the service might take longer and
require more memory to k-anonymize a data set than to apply differential privacy. Because the service is
stateless, there are limits on the size of the file you can upload and on the processing time.
(9) WHAT HAPPENS IF I SHARE AN ANONYMIZED DATA SET WITH EXTERNAL RESEARCHERS,
BUT THEY ONLY USE HALF OF THE DATA? DOES ANONYMIZATION STILL WORK?
Anonymization always covers the complete data set provided, regardless of how much of it is used. Even if
only single line items are queried and the remaining part is ignored, the privacy guarantees do not
change.
(10) WHICH GUARANTEES CAN YOU GIVE THAT DATA IS TRULY ANONYMIZED? WHAT CAN I
TELL MY DATA PROTECTION OFFICER?
In a nutshell: k-anonymity makes an individual indistinguishable within a group of at least k members.
Applying the differential privacy method makes sure that an individual's contribution changes the
probability of any query result only by a small, bounded amount. You’ll find more detailed explanations of
the guarantees in the documentation.
Additionally, both methods are included in the EU Opinion 05/2014
(http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf).
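The k-anonymity guarantee described above can be checked mechanically. The following is a minimal Python sketch, not the service's API: the column names, the sample rows, and the function names are hypothetical. It groups records by their quasi-identifiers and verifies that every group has at least k members.

```python
from collections import Counter

def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing all quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every individual is indistinguishable from at least k-1 others."""
    return smallest_group_size(rows, quasi_identifiers) >= k

# Hypothetical, already generalized records (age ranges, truncated ZIP codes).
rows = [
    {"age_range": "30-39", "zip_prefix": "691", "diagnosis": "flu"},
    {"age_range": "30-39", "zip_prefix": "691", "diagnosis": "asthma"},
    {"age_range": "40-49", "zip_prefix": "681", "diagnosis": "flu"},
    {"age_range": "40-49", "zip_prefix": "681", "diagnosis": "diabetes"},
]

quasi_identifiers = ["age_range", "zip_prefix"]
print(is_k_anonymous(rows, quasi_identifiers, k=2))
print(is_k_anonymous(rows, quasi_identifiers, k=3))
```

A check like this only confirms the group-size property; which columns count as quasi-identifiers remains a modeling decision.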
(12) WHAT ARE THE ATTACK SCENARIOS THAT ANONYMIZATION PROTECTS AGAINST?
See the documentation.
(13) I’VE HEARD THAT DIFFERENTIAL PRIVACY/K-ANONYMITY IS BROKEN. IS THIS TRUE?
“Broken” is a word usually used in the context of encryption systems. It means that it is possible to decipher
an encrypted message without the appropriate key, or in other words, without being authorized to read
the clear text message. Anonymization, however, works differently: the whole purpose is to provide clear
text data for analytics. There is no hidden original message to decipher, so the term does not apply in the
same sense.
Anonymization methods give certain guarantees. If applied correctly, differential privacy gives strong
statistical guarantees and k-anonymity gives intuitive guarantees of one individual being indistinguishable
from another within a certain group. However, the k-anonymity guarantee says nothing about the
information shared within a group: if every record in a group carries a terminal disease, an attacker who
knows that an individual belongs to that group learns that the individual is terminally ill, even without
knowing the exact disease. This does not mean that k-anonymity is broken, since the intuitive guarantee
still holds. Rather, it means that k-anonymity might not be the method of choice, or that it should be
configured differently.
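The limitation described above can be made visible with a small check. This is a hedged Python sketch with hypothetical column names, sample rows, and function name: for each k-anonymous group it counts how many distinct values a sensitive column takes. A count of 1 means that group membership alone reveals the value.

```python
from collections import defaultdict

def distinct_sensitive_values_per_group(rows, quasi_identifiers, sensitive):
    """Map each quasi-identifier group to its number of distinct sensitive values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return {key: len(values) for key, values in groups.items()}

# Hypothetical 3-anonymous group: the diagnoses differ, the prognosis does not.
rows = [
    {"age_range": "60-69", "diagnosis": "disease A", "prognosis": "terminal"},
    {"age_range": "60-69", "diagnosis": "disease B", "prognosis": "terminal"},
    {"age_range": "60-69", "diagnosis": "disease C", "prognosis": "terminal"},
]

by_diagnosis = distinct_sensitive_values_per_group(rows, ["age_range"], "diagnosis")
by_prognosis = distinct_sensitive_values_per_group(rows, ["age_range"], "prognosis")
print(by_diagnosis)  # three distinct diagnoses: the exact disease stays hidden
print(by_prognosis)  # one distinct prognosis: it is revealed for the whole group
```

If such a check reports a single value for a column that matters, that is a signal to reconfigure the anonymization or to choose a different method.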
To be on the safe side, anonymization is usually not used on its own, but together with standard data
protection mechanisms such as access control/authorization. For instance, even though the data set is
anonymized, only a limited number of persons have access to it.