A Data Masking Scheme For Sensitive Big Data Based On Format-Preserving Encryption
Abstract
The development of big data has brought convenience and benefits, but privacy issues
have become increasingly prominent. To address the leakage of sensitive personal
information, this paper proposes a data masking scheme for private information based on
format-preserving encryption. The scheme can encrypt credit card numbers, dates, e-mail
addresses, and other data with strict format constraints, while ensuring that the ciphertext
still satisfies the original format constraints. In addition, this paper proposes a solution for
large-scale data masking. Experiments on Spark show that the scheme masks sensitive
information while preserving the data format, and that parallel computation achieves high
efficiency.
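
To illustrate what "preserving the format" means for a field such as a credit card number, the following is a minimal Python sketch. It is not the scheme evaluated in this paper and not a standardized algorithm such as NIST FF1. It assumes a balanced Feistel network over decimal digits with HMAC-SHA256 as the round function, and the names `mask_field`, `_feistel`, `_prf`, `ROUNDS`, and the demo key are hypothetical. It only handles digit-based fields, does not keep a card number's check digit valid, does not enforce range constraints such as valid dates, and has not been analyzed for security.

```python
import hashlib
import hmac

DIGITS = "0123456789"
ROUNDS = 10  # assumption: an even round count, so both halves return to their original positions


def _prf(key: bytes, round_no: int, width: int, half: str) -> int:
    """Keyed round function: HMAC-SHA256 over the round number, output width, and the other half."""
    msg = f"{round_no}|{width}|{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _feistel(key: bytes, digits: str, decrypt: bool = False) -> str:
    """Map an n-digit decimal string to another n-digit decimal string, reversibly."""
    n = len(digits)
    if n < 2 or any(ch not in DIGITS for ch in digits):
        raise ValueError("need at least two decimal digits")
    u = n // 2
    v = n - u
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in rounds:
        m = u if i % 2 == 0 else v  # width of the half rewritten in this round
        if decrypt:
            c = (int(b) - _prf(key, i, m, a)) % 10**m
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _prf(key, i, m, b)) % 10**m
            a, b = b, str(c).zfill(m)
    return a + b


def mask_field(key: bytes, value: str, decrypt: bool = False) -> str:
    """Encrypt only the decimal digits of value; separators and letters stay where they are."""
    digits = "".join(ch for ch in value if ch in DIGITS)
    out = iter(_feistel(key, digits, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in value)


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
card = "4111-1111-1111-1111"
masked = mask_field(key, card)
print(masked)                                  # same length, dashes in the same positions
print(mask_field(key, masked, decrypt=True))   # recovers the original value
```

Because `mask_field` is a pure function of the key and the field value, it could in principle be applied to each record independently, which is the property that makes this kind of masking suitable for parallel execution.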
Existing System
To mask large amounts of information, several data privacy protection schemes have been
proposed. A traditional data encryption scheme can encrypt the data, for example by using the
AES algorithm [1] to encrypt the name field. This masks the name while still distinguishing
different individuals, but the result is a binary bit string that has lost the original data format,
so it can neither be saved back to the database nor be recognized as valid information.
Proposed System
System Specification
HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS