
Received April 17, 2019, accepted May 7, 2019, date of publication May 14, 2019, date of current version May 23, 2019.


Digital Object Identifier 10.1109/ACCESS.2019.2916503

A Review of Secure and Privacy-Preserving Medical Data Sharing

HAO JIN1, YAN LUO1, PEILONG LI2, AND JOMOL MATHEW3
1 Department of Electrical and Computer Engineering, University of Massachusetts at Lowell, Lowell, MA 01854, USA
2 Department of Computer Science, Elizabethtown College, Elizabethtown, PA 17022, USA
3 Department of Quantitative Health Sciences and Information Technology, University of Massachusetts Medical School, Worcester, MA 01655, USA

Corresponding author: Yan Luo ([email protected])


This work was supported in part by the National Science Foundation of USA under Grant 1547428, Grant 1738965, Grant 1450996, and
Grant 1541434.

ABSTRACT In the digital healthcare era, it is of the utmost importance to harness medical information
scattered across healthcare institutions to support in-depth data analysis and achieve personalized healthcare.
However, the cyberinfrastructure boundaries of healthcare organizations and privacy leakage threats place
obstacles on the sharing of medical records. Blockchain, as a public ledger characterized by its transparency,
tamper-evidence, trustlessness, and decentralization, can help build a secure medical data exchange network.
This paper surveys the state-of-the-art schemes on secure and privacy-preserving medical data sharing
of the past decade with a focus on blockchain-based approaches. We classify them into permissionless
blockchain-based approaches and permissioned blockchain-based approaches and analyze their advantages
and disadvantages. We also discuss potential research topics on blockchain-based medical data sharing.

INDEX TERMS Access control, blockchain, encryption, medical data, privacy, security.

The associate editor coordinating the review of this manuscript and approving it for publication
was Sedat Akleylek.

2169-3536 © 2019 IEEE. Translations and content mining are permitted for academic research only.
Personal use is also permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

H. Jin et al.: Review of Secure and Privacy-Preserving Medical Data Sharing. VOLUME 7, 2019.

I. INTRODUCTION
Data is an asset with value, and particularly so today when cloud computing, big data, and the
Internet of Things are embracing each other. This unprecedented era of technological confluence
poses great challenges for data security and privacy. As an example, in 2013, Yahoo experienced a
data breach that put the information of over 3 billion users at risk, which is almost half of the
entire human population. And this incident is just one example of countless data breach events [1].

Electronic Medical Record (EMR) data, especially Protected Health Information (PHI), suffers from
an even greater risk. According to a recent investigation [2], there has been an upward trend in
the number of medical records exposed each year. Healthcare data breaches are now happening at a
rate of more than one per day. To strengthen medical data governance, privacy protection
regulations, such as the Health Insurance Portability and Accountability Act (HIPAA) [3] in the
United States or the General Data Protection Regulation (GDPR) [4] in Europe, require data to be
stored and shared in a secure and privacy-preserving way and may inflict severe penalties for
healthcare data breaches.

Consequently, to enhance security safeguards and avoid privacy leakage, most healthcare providers
and hospitals choose to build their healthcare systems in a closed domain with a defensive
perimeter, such as a private network equipped with firewalls and intrusion detection systems. This
has created the medical data silos of today that are scattered throughout various healthcare
institutions, preventing collaborative healthcare treatment and medical research. On the other
hand, the era of cloud computing and big data necessitates that medical data be shared among
various users and institutions to allow analysis, so that better healthcare service and new
treatment plans can be provided.

In summary, the secure and privacy-preserving sharing of clinical information mainly faces the
following obstacles:

A. MASSIVE DATA INCREASING AT A RAPID SPEED
Medical data such as X-ray images, computed tomography, and genetic data are large in size, and
their volumes are increasing at a rate of 20-40 percent every year. In 2015, an average healthcare
provider in the United States needed to manage 665 terabytes of patient information, 80 percent of
which was unstructured medical images. Even worse, it is estimated that big data in healthcare
will reach 25,000 petabytes in 2020 [5].

The challenges include not only how to store such a massive amount of data with existing IT
infrastructure, but also how to ensure its confidentiality and integrity while maintaining high
availability among clinicians, medical researchers, and collaborators.

B. CROSS-INSTITUTIONAL DATA INTEROPERABILITY
Most existing healthcare systems are built on an enclosed domain with a network defense perimeter
to prevent outside attacks and threats. This is a huge hurdle to cross-institutional data sharing
for two reasons:
1) domain cyberinfrastructure and its perimeter impede data access from outside;
2) an independent domain usually has its own data management policy, making it difficult to
guarantee the compatibility of any two domains.

The direct consequence of this network defense perimeter is the lack of data interoperability on
medical information, which further poses a barrier to medical analytics that require a large
amount of clinical information. Moreover, it also creates an inconvenience for patients seeking
better treatment plans when their medical records are scattered across multiple hospitals. Here, a
healthcare domain refers to an enclosed hospital ecosystem that is built on a private network,
where all external access to internal databases and devices is through authenticated connections
such as VPN. It is a widely adopted architecture for today's healthcare data management.

Hence, a more holistic and integrated healthcare infrastructure is needed to facilitate the secure
sharing and interoperation of medical data among various healthcare domains, and to enable
collaborative healthcare service and research.

C. SECURITY AND PRIVACY
Security should provide protection for medical data in transit and at rest, fulfilling the
traditional security goals of data confidentiality, integrity, and availability. Currently, the
Transport Layer Security (TLS) protocol can be used to guarantee the security of data in transit.
For data at rest, cryptographic primitives such as data encryption, digital signatures, and access
control mechanisms can ensure secure access in a single domain. However, how to enforce
cross-domain access control and secure sharing of medical data on a state-wide or even national
scale remains a challenging task.

Privacy is closely related to security but has its own focus: it assures that personal information
is collected, used, and protected legally. For example, the privacy compliance regulations require
all electronic Protected Health Information (ePHI) related activities, across the entirety of data
storage, transfer, and provision, to consistently abide by security and privacy rules.

Generally, the difficulty primarily lies in that the security and privacy of healthcare
information should be protected not only from external attackers, but also from unauthorized
access from within the network or system [6]. Therefore, new methods, architectures, or computing
paradigms may be needed to address security and privacy problems in the medical data sharing area.

In this paper, we survey the state-of-the-art approaches in secure medical data sharing and
management. The remainder of the paper is organized as follows. Section II briefly introduces the
HIPAA and blockchain background that is necessary to understand the schemes surveyed in the
following sections. Section III describes the schemes on healthcare data sharing based on cloud
computing, cryptography, and blockchain technology. Finally, we point out several potential future
directions for blockchain-based approaches.

II. BACKGROUND
A. REGULATORY COMPLIANCE REQUIREMENTS ON SECURITY AND PRIVACY
HIPAA and the HITECH Act [3] extend security and privacy requirements to business associates.
These guidelines stipulate that all necessary measures are in place to keep patient data secure
whenever it is accessed, saved, or shared. Noncompliance with the HIPAA security standards could
lead to significant fines and, in some cases, loss of medical licenses.

Table 1 lists a collection of technical safeguard standards along with implementation
specifications, where we can see that the HIPAA regulation covers almost every aspect of security.
Besides basic requirements such as confidentiality, integrity, and authentication in traditional
information security, new requirements such as access control with identity tracking and emergency
access, and activity auditing are also included. This implies that the secure management of
healthcare data is a hybrid approach, which requires various mechanisms and technical means to be
incorporated to meet these security and privacy targets.

TABLE 1. HIPAA technical safeguard requirements.

B. BLOCKCHAIN AND SMART CONTRACT
Since the emergence of Bitcoin [7] in 2009, blockchain technology has garnered a wide reputation
in decentralized computing. In essence, blockchain can be viewed as a decentralized, immutable,
public ledger where transactions are stored in chained blocks without the existence of a trusted
central authority. Many cryptographic primitives (e.g., Merkle hash tree, chained hash, and digital
signatures) are adopted in blockchain to guarantee its security.


FIGURE 1. Classification of the state-of-the-art schemes.

1) PERMISSIONLESS AND PERMISSIONED BLOCKCHAINS
Generally, blockchains can be categorized into two types: permissionless and permissioned. The
difference mainly lies in the consensus protocol executed behind the peer-to-peer network.

Permissionless or public blockchains allow every user to participate in the network by creating
and verifying transactions and adding blocks to the ledger. Bitcoin, the most famous example of a
permissionless blockchain, applies a Proof of Work (PoW) algorithm to ensure network
consensus [8]: a mathematical puzzle needs to be solved for each newly mined block. Ethereum, the
successor of Bitcoin, uses a combination of Proof of Work and Proof of Stake [9]. Both strategies
require participating nodes to add blocks at a certain cost, either at the expense of computation
or capital.
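
The computational cost of Proof of Work can be illustrated with a minimal Python sketch: a miner searches for a nonce whose hash meets a difficulty condition, while any peer can verify the result with a single hash. This is a simplification under stated assumptions. The `mine` and `verify` helpers, the string encoding of the block, and the "leading zero hex digits" rule are hypothetical; Bitcoin itself applies double SHA-256 to a binary block header and compares the result against a numeric target.

```python
import hashlib


def pow_digest(block_data, nonce):
    """SHA-256 over the block contents concatenated with a candidate nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()


def mine(block_data, difficulty):
    """Return the first nonce whose digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not pow_digest(block_data, nonce).startswith(prefix):
        nonce += 1  # brute-force search: this loop is the "work"
    return nonce


def verify(block_data, nonce, difficulty):
    """Checking a claimed solution takes one hash, regardless of the search cost."""
    return pow_digest(block_data, nonce).startswith("0" * difficulty)


data = "block 1: record pointer update"
nonce = mine(data, difficulty=4)
print(verify(data, nonce, difficulty=4))  # True
```

Each additional zero digit multiplies the expected search effort by 16 while leaving verification at a single hash, which is the asymmetry that makes adding a block costly.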
In contrast, permissioned or consortium blockchains act more like a closed ecosystem: they
maintain an access control layer to allow certain actions to be performed only by certain kinds of
nodes. That means nodes in the network are not equal to each other. In essence, they sacrifice
some degree of decentralization to regain some centralization so that better control can be
enforced to achieve their goals. Hyperledger [10] is an increasingly popular, collaborative
permissioned blockchain that aims at advancing cross-industry blockchain technologies. With Fabric
being its most influential project, Hyperledger adopts the BFT-SMART state machine replication
algorithm [11], [12], a variant of the practical Byzantine fault tolerance (PBFT) [13], [14]
consensus algorithm, as its consensus protocol. Hyperledger provides the opportunity to broaden
the scope of blockchain technology beyond cryptocurrency transactions to other fields including
healthcare data management.

2) SMART CONTRACT
The scripting language embedded in Bitcoin, which is implemented with stack-based operations, is
not Turing-complete. Hence it is difficult to extend Bitcoin to support various applications. It
was not until 2015 that Ethereum [15] first instantiated the "Smart Contract" concept by designing
a rich, Turing-complete programming language. It has since become a trend to build various
decentralized applications upon blockchain and smart contracts.

Smart contracts are small user-defined computer programs that specify rules governing
transactions, which run atop blockchain and are enforced by a network of peers. Smart contracts
automatically execute whenever certain predefined conditions are met. Currently, Solidity [16]
under the Ethereum platform and Chaincode [17] under the Hyperledger platform are the two most
widely used programming languages for smart contracts.

III. SURVEY ON MEDICAL DATA MANAGEMENT
A. CLOUD AND CRYPTOGRAPHY IN HEALTHCARE
Since the emergence of cloud computing, secure data sharing in a distributed setting has long been
a challenging topic. Considering the fact that users and cloud providers usually belong to
different administrative or security domains, the difficulty of cloud-based data sharing lies in
how much trust users can place in cloud service providers. Such a lack of trust stems from the
lack of transparency and the loss of data control [18], [19] by users in cloud environments:
• outsourcing data to the cloud is essentially the handover of physical control from one trust
domain (local storage) to another trust domain (cloud storage).
• users' data are stored across many physical locations and web sites. Users are not aware of
where their data actually are and whether the security mechanisms of these sites meet their
requirements.

Medical information management based on cloud computing faces the same problems. Moreover, due to
the security and privacy regulations of HIPAA, cross-institutional medical data sharing becomes
even more complicated and challenging.

1) PROBLEMS OF CLOUD-BASED HEALTHCARE MANAGEMENT
Existing IT infrastructure deployments within a medical organization are usually based on private
cloud architectures, which bring limitations on scalability and data sharing [20]. Private
cloud [21] refers to a cloud computing model where IT services are provided over private
infrastructure for the dedicated use of a single organization.

Building highly scalable private clouds requires a large investment in computing and storage
devices, and the rapidly changing volume of clinical data makes it difficult to accurately
estimate the cloud capacity required in the future [6]. Private cloud-based approaches are also
inconvenient for collaborators who reside outside of the domain perimeter and need to access data
stored in the domain. These limitations prevent the further sharing of medical information
demanded by big data analytics.

On the other hand, public clouds support scalability and data sharing well. However, the
multi-tenancy characteristic of public cloud services means that virtual machines are shared among
various applications, which exposes the data to different
types of attacks. Worse still, it is difficult to detect or monitor such attacks in a shared VM
environment.

Whether private or public clouds are adopted for healthcare management to guarantee data security
and privacy, a basic requirement is data encryption. Unfortunately, a dilemma comes from the key
management problem. Letting cloud users manage encryption keys certainly will enhance data
security and provide better control; however, it would be a troublesome burden for users to
distribute corresponding keys to authorized users, which limits the scalability of sharing data
among a large number of institutions. This is the primary limitation of traditional key
distribution center (KDC)-based solutions. Allowing cloud providers to control the keys could
potentially increase the risks of data leakage because cloud administrators have the chance to
"touch" the keys and further to decrypt data. This is the dilemma faced by HIPAA-compliant
clouds [22] such as Amazon, Google, and Microsoft that provide externally-hosted clouds for
medical information management.

2) CRYPTOGRAPHY FOR MEDICAL DATA SHARING
To address the aforementioned problems, one possible solution is to enable owner-dominated
security mechanisms for medical data outsourced to clouds. Such mechanisms [23]–[26] are designed
to protect the security of remotely stored data in cloud computing, and they demonstrate that
providing owners with data access control is more important than letting the cloud take full
control over their data. Since users no longer physically possess their data, they want to at
least be able to decide who can visit their data. This can return some control back to data
owners, therefore promoting users' confidence in data security.

Another trend revealed by these cloud-based data sharing schemes is that traditional security
means adopted in a single administrative domain are insufficient for medical data sharing across
multiple healthcare domains. Hence, more advanced cryptographic primitives with rich access
control semantics and strict confidentiality enforcement are required. Currently, there are some
research projects that focus on adopting advanced cryptography to secure medical data sharing
based on cloud storage platforms.

Li et al. [27] used attribute-based encryption (ABE) for secure sharing of personal health records
stored in semi-trusted cloud servers. Their design divides security domains into public domains
(physicians and medical researchers) and personal domains (family members and friends), where two
types of ABE schemes (a multi-authority ABE scheme and a revocable key-policy ABE scheme) are
adopted to address data sharing in public and personal domains, respectively. Although patients
retain full control of their medical information, the scheme places too much burden on patients,
since the patient-side applications are required to generate and distribute corresponding keys to
authorized users.

Jianghua et al. [28] proposed using ciphertext-policy attribute-based signcryption to provide
fine-grained access control and secure sharing of Personal Health Records (PHRs).

Fabian et al. [29] put forward a collaborative architecture for inter-organizational sharing of
medical data in semi-trusted cloud computing environments, which adopts attribute-based encryption
for selective authorization of data access and a secret sharing technique to securely distribute
data across multiple clouds.

Guo et al. [30] combined blockchain technology with a multi-authority attribute-based signature
scheme to secure the storage and access of electronic health records. An attribute-based signature
reveals only that the verified message was signed by a signer whose attributes satisfy certain
predicates, which prevents identity leakage when a user signs a message. However, their scheme
encapsulates and stores health records in on-chain blocks, which limits its scalability since the
size of on-chain stored data has a great impact on the network throughput.

Narayan et al. [31] presented a patient-centric EHR system to let patients selectively share
portions of their health data stored in the cloud. They adopted a broadcast attribute-based
encryption (bABE) scheme to enforce access control to medical files. Meanwhile, they provided
public-key encryption with keyword search (PKES) on encrypted data. However, their design lacked
algorithmic details about the adopted bABE and PKES schemes.

Barua et al. [32] proposed an efficient and secure patient-centric access control (ESPAC) scheme
on the basis of ciphertext-policy attribute-based encryption. Identity-based encryption was
adopted to secure end-to-end communications where identity privacy, message integrity, and
non-repudiation are ensured.

Chen and Hoang [33] gave a cloud-based privacy-aware role-based access control (CPRBAC) model for
data controllability and traceability, and authorized access to healthcare cloud resources. They
also designed an active auditing scheme to monitor and report illegal operations. However, their
work does not contain any cryptographic primitive to ensure data confidentiality and integrity.

3) DISCUSSION
It should be noted that data interoperability remains a significant issue in cloud environments
due to the incompatibility of various cloud services. Let us consider medical data sharing on a
statewide or national scale. It involves many cloud providers, and each provider has its own data
security and privacy safeguards. To what extent will the mechanisms of various providers be
compatible with each other? Unfortunately, the answer is unclear. We will discuss it further in
the following sections.

B. ANONYMIZATION-BASED PRIVACY PRESERVATION IN HEALTHCARE
1) DATA ANONYMIZATION MODELS
Privacy-preserving data publishing has gained much attention recently, especially when data mining
and analytics is
becoming a mainstream technological trend in the big data era. Researchers have designed various
data anonymization algorithms such as generalization, suppression, and diversity slicing to
protect individuals' privacy in transactional data publishing.

Generally, there are three different privacy-preserving models (i.e., k-anonymity, l-diversity,
and t-closeness, in the order of increasing complexity). Among them, k-anonymity was proposed by
Sweeney and Samarati in 2001. The philosophy behind it is to allow each combination of quasi
identifiers (non-identifiable attributes that could jointly identify an individual, e.g., birth
date and zip code) to be indistinctly matched to at least k individuals, which means a specific
person's information cannot be distinguished from other k − 1 persons' information in a dataset.

A stronger privacy protection model is l-diversity, which requires that each sensitive attribute
include at least l well-represented values in the published dataset in addition to keeping the
k-anonymity property. t-closeness is a further refinement of the l-diversity model that preserves
privacy by reducing the granularity of data representation, which treats values of an attribute
distinctly by taking into account the distribution of values of the attribute. It is a trade-off
that leads to some loss of effectiveness of data mining in order to gain some privacy.

Based on these three models, various algorithms [34]–[36] focusing on improving these
anonymization models have been proposed. Comprehensive surveys of this area can be found
in [37], [38].

Differential privacy [39] is another technique to provide data anonymization by adding noise to a
dataset so that an attacker cannot determine whether a particular data portion is included.
Soria-Comas et al. [40], [41] proposed using microaggregation-based k-anonymity to reduce the
noise to be added to generate differentially private datasets.

2) DATA ANONYMIZATION IN HEALTHCARE
HIDE [42] is an integrated health data de-identification system for both structured and
unstructured data. Basically, it deploys a conditional random fields-based technique to extract
identifiable attributes from unstructured data and a k-anonymity-based technique to de-identify
data while maintaining maximum data utility.

El Emam et al. [43] proposed an optimal lattice anonymization (OLA) algorithm based on
k-anonymity. It produces a globally optimal de-identification solution suitable for health
datasets. Their evaluation on six datasets shows that OLA results in less information loss and
faster performance compared to existing de-identification algorithms.

Belsis and Pantziou [44] presented a clustering-based anonymity scheme for sensor data collection
and aggregation in wireless medical monitoring environments. Their design is based on k-anonymity
since it protects user privacy by making an entity indistinguishable from other k − 1 similar
entities.

Loukides et al. [45] proposed an approach to allow data owners to share personal health data
without disclosing identities, incurring excessive information loss, or harming data usefulness.
The approach transforms data via disassociation, which is an operation that splits health records
into carefully constructed subrecords to hide combinations of diagnosis codes.

3) DISCUSSION
Data anonymization is an ongoing research area. However, it needs to strike a balance between
anonymity and data utility. Currently, none of k-anonymity, l-diversity, and t-closeness can
completely ensure that no privacy leakage occurs while maintaining a reasonable level of data
utility [37], [38]. Specifically, k-anonymity and l-diversity do not protect anonymity from every
attack (e.g., homogeneity attack, background attack, similarity attack, and skewness
attack) [46]. In contrast, t-closeness offers complete privacy but severely impairs the
correlations between key attributes and confidential attributes. Hence, it would be better to
integrate data anonymization with other techniques to achieve a good trade-off between privacy
preservation and data utility.

C. BLOCKCHAIN IN HEALTHCARE
Recently, with the adoption of blockchain technology becoming a widespread trend in distributed
computing, many researchers now consider using blockchain to secure medical data sharing and
management. Table 2 surveys the state-of-the-art medical data sharing schemes based on blockchain
technology. (The table only includes schemes with a complete framework or system addressing secure
medical data sharing; schemes focusing on a single security functionality are not included in the
table and are discussed in the paper instead.) We compare the schemes on security metrics
(identification, access control, data authenticity, data encryption), architecture metrics
(blockchain type, data storage method), and functionality metrics (smart contract,
interoperability). Moreover, we classify these schemes into two types: permissioned
blockchain-based approaches and permissionless blockchain-based approaches.

1) APPROACHES BASED ON PERMISSIONLESS BLOCKCHAIN
Zyskind et al. [58] proposed using blockchain to provide secure and privacy-preserving data
sharing among mobile users and service providers. Their design proposes two types of transactions:
transaction T_data is used for data storage and retrieval, and transaction T_access is used for
access control.

MedRec [47] is a decentralized EMR management system based on blockchain technology that provides
a functional prototype implementation. MedRec has designed three kinds of Ethereum smart contracts
to associate patients' medical information stored in various healthcare providers and to allow
third-party users to access the data after successful authentication. Specifically, a registrar
contract maps node identity strings to their Ethereum addresses. A patient-provider relationship
(PPR) contract defines the stewardship and
ownership of a patient's clinical data, where access permissions and query strings indicating data
positions are also included. A summary contract holds a list of PPR references to denote its
engagements with other patient nodes or hospital nodes. In implementation, four software
components (namely, a backend library, an Ethereum client, a database gatekeeper, and an EMR
manager) are deployed on a system node to implement the business logic of medical data sharing and
management.

TABLE 2. Metrics of surveyed schemes.

Based on the work of MedRec, Yang and Yang [48] proposed using signcryption and attribute-based
authentication to enable the secure sharing of healthcare data. EHRs are encrypted with a
symmetric key, which is further encrypted with an attribute key set. The concatenation of both
ciphertexts (encrypted EHRs and encrypted key) is signed with a private key. To access the data, a
user verifies the signature and performs key decryption and EHR decryption to get the plaintext
EHRs.

Yue et al. [49] presented a healthcare data gateway, which is a blockchain-based architecture
equipped with a purpose-centric access control policy to let patients own, control, and share
their medical information without violating privacy. But their scheme lacks the details of how a
service is prevented from knowing the data content when a computation runs on the raw medical
data.

Zhang et al. [54] proposed a pervasive social network (PSN)-based healthcare environment, which
consists of a wireless body area network and a PSN area. Their design incorporates an
authenticated association protocol to initiate a secure link between medical sensors. Afterwards,
a coordinator node in the PSN area can broadcast a transaction and add it to new blocks. However,
the authors did not provide details about their consensus protocol and smart contracts.

Zhao et al. [59] proposed using fuzzy vault technology to design a lightweight backup and recovery
scheme to manage keys, which are used to encrypt health signals collected from body sensor
networks (BSN) and stored on a health blockchain. But their work lacks details of how their health
blockchain works.

Modelchain [60] was designed to adapt blockchain for privacy-preserving machine learning to
accelerate medical research and facilitate quality improvements. In the design, a
proof-of-information algorithm on top of the PoW consensus protocol determines the order of online
machine learning to increase efficiency and accuracy.

These schemes adopt a permissionless or public blockchain to secure medical data sharing and
various applications (e.g., healthcare sensors, machine learning). However, a public blockchain is
usually cryptocurrency driven (bitcoins in Bitcoin or ether in Ethereum), which means a certain
amount of cryptocurrency has to be paid for transaction inclusion and block mining. (A user has to
use real money to buy cryptocurrency, e.g., $5892 for one BTC in Bitcoin on May 8, 2019, so here
we regard cryptocurrency expenses as real monetary expenses.) According to the Ethereum yellow
paper [15], storing a kilobyte costs 640 thousand gas, which amounts to $2.3 even at a relatively low gas price of 20 Gwei (1 Ether = 10^9 Gwei) and with Ether recently valued at $168 (May 8, 2019).

Storing data on a public blockchain can be very expensive. It is financially impractical to store detailed clinical information of millions of patients on chain. Instead, only a very small subset of critical metadata can be stored on the blockchain. Data-related operations in a public blockchain, such as access requests, access policy validation, and message transfer, can all be costly because the transactions that describe them must be generated and included in blocks.

2) APPROACHES BASED ON PERMISSIONED BLOCKCHAIN
Peterson et al. [53] proposed a blockchain-based approach for cross-institutional health information sharing. They designed new transaction and block structures to enable secure access to Fast Healthcare Interoperability Resources (FHIR) stored off-chain. Moreover, they designed a new consensus algorithm that avoids the expensive computational resources consumed by the PoW consensus in Bitcoin. In their design, a block undergoes a transaction distribution phase, a block verification request phase, a signed block return phase, and a new blockchain distribution phase before being added to the blockchain. A proof-of-interoperability concept was proposed in their consensus mechanism to ensure that transaction data conform to FHIR structural and semantic constraints. They also designed a random miner election algorithm in which each node in the network has an equal probability of becoming a miner in the future. However, the paper does not mention how the medical data are organized, stored, and accessed. The privacy-preserving keyword searches adopted in their framework lack algorithmic details.

Xia et al. proposed BBDS [50], a high-level blockchain-based framework that permits data users and owners to access medical records from a shared repository after successful verification of their identities and keys. The identity-based authentication and key agreement protocol in [61] is used to achieve user membership authentication. However, their secure sharing of sensitive medical information is limited to invited and verified users. The authors also proposed MedShare [51], a similar blockchain-based framework for medical data sharing that provides data provenance, auditing, and control in cloud repositories among healthcare providers.

Fan et al. proposed MedBlock [52], a hybrid blockchain-based architecture to secure electronic medical records (EMR), where nodes are divided into endorsers, orderers, and committers. Its consensus protocol is a variant of the PBFT [14] consensus protocol. However, the authors did not explicitly explain the access control policy that allows third-party researchers to access medical data. Moreover, their proposal of using asymmetric encryption algorithms to encrypt medical information does not seem to be a good option considering the encryption/decryption performance of asymmetric encryption. Wang et al. [62] presented a parallel healthcare system (PHS) where descriptive intelligence, predictive intelligence, and prescriptive intelligence in healthcare are achieved on the basis of artificial systems, computational experiments, and parallel executions. In their framework, a consortium blockchain containing patients, hospitals, health bureaus and communities, and medical researchers is deployed. Smart contracts are implemented to enable medical records sharing, review, and auditing.

Liang et al. [55] proposed a user-centric framework on a permissioned blockchain for personal health data sharing, where the Hyperledger Fabric membership service and channel formation scheme are used to ensure privacy protection and identity management. They implemented a mobile app to collect health data from wearable devices and synchronize data to the cloud for storage and sharing with healthcare providers. Zhang and Lin [57] designed a hybrid blockchain-based secure and privacy-preserving (BSPP) PHI sharing scheme, where a private blockchain is used to store PHI for each hospital and a consortium blockchain is used to keep secure indices of the PHI. In their design, a public-key encryption-based keyword search scheme [63] is adopted to secure the search of PHI and to ensure identity privacy.

Patientory [56] is a healthcare peer-to-peer EMR storage network that leverages blockchain and smart contracts to provide HIPAA compliant health information exchange. The authors also proposed a software framework to address authentication, authorization, access control, and data encryption in system implementation, as well as interoperability enhancement and token management.

The ChainAnchor [64] system provides anonymous identity verification for entities performing transactions in a permissioned blockchain. The system employs the Enhanced Privacy ID (EPID) zero-knowledge proof scheme to prove participants' anonymity and membership.

The aforementioned schemes choose a consortium or permissioned blockchain to secure the storage of medical information. This is different from approaches based on public blockchains, such as Bitcoin and Ethereum, which are totally decentralized. Instead, a consortium blockchain requires certain permissions to access the blockchain. This means that participants are selected in advance and only authorized nodes are allowed to access information stored on the blockchain. Such a setting is similar to the medical data sharing scenario, where only healthcare stakeholders (patients, healthcare providers, and authorized medical researchers) are allowed to access that information based on their authorized permissions.

However, in spite of its high throughput, permissioned blockchain is far from a perfect solution for secure medical data sharing. The most notable disadvantage is the necessity of a central authority, which usually comprises a group of companies with a shared interest that run the blockchain network and oversee the whole system. Therefore, the data immutability of a public blockchain is discounted in a consortium blockchain, which opens up the possibility of blockchain rollback by an attacker or a certain authority member.
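
As a back-of-the-envelope check on the public-blockchain storage cost discussed earlier in this subsection, the arithmetic can be sketched as follows. The sketch assumes the yellow paper's fee of 20,000 gas for writing a new 32-byte storage word, which yields the 640 thousand gas per kilobyte cited from [15]; the function names are illustrative only. With a gas price of 20 Gwei and Ether at $168, the product comes to roughly $2.15 per kilobyte, the same order of magnitude as the figure given in the text.

```python
# Back-of-the-envelope cost of writing data into Ethereum contract storage.
# Assumptions (not measurements): 20,000 gas per new 32-byte storage word,
# a gas price of 20 Gwei, and an Ether price of $168 (May 8, 2019).

GAS_PER_WORD = 20_000        # gas to set one previously empty 32-byte slot
WORD_BYTES = 32
GWEI_PER_ETHER = 10**9


def storage_gas(num_bytes: int) -> int:
    """Gas needed to write num_bytes into fresh storage slots."""
    words = -(-num_bytes // WORD_BYTES)  # ceiling division
    return words * GAS_PER_WORD


def storage_cost_usd(num_bytes: int, gas_price_gwei: float, usd_per_ether: float) -> float:
    """Dollar cost of the storage write at the given gas and Ether prices."""
    ether = storage_gas(num_bytes) * gas_price_gwei / GWEI_PER_ETHER
    return ether * usd_per_ether


if __name__ == "__main__":
    print(storage_gas(1024))                           # 640000
    print(round(storage_cost_usd(1024, 20, 168), 2))   # 2.15
```

Under the same assumptions, a 1 MiB image (1,048,576 bytes) would need about 655 million gas, or roughly $2,200, which illustrates why only compact metadata is a realistic candidate for on-chain storage.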

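Several of the schemes surveyed in this subsection keep the records themselves off-chain and place only compact metadata on the ledger. A minimal sketch of that pattern follows, assuming a SHA-256 digest as the on-chain metadata. The `off_chain_store` and `ledger` structures and both function names are hypothetical stand-ins for a real database and a blockchain client; they are not taken from any cited system.

```python
import hashlib

# Hypothetical stand-ins: a dict plays the role of an off-chain database and
# a list plays the role of an append-only ledger. No real blockchain API is used.
off_chain_store = {}
ledger = []


def store_record(record_id: str, payload: bytes) -> str:
    """Keep the payload off-chain and append only its digest to the ledger."""
    digest = hashlib.sha256(payload).hexdigest()
    off_chain_store[record_id] = payload
    ledger.append({"record_id": record_id, "sha256": digest})
    return digest


def verify_record(record_id: str) -> bool:
    """Recompute the digest of the off-chain copy and compare it with the latest ledger entry."""
    payload = off_chain_store[record_id]
    expected = next(
        entry["sha256"] for entry in reversed(ledger) if entry["record_id"] == record_id
    )
    return hashlib.sha256(payload).hexdigest() == expected


if __name__ == "__main__":
    store_record("patient-001/xray-17", b"...image bytes...")
    print(verify_record("patient-001/xray-17"))      # True
    off_chain_store["patient-001/xray-17"] += b"x"   # simulate tampering off-chain
    print(verify_record("patient-001/xray-17"))      # False
```

The ledger entry has a fixed size regardless of how large the record is, and any change to the off-chain copy is detected on verification. The digest alone provides integrity only; confidentiality of the off-chain data still has to come from encryption and access control.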

D. SOFTWARE-DEFINED INFRASTRUCTURES FOR HEALTHCARE
While cloud platforms provide flexible and cost-effective computing resources on demand, Software-Defined Infrastructures (SDIs) provisioned at the network edge support applications with significant performance requirements, especially in terms of throughput and latency. SDI technologies are fundamental to many home-based medical applications [65], [66] because they make networks programmable via software-defined networking (SDN) and make resource management in the cloud feasible via OpenStack.

Software-defined networking (SDN), with its capability of decoupling data and control planes, can provide centralized network provisioning and management, accelerate service delivery, and provide more agility. Thus, it has gained wide attention in network-based data management systems. A typical example is home-based medical applications, where abundant programmable resources, such as desktops, embedded controllers, and smart routers, are installed at a given patient's premises and allow apps to interface with various home sensors that capture the patient's real-time activities. All of the heterogeneous resources at every part of the network, including the end point, the edge, and the core, enable the deployment of high-performance medical services.

Li et al. proposed CareNet [65], [66], a regulation compliant framework for home-based healthcare, where software-defined infrastructures are adopted at the network edge to filter and secure health information from home nodes, and further to enable a hybrid home-edge-core cloud architecture with high performance and real-time responsiveness for home-based healthcare services. Hu et al. [67] proposed a smart health monitoring method based on software-defined networking, where a centralized smart controller is designed to manage all physical devices and provide interfaces for data collection, transmission, and processing.

IV. A BLOCKCHAIN FUTURE FOR MEDICAL DATA SHARING
A. BLOCKCHAIN FOR MEDICAL DATA SHARING
In the previous section, we surveyed the state-of-the-art approaches to secure medical data sharing with a focus on blockchain technology adoption. Regardless of whether the adopted blockchain is permissioned or permissionless, these schemes [30], [47]–[53], [55], [57] shed light on the application of blockchain in medical data sharing and management. However, blockchain itself is not a panacea for all security and privacy problems in medical data sharing. In truth, we should be more aware of the limitations of blockchain technology than of its advantages so that we can compensate for its disadvantages by integrating it with other techniques (e.g., cryptographic primitives) to address the security problems of medical information management.

Secure sharing of medical data involves patients, healthcare providers, and third-party medical researchers. Due to the privacy and security regulations of HIPAA, medical data management should provide secure storage of raw medical information (confidentiality, integrity), privacy-preserving data provision (data authenticity, user authentication, access control), auditability, traceability, and data interoperability. Besides, when blockchain is adopted for healthcare data sharing, the following key features may need further investigation.

1) ON-CHAIN OR OFF-CHAIN STORAGE OF MEDICAL DATA?
Blockchain was originally designed to record small trading transactions, so its data capacity is usually limited. For instance, the block size in Bitcoin is limited to one megabyte, which is insufficient to store medical data such as X-ray images. Furthermore, other aspects of the data life cycle need to be seriously considered.
• On-chain stored data cannot be altered or deleted because blockchain is a continually growing public ledger. However, some regulations such as GDPR in Europe have strengthened patients' rights to erase their personal health information, since a patient owns his or her medical records.
• Most data has a life cycle, which makes it unnecessary to store these data permanently. This is also enforced by many data privacy protection laws [6].

Blockchain itself is a secured and transparent public ledger that can guarantee the integrity of on-chain stored data (transactions and blocks). That means blockchain can be leveraged to secure the storage of medical information if we choose on-chain data storage. However, this naive approach will lead to poor throughput and performance, since on-chain transactions and blocks need to be downloaded locally by every peer node, which wastes a great deal of bandwidth. This explains why most of the state-of-the-art approaches [47], [48], [50]–[52], [55] to medical data sharing chose to store medical information off-chain, while data query strings and hash values are stored on-chain for authenticity and integrity verification. In such an architecture, medical data can be secured, modified, and deleted as necessary.

2) DATA ENCRYPTION OR NOT?
From the above analysis, it can be seen that on-chain storage of medical information is not a good choice due to the limited block size in current blockchains and the bandwidth wasted in achieving network consensus. Off-chain storage of medical information seems to be a feasible alternative. However, in this case, we should be aware of one fact: blockchain can only guarantee the security of on-chain stored data. Hence, for off-chain stored data, we still need to design data storage and access mechanisms with appropriate cryptographic primitives to fulfill the security and privacy goals.

Before going ahead, a basic question should be answered: should off-chain stored medical data be encrypted? According to a 2014 study [68], over 50% of security breaches occur in the medical industry, with up to 90% of healthcare organizations having exposed their data or had it stolen. It is obvious that storing plain-text medical records in a medical
database undoubtedly will increase leakage risks, primarily for the following reasons:
1) once a healthcare system is compromised, all medical information could be leaked;
2) despite the strict access control policy deployed in a healthcare system, an internal IT staff member can still easily "touch" the data, which makes data confidentiality difficult to guarantee.

In this context, we believe that encryption of medical data and secured key storage are two necessary steps to enhance the security and privacy of medical information. Data encryption can be the last line of defense when a healthcare system is compromised, because an attacker can learn nothing about the encrypted data without the corresponding encryption key.

3) PERMISSIONED OR PERMISSIONLESS BLOCKCHAIN?
As we have introduced in Section II, permissioned and permissionless blockchains primarily differ in their adopted consensus protocols, which in turn have a great impact on throughput, block mining time, access policies, and privacy. Table 3 shows the main differences between the two types.

TABLE 3. Characteristics of permissioned and public blockchain.

Currently, the performance metric of greatest concern is throughput. For example, Hyperledger Fabric can process up to 10000 transactions per second (TPS), which is much faster than Ethereum's 20 TPS and Bitcoin's 7 TPS. The last two are insufficient to handle the data access events that happen in a real-world healthcare management system. Fortunately, with the evolution of new consensus protocols and technologies, blockchain throughput undoubtedly will increase. For example, the Casper version of Ethereum (Ethereum 2.0), which adopts the Proof of Stake (PoS) consensus and a sharding technique, can reportedly attain a throughput of 8 million TPS [69].

Another challenge of adopting a permissionless blockchain for medical data sharing is cryptocurrency, which is the incentive that makes the underlying consensus protocol work. In medical data management, data access happens very frequently. That means a great amount of money (cryptocurrency) is needed to run the network for healthcare data management. A possible option is to issue an altcoin in the system to pay contributors (miners). When a contributor has accumulated a certain amount of altcoin, its level of trustworthiness is promoted and, as a result, the contributor can get better service in the system.

B. CRYPTOGRAPHY FOR MEDICAL DATA SHARING
Considering that current blockchains cannot accommodate medical information due to their limited block size, storing medical data off-chain seems to be the only feasible solution. Securing the storage of these off-chain data becomes a challenge. This section briefly introduces some mainstream cryptographic primitives used for access control and for key and privilege management.

1) BROADCAST ENCRYPTION
Broadcast encryption was first introduced in [70] and improved in [71], [72]. It lets an owner encrypt a small piece of data to a subset of users. Only users in the subset can decrypt the broadcast message to recover the data. In cryptographic cloud storage [24]–[26], instead of directly encrypting data content, keys are encrypted by broadcast encryption schemes to enforce access control: authorized users can recover the key by decrypting the broadcast message, whereas unauthorized or revoked users cannot find sufficient information to decrypt the message.

2) IDENTITY-BASED ENCRYPTION
The concept of identity-based encryption (IBE) was first proposed by Shamir [73] in 1984, who suggested that a public key can be an arbitrary string, and then improved by Boneh and Franklin [74] using the Weil pairing over elliptic curves. In IBE, a trusted third party called the Private Key Generator (PKG) generates a master public-private key pair. In practice, given the master public key, any party can compute the public key corresponding to an identity by combining the master public key with the identity string. To obtain the corresponding private key, the authorized party with identity ID needs to contact the PKG, which uses the master private key to generate the private key for identity ID.

IBE eliminates the need for a public key distribution infrastructure. It allows any pair of users to communicate securely without exchanging private or public keys, which is ideal for data sharing among a closed group (e.g., within an organization).

3) ATTRIBUTE-BASED ENCRYPTION
In many applications, there is a need to share data according to a specific policy without prior knowledge of who will be the data receiver. Suppose a patient wants to share his medical records only with a user who has the attribute "PHYSICIAN" issued by a medical organization and the attribute "RESEARCHER" issued by a clinical research institute. With attribute-based encryption [75], the patient can define an access policy ("PHYSICIAN" AND "RESEARCHER") and encrypt his medical records with this policy, so that only users with attributes matching this policy can decrypt the records.

Attribute-based encryption is a promising cryptographic technique for access control of encrypted data. Generally, it can be divided into two categories: (a) key-policy
attribute-based encryption (KP-ABE) [76], where keys are associated with access policies and ciphertext is associated with an attribute set; and (b) ciphertext-policy attribute-based encryption (CP-ABE) [76], where keys are associated with an attribute set and ciphertext is associated with access policies. In both schemes, a central authority is required to issue and validate private keys, rendering them unsuitable for a distributed environment where data sharing takes place across different administrative domains.

To address the single authority problem in existing ABE, Multi-Authority Attribute-Based Encryption (MA-ABE) [77]–[79] schemes have been proposed, where no central authority is needed and collusion resistance is guaranteed.

4) PROXY RE-ENCRYPTION
Proxy re-encryption (PRE), proposed by Blaze et al. [80] in 1998 and improved by Ateniese et al. [81], [82] in 2006, is a cryptosystem that allows a third party (proxy) to alter a ciphertext encrypted by one party so that it can be decrypted by another authorized party. The basic idea is that two parties publish a proxy key that allows a semi-trusted intermediate proxy to convert ciphertext, which avoids data decryption and re-encryption at the sender side. Thus, it is suitable for data sharing across multiple domains, where data owners can leave the task of data re-encryption to a proxy (e.g., the cloud) after user revocations.

5) SEARCH ON ENCRYPTED DATA
Searchable symmetric encryption (SSE) [83] enables keyword search on outsourced encrypted data, which avoids the decryption process and thereby enhances query efficiency without the risk of data leakage. Otherwise, data owners either have to send service providers the keys for data decryption before executing a query, or download encrypted data locally and decrypt it to perform a query. Both approaches are unacceptable for security or efficiency reasons. The idea behind SSE is to deploy a masked index table as metadata [84], [85] that facilitates searches on encrypted data. The data owner needs to create an index table based on pre-processed message-keyword pairs. To perform a search, the user provides a search token, with which the server searches through the index. If a match is found, the matching encrypted data is returned to the user.

6) DISCUSSION
As we have pointed out, relying on blockchain technology to secure off-chain stored medical data is infeasible. Hence, a secure healthcare system still needs to employ appropriate cryptographic primitives to achieve confidentiality, integrity, access control, and privacy protection. Specifically, for encrypted data, advanced cryptographic primitives (e.g., IBE, ABE, PRE) are becoming widely deployed to enforce strict and flexible access control of encryption keys. Hence, in the near future, it can be expected that cryptography will play a more important role in blockchain-based data sharing.

C. FUTURE RESEARCH WORK
According to the analysis in Section IV-A, it is clear that medical data should be stored off-chain in encrypted form for network throughput and security reasons. Yet the third question, whether to adopt a permissioned or permissionless blockchain, remains open with no apparent solution. Despite the debate over permissioned and permissionless blockchains throughout academia and industry, there is no strong evidence showing that one type can completely substitute for the other. One possible method is to leverage the advantages of both types by constructing a hybrid blockchain architecture as in [57]. However, this may cause great complexity in the management of consensus executions, including block mining, data access control, and data provision.

Therefore, future research on designing blockchain-based approaches for secure medical data sharing can focus on the following areas.
1) Cryptography-Based Access and Privacy Control: To ensure the security and privacy required by HIPAA regulations, cryptography needs to be embedded in the design to enforce strict access control and privacy preservation. The state-of-the-art schemes [30], [47]–[52], [55], [57] in the medical data area rely more or less on the adoption of certain cryptographic primitives to implement authentication, access control, key management, and privacy protection for medical information.
2) Smart Contract-Driven Business Logic: Smart contracts, as a series of self-executing contractual states without third parties, are the core element for implementing the business logic of blockchain-based medical data sharing. By designing smart contracts specific to certain requirements, the creation of medical records, authorization and revocation of access permissions, and auditing and provenance of access behavior can be implemented on the blockchain.

Figure 2 depicts a general architecture for blockchain-based healthcare data management, which includes three layers (i.e., health domain layer, blockchain layer, and user layer). A healthcare system residing in an enclosed network domain is regarded as a health domain, which usually has one or more databases to store patients' medical records and clinical trials. The blockchain layer is used to connect scattered health domains, where smart contracts are responsible for the implementation and execution of the business logic of cross-institutional data sharing. The user layer consists of patients, doctors, and medical researchers from different healthcare organizations.

Blockchain-based medical information sharing is an evolving field that requires a large number of techniques to work together in order to achieve HIPAA compliant data sharing. In the future, new architectures and security and privacy-related cryptographic primitives may appear and be seamlessly integrated with blockchain. Here, we briefly
discuss some challenges in blockchain-based medical data sharing that need further investigation and exploration.

FIGURE 2. Architecture of blockchain-based medical data sharing.

1) QUERY ON SCATTERED MEDICAL DATABASES
Traditionally, EMR data are organized in SQL-based relational databases for storage, and queries are performed within the independent administrative domain where the database resides. However, in a blockchain setting where various healthcare institutes are connected through the blockchain, it is inconvenient to make such a SQL query due to data stewardship and network boundaries.

Most existing schemes choose to store on chain encrypted metadata (e.g., query strings in [47], [48] and secure indices in [53], [57]) that indicate data locations. When a client wants to perform a global query on all connected databases, a further challenge is how to efficiently perform the query on all independently managed databases simultaneously and obtain an aggregated query result. This problem remains unaddressed in existing schemes. A possible solution is to let some partially centralized servers distributed in the network collect and aggregate query results computed in parallel and return the aggregated result to the querier. However, strong security and recovery mechanisms may need to be carefully deployed on these servers to protect them from denial-of-service attacks.

2) FINER-GRAINED ACCESS AND PRIVACY CONTROL
Currently, some methods [30], [48] have adopted advanced cryptographic primitives (e.g., attribute-based encryption) to enforce strict and flexible access control for medical data access. Specifically, they focus on the construction of access policies with rich semantics, which, of course, is necessary in access policy customization. However, differentiating the various EMR fields by sensitivity is also of critical importance for privacy control. A naive approach is to segment a record into multiple parts according to sensitivity and encrypt each part with a different key; however, this complicates the task of key management when the separation is fine-grained. To address this problem, some key derivation mechanisms [24], [26] can be integrated with access control policies to facilitate key management.

3) COMPATIBILITY OF SECURITY MECHANISMS AMONG HEALTHCARE DOMAINS
Since each healthcare institute can be regarded as an independent domain equipped with its own security and privacy mechanism, it is difficult to predict the extent to which these mechanisms will be compatible with each other. Furthermore, one should also consider how to address the compatibility problem caused by different or even contradictory data privacy laws of various states or nations.

4) SOFTWARE-DEFINED NETWORKING IS NEEDED TO FACILITATE DOMAIN MANAGEMENT
The SDN controller provides a central point of control to distribute policy information. However, centralized control by one entity has the disadvantage of creating a central point
of attack. Moreover, the programmability associated with the [11] J. Sousa, E. Alchieri, and A. Bessani, ‘‘State machine replication for
SDN platform adds security risks. Therefore, properly and the masses with BFT-SMaRt,’’ in Proc. 44th Annu. IEEE/IFIP Int. Conf.
Dependable Syst. Netw., Atlanta, GA, USA, 2014, pp. 355–362.
securely implementing a SDN controller to cooperate with [12] J. Sousa, A. Bessani, and M. Vukolic, ‘‘A byzantine fault-tolerant ordering
blockchain and facilitate the management and collaboration service for the hyperledger fabric blockchain platform,’’ in Proc. 48th
among various healthcare domains is of great importance. Annual IEEE/IFIP Int. Conf. Dependable Syst. Netw. (DSN), Jun. 2018,
pp. 51–58.
This should simplify the management of existing legacy [13] M. Castro and B. Liskov, ‘‘Practical Byzantine fault tolerance,’’ in Proc.
healthcare systems to let them be easily added to the new OSDI, vol. 99. 1999, pp. 173–186.
blockchain-based architectures. [14] M. Castro and B. Liskov, ‘‘Practical byzantine fault tolerance and
proactive recovery,’’ ACM Trans. Comput. Syst., vol. 20, no. 4,
pp. 398–461, 2002.
V. CONCLUSION [15] G. Wood, ‘‘Ethereum: A secure decentralised generalised transaction
Medical information sharing without violating security and ledger. Ethereum project yellow paper 151 (2014),’’ Tech. Rep., 2014.
[16] Ethereum. (2016). Solidity Programming Documentation. [Online]. Avail-
privacy regulations has long been a challenging topic. This
able: https://solidity.readthedocs.io/
paper reviews related solutions in this area, including cloud- [17] R. Raja. (2017). Chaincode on the Go—Smart Contracts on the
based approaches, blockchain-based approaches, and SDN- Hyperledger Fabric Blockchain. [Online]. Available: http://medium.
based approaches. We observed that security and privacy com/coinmonks/chaincde-on-the-go-smart-contracts-on-the-hyperledger-
fabric-blockchain-82dd61b3c669
protection of medical information covers confidentiality, [18] K. M. Khan and Q. Malluhi, ‘‘Establishing trust in cloud computing,’’ IT
integrity, and authenticity of data in transit and at rest, access Prof., vol. 12, no. 5, pp. 20–27, 2010.
and privacy control, etc. Therefore, a practical approach for [19] K. Ren, C. Wang, and Q. Wang, ‘‘Security challenges for the public cloud,’’
IEEE Internet Comput., vol. 16, no. 1, pp. 69–73, Jan./Feb. 2012.
medical data sharing may need to integrate many different [20] S. Nepal, R. Ranjan, and K.-K. R. Choo, ‘‘Trustworthy processing of
techniques to achieve its design goals. healthcare big data in hybrid clouds,’’ IEEE Trans. Cloud Comput., vol. 2,
As a new computing paradigm, blockchain has its advan- no. 2, pp. 78–84, Mar./Apr. 2015.
[21] T. Dillon, C. Wu, and E. Chang, ‘‘Cloud computing: Issues and chal-
tages over traditional technologies. However, as we have lenges,’’ in Proc. 24th IEEE Int. Conf. Adv. Inf. Netw. Appl. (AINA),
analyzed in this paper, it is important to choose the right type Apr. 2010, pp. 27–33.
of blockchain (permissioned or permissionless) for medical [22] (2018). Architecting for HIPAA Security and Compliance on Amazon
Web Services. [Online]. Available: https://d1.awsstatic.com/whitepapers/
data sharing. Moreover, there are still some problems call-
compliance/AWS-HIPAA-Compliance-Whitepaper.pdf
ing for further investigation and exploration in blockchain- [23] E.-J. Goh, H. Shacham, N. Modadugu, and D. Boneh, ‘‘Sirius: Securing
based medical data management. We shed a light on these remote untrusted storage,’’ in Proc. NDSS, vol. 3, 2003, pp. 131–145.
challenges by pointing out potential research directions and [24] R. A. Popa, J. R. Lorch, D. Molnar, H. J. Wang, and L. Zhuang, ‘‘Enabling
security in cloud storage slas with cloudproof,’’ in Proc. USENIX Annu.
methodologies that may further secure and facilitate the shar- Tech. Conf., vol. 242, 2011, pp. 355–368.
ing of healthcare information. [25] A. Kumbhare, Y. Simmhan, and V. Prasanna, ‘‘Cryptonite: A secure and
performant data repository on public clouds,’’ in Proc. IEEE 5th Int. Conf.
Cloud Comput. (CLOUD), Jun. 2012, pp. 510–517.
ACKNOWLEDGMENT [26] H. Jin, K. Zhou, H. Jiang, D. Lei, R. Wei, and C. Li, ‘‘Full integrity
The authors would like to thank the anonymous referees for and freshness for cloud data,’’ Future Gener. Comput. Syst., vol. 80,
their reviews and insightful suggestions to improve this paper. pp. 640–652, Mar. 2018.
[27] M. Li, S. Yu, Y. Zheng, K. Ren, and W. Lou, ‘‘Scalable and secure
sharing of personal health records in cloud computing using attribute-
REFERENCES based encryption,’’ IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 1,
[1] Data Leakage Events. Accessed: Mar. 2, 2019. [Online]. Available: pp. 131–143, Jan. 2013.
https://informationisbeautiful.net/visualizations/worlds-biggest-data-brea [28] J. Liu, X. Huang, and J. K. Liu, ‘‘Secure sharing of personal health
ches-hacks/ records in cloud computing: Ciphertext-policy attribute-based signcryp-
[2] (2018). Healthcare Industry Ranks 8th for Cybersecurity but Poor tion,’’ Future Generat. Comput. Syst., vol. 52, pp. 67–76, Nov. 2015.
DNS Health and Endpoint Security of Concern. [Online]. Available: [29] B. Fabian, T. Ermakova, and P. Junghanns, ‘‘Collaborative and secure
https://www.hipaajournal.com/healthcare-data-breach-statistics/ sharing of healthcare data in multi-clouds,’’ Inf. Syst., vol. 48, pp. 132–150,
[3] (2017). Summary of the HIPAA Security Rule. [Online]. Available: Mar. 2015.
https://www.hhs.gov/hipaa/for-professionals/security/laws-regulations/ [30] R. Guo, H. Shi, Q. Zhao, and D. Zheng, ‘‘Secure attribute-based signa-
[4] (2016). General Data Protection Regulation. [Online]. Available: ture scheme with multiple authorities for blockchain in electronic health
https://eugdpr.org/the-regulation/ records systems,’’ IEEE Access, vol. 6, pp. 11676–11686, 2018.
[5] W. Raghupathi and V. Raghupathi, ‘‘Big data analytics in healthcare: [31] S. Narayan and M. Gagné, and R. Safavi-Naini, ‘‘Privacy preserving
Promise and potential,’’ Health Inf. Sci. Syst., vol. 2, no. 1, p. 3, 2014. EHR system using attribute-based infrastructure,’’ in Proc. ACM Workshop
[6] C. Esposito, A. De Santis, G. Tortora, H. Chang, and K.-K. R. Choo, Cloud Comput. Secur. Workshop. New York, NY, USA: ACM, 2010,
‘‘Blockchain: A panacea for healthcare cloud-based data security and pp. 47–52.
privacy?’’ IEEE Cloud Comput., vol. 5, no. 1, pp. 31–37, Jan./Feb. 2018. [32] M. Barua, X. Liang, R. Lu, and X. Shen, ‘‘ESPAC: Enabling security and
[7] S. Nakamoto. (2009). Bitcoin: A Peer-to-Peer Electronic Cash system. patient-centric access control for ehealth in cloud computing,’’ Int. J. Secur.
[Online]. Available: http://bitcoin.org/bitcoin.pdf Netw., vol. 6, nos. 2–3, pp. 67–76, 2011.
[8] A. Gervais, G. O. Karame, K. Wüst, V. Glykantzis, H. Ritzdorf, and [33] L. Chen and D. B. Hoang, ‘‘Novel data protection model in health-
S. Capkun, ‘‘On the security and performance of proof of work care cloud,’’ in Proc. IEEE Int. Conf. High Perform. Comput. Commun.,
blockchains,’’ in Proc. ACM SIGSAC Conf. Comput. Commun. Secur. Sep. 2011, pp. 550–555.
New York, NY, USA: ACM, 2016, pp. 3–16. [34] M. Terrovitis, N. Mamoulis, and P. Kalnis, ‘‘Privacy-preserving
[9] Ethereum. (2014). Proof of Stake FAQ. [Online]. Available: https://github. anonymization of set-valued data,’’ Proc. Very Large Data Bases
com/ethereum/wiki/wiki/Proof-of-Stake-FAQ Endowment, vol. 1, no. 1, pp. 115–125, 2008.
[10] C. Cachin, ‘‘Architecture of the hyperledger blockchain fabric,’’ in Proc. [35] Y. Xu, K. Wang, A. W.-C. Fu, and P. S. Yu, ‘‘Anonymizing transaction
Workshop Distrib. Cryptocurrencies Consensus Ledgers, vol. 310, 2016, databases for publication,’’ in Proc. 14th ACM SIGKDD Int. Conf. Knowl.
pp. 1–4. Discovery Data Mining. New York, NY, USA: ACM, 2008, pp. 767–775.

VOLUME 7, 2019 61667


H. Jin et al.: Review of Secure and Privacy-Preserving Medical Data Sharing

[36] T. Li, N. Li, J. Zhang, and I. Molloy, ‘‘Slicing: A new approach for privacy [60] T.-T. Kuo and L. Ohno-Machado. (2018). ‘‘Modelchain: Decentral-
preserving data publishing,’’ IEEE Trans. Knowl. Data Eng., vol. 24, no. 3, ized privacy-preserving healthcare predictive modeling framework on
pp. 561–574, Mar. 2012. private blockchain networks.’’ [Online]. Available: https://arxiv.org/
[37] B. Zhou, J. Pei, and W. Luk, ‘‘A brief survey on anonymization abs/1802.01746
techniques for privacy preserving publishing of social network data,’’ [61] L. Wu, Y. Zhang, Y. Xie, A. Alelaiw, and J. Shen, ‘‘An efficient and
ACM SIGKDD Explorations Newslett., vol. 10, no. 2, pp. 12–22, secure identity-based authentication and key agreement protocol with user
Dec. 2008. anonymity for mobile devices,’’ Wireless Pers. Commun., vol. 94, no. 4,
[38] I. J. Vergara-Laurens, L. G. Jaimes, and M. A. Labrador, ‘‘Privacy- pp. 3371–3387, 2017.
preserving mechanisms for crowdsensing: Survey and research chal- [62] S. Wang et al., ‘‘Blockchain-powered parallel healthcare systems based
lenges,’’ IEEE Internet Things J., vol. 4, no. 4, pp. 855–869, Aug. 2017. on the ACP approach,’’ IEEE Trans. Comput. Social Syst., vol. 5, no. 4,
[39] C. Dwork ‘‘Differential privacy,’’ in Automata, Languages and pp. 942–950, Dec. 2018.
Programming—ICALP (Lecture Notes in Computer Science), vol 4052, [63] D. Boneh, G. Di Crescenzo, R. Ostrovsky, and G. Persiano, ‘‘Public
M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, Eds. Berlin, Germany: key encryption with keyword search,’’ in Proc. Int. Conf. Theory Appl.
Springer, 2006. Cryptograph. Techn. Interlaken, Switzerland: Springer, 2004, pp. 506–522.
[40] J. Soria-Comas, J. Domingo-Ferrer, and D. Sánchez, and S. Martínez, [64] T. Hardjono and A. Pentland. (2019). ‘‘Verifiable anonymous identities
‘‘Enhancing data utility in differential privacy via microaggregation-based and access control in permissioned blockchains.’’ [Online]. Available:
k-anonymity,’’ VLDB J. Int. J. Very Large Data Bases, vol. 23, no. 5, https://arxiv.org/abs/1903.04584
pp. 771–794, 2014. [65] P. Li, C. Xu, Y. Luo, Y. Cao, J. Mathew, and Y. Ma, ‘‘CareNet:
[41] D. Sánchez, J. Domingo-Ferrer, S. Martínez, and J. Soria-Comas, ‘‘Utility- Building regulation-compliant home-based healthcare services with
preserving differentially private data releases via individual ranking software-defined infrastructure,’’ in Proc. IEEE/ACM Int. Conf.
microaggregation,’’ Inf. Fusion, vol. 30, pp. 1–14, Jul. 2016. Connected Health, Appl., Syst. Eng. Technol. (CHASE), Jul. 2017,
[42] J. Gardner and L. Xiong, ‘‘Hide: An integrated system for health infor- pp. 373–382.
mation DE-identification,’’ in Proc. 21st IEEE Int. Symp. Comput.-Based [66] P. Li, C. Xu, Y. Luo, Y. Cao, J. Mathew, and Y. Ma, ‘‘CareNet:
Med. Syst., Jun. 2008, pp. 254–259. Building a secure software-defined infrastructure for home-based health-
[43] K. El Emam et al., ‘‘A globally optimal k-anonymity method for the de- care,’’ in Proc. ACM Int. Workshop Secur. Softw. Defined Netw.
identification of health data,’’ J. Amer. Med. Inf. Assoc., vol. 16, no. 5, Netw. Function Virtualization. New York, NY, USA: ACM, 2017,
pp. 670–682, 2009. pp. 69–72.
[44] P. Belsis and G. Pantziou, ‘‘A k-anonymity privacy-preserving approach [67] L. Hu et al., ‘‘Software defined healthcare networks,’’ IEEE Wireless
in wireless medical monitoring environments,’’ Pers. Ubiquitous Comput., Commun. Mag., vol. 22, no. 6, pp. 67–75, Jun. 2015.
vol. 18, no. 1, pp. 61–74, 2014. [68] (2018). Survey: 90 Percent of Healthcare Organizations use
[45] G. Loukides, J. Liagouris, A. Gkoulalas-Divanis, and M. Terrovitis, ‘‘Dis- or Plan to use Mobile Devices. [Online]. Available: https://
association for electronic health record privacy,’’ J. Biomed. Informat., www.mobihealthnews.com/content/survey-90-percent-healthcare-organiz
vol. 50, pp. 46–61, Aug. 2014. ations-use-or-plan-use-mobile-devices
[46] M. Wang, Z. Jiang, Y. Zhang, and H. Yang, ‘‘T-closeness slicing: A new [69] Ethereum 2.0 Phase 0–The Beacon Chain. Accessed: Mar. 20, 2019.
privacy-preserving approach for transactional data publishing,’’ INFORMS [Online]. Available: https://github.com/ethereum/eth2.0-specs/blob/
J. Comput., vol. 30, no. 3, pp. 438–453, 2018. master/specs/core/0_beacon-chain.md#introduction
[47] A. Azaria, A. Ekblaw, T. Vieira, and A. Lippman, ‘‘Medrec: Using [70] A. Fiat and M. Naor, ‘‘Broadcast encryption,’’ in Proc. Annu. Int. Cryptol.
blockchain for medical data access and permission management,’’ in Proc. Conf. Santa Barbara, CA, USA: Springer, 1993, pp. 480–491.
2nd Int. Conf. Open Big Data (OBD), Aug. 2016, pp. 25–30. [71] J. A. Garay, J. Staddon, and A. Wool, ‘‘Long-lived broadcast encryption,’’
[48] H. Yang and B. Yang, ‘‘A blockchain-based approach to the secure sharing in Proc. Annu. Int. Cryptol. Conf. Santa Barbara, CA, USA: Springer, 2000,
of healthcare data,’’ in Proc. Norwegian Inf. Secur. Conf., 2017, pp. 1–12. pp. 333–352.
[49] X. Yue, H. Wang, D. Jin, M. Li, and W. Jiang, ‘‘Healthcare data gateways: [72] D. Boneh, C. Gentry, and B. Waters, ‘‘Collusion resistant broadcast encryp-
Found healthcare intelligence on blockchain with novel privacy risk con- tion with short ciphertexts and private keys,’’ in Proc. Annu. Int. Cryptol.
trol,’’ J. Med. Syst., vol. 40, no. 10, pp. 218–225, 2016. Conf. Santa Barbara, CA, USA: Springer, 2005, pp. 258–275.
[50] Q. Xia, E. B. Sifah, A. Smahi, S. Amofa, and X. Zhang, ‘‘BBDS: [73] A. Shamir, ‘‘Identity-based cryptosystems and signature schemes,’’ in
Blockchain-based data sharing for electronic medical records in cloud Proc. Workshop Theory Appl. Cryptograph. Techn. Paris, France: Springer,
environments,’’ Information, vol. 8, no. 2, p. 44, 2017. 1984, pp. 47–53.
[51] Q. Xia, E. B. Sifah, K. O. Asamoah, J. Gao, X. Du, and M. Guizani, ‘‘MeD- [74] D. Boneh and M. Franklin, ‘‘Identity-based encryption from the weil
Share: Trust-less medical data sharing among cloud service providers via pairing,’’ in Proc. Annu. Int. Cryptol. Conf. Santa Barbara, CA, USA:
blockchain,’’ IEEE Access, vol. 5, pp. 14757–14767, 2017. Springer, 2001, pp. 213–229.
[52] K. Fan, S. Wang, Y. Ren, H. Li, and Y. Yang, ‘‘MedBlock: Efficient and [75] A. Sahai and B. Waters, ‘‘Fuzzy identity-based encryption,’’ in Proc. Annu.
secure medical data sharing via blockchain,’’ J. Med. Syst., vol. 42, no. 8, Int. Conf. Theory Appl. Cryptograph. Techn. Aarhus, Denmark: Springer,
pp. 136–146, 2018. 2005, pp. 457–473.
[53] K. Peterson, R. Deeduvanu, P. Kanjamala, and K. Boles, ‘‘A blockchain- [76] V. Goyal, O. Pandey, A. Sahai, and B. Waters, ‘‘Attribute-based encryp-
based approach to health information exchange networks,’’ in Proc. NIST tion for fine-grained access control of encrypted data,’’ in Proc. 13th
Workshop Blockchain Healthcare, vol. 1, 2016, pp. 1–10. ACM Conf. Comput. Commun. Secur. New York, NY, USA: ACM, 2006,
[54] J. Zhang, N. Xue, and X. Huang, ‘‘A secure system for pervasive social pp. 89–98.
network-based healthcare,’’ IEEE Access, vol. 4, pp. 9239–9250, 2016. [77] M. Chase, ‘‘Multi-authority attribute based encryption,’’ in Proc. The-
[55] X. Liang, J. Zhao, S. Shetty, J. Liu, and D. Li, ‘‘Integrating blockchain ory Cryptogr. Conf. Amsterdam, The Netherlands: Springer, 2007,
for data sharing and collaboration in mobile healthcare applications,’’ pp. 515–534.
in Proc. IEEE 28th Annu. Int. Symp. Pers., Indoor, Mobile Radio Com- [78] M. Chase and S. S. Chow, ‘‘Improving privacy and security in
mun. (PIMRC), Oct. 2017, pp. 1–5. multi-authority attribute-based encryption,’’ in Proc. 16th ACM
[56] C. McFarlane, M. Beer, J. Brown, and N. Prendergast, Patientory: Conf. Comput. Commun. Secur. New York, NY, USA: ACM, 2009,
A Healthcare Peer-to-Peer EMR Storage Network v1. Addison, TX, USA: pp. 121–130.
Entrust, 2017. [79] A. Lewko and B. Waters, ‘‘Decentralizing attribute-based encryption,’’ in
[57] A. Zhang and X. Lin, ‘‘Towards secure and privacy-preserving data sharing Proc. Annu. Int. Conf. Theory Appl. Cryptograph. Techn. Tallinn, Estonia:
in e-health systems via consortium blockchain,’’ J. Med. Syst., vol. 42, Springer, 2011, pp. 568–588.
no. 8, p. 140, 2018. [80] M. Blaze, G. Bleumer, and M. Strauss, ‘‘Divertible protocols and atomic
[58] G. Zyskind, O. Nathan, and A. S. Pentland, ‘‘Decentralizing privacy: proxy cryptography,’’ in Proc. Int. Conf. Theory Appl. Cryptograph. Techn.
Using blockchain to protect personal data,’’ in Proc. IEEE Secur. Privacy Espoo, Finland: Springer, 1998, pp. 127–144.
Workshops (SPW), May 2015, pp. 180–184. [81] G. Ateniese, K. Fu, M. Green, and S. Hohenberger, ‘‘Improved
[59] H. Zhao, Y. Zhang, Y. Peng, and R. Xu, ‘‘Lightweight backup and efficient proxy re-encryption schemes with applications to secure distributed
recovery scheme for health blockchain keys,’’ in Proc. IEEE 13th Int. storage,’’ ACM Trans. Inf. Syst. Secur., vol. 9, no. 1, pp. 1–30,
Symp. Auton. Decentralized Syst. (ISADS), Mar. 2017, pp. 229–234. 2006.

61668 VOLUME 7, 2019


H. Jin et al.: Review of Secure and Privacy-Preserving Medical Data Sharing

HAO JIN received the Ph.D. degree in computer science and technology from the Huazhong University of Science and Technology, China, in 2016. He is currently a Research Associate with the Department of Electrical and Computer Engineering, University of Massachusetts at Lowell. His research interests include cloud data security, digital forensics, auditing and accountability, and blockchain technology.

YAN LUO received the Ph.D. degree in computer science from the University of California at Riverside, in 2005. He is currently a Professor with the Department of Electrical and Computer Engineering, University of Massachusetts at Lowell. While his research interests span computer architecture and network systems broadly, his current research focuses on heterogeneous architecture and systems, software-defined networking, and deep learning. He and his team aim to design novel architecture and systems to facilitate programmable networking, deeply embedded sensing, and healthcare applications.

PEILONG LI received the Ph.D. degree in computer engineering from the University of Massachusetts at Lowell, in 2016, where he is currently a Research Assistant Professor with the Department of Electrical and Computer Engineering. His research interests include heterogeneous and parallel computer architecture, big data analytics with distributed computing frameworks, and data plane innovation in software-defined networking.

JOMOL MATHEW received the Ph.D. degree in plant and soil sciences from the University of Massachusetts at Amherst. She is currently the Chief Research Informatics Officer at the University of Massachusetts Medical School (UMMS). She oversees Data Science and Technology, which includes clinical and research data integration, analytics, data visualization, and high-performance computing at UMMS. She also directs the Clinical Informatics Component of the Clinical and Translational Science Center (UMCCT) at UMMS. In 2017, she co-founded D3Health, a Center for Improving Patient and Population Health through the Integration of Advanced Digital Technologies, Analytics, and Decision Support.
